Author's note: This post was adapted from a presentation at the Recurse Center. If you prefer to explore hands-on first, the interactive app can be found here.

Suppose we have a text-to-SQL agent [1] and the Titanic dataset in our relational database [2]. To the input text "Number of passengers", a reasonable response from the agent is something like "SELECT COUNT(*) FROM passengers".

image.png

If the table name is dim_passengers, however, the query should be "SELECT COUNT(*) FROM dim_passengers". Let’s take this a step further. What if we want only the passengers above the age of 21? How does the agent know the column to filter on is age, passenger_age, or something else?

To provide the necessary context to the agent, we’ll need to add a separate step to retrieve that context, and augment the generation by sharing that context alongside the input text. This is commonly referred to as retrieval-augmented generation, or RAG.

Set up

First, we load up the vector database with the table metadata. The table metadata for dim_passengers could look like this (full table metadata and sample rows here).

{
    "table_name": "dim_passengers",
    "columns": [
        {
            "column_name": "passenger_id",
            "column_type": "int",
            "column_description": "Unique identifier for each passenger",
        },

        {
            "column_name": "age",
            "column_type": "float",
            "column_description": "Age of the passenger in years",
        },
        ...
}

Next we convert the table metadata into vector embeddings, which are numerical representations of text that capture semantic meaning in a high-dimensional space. We use OpenAI’s text-embedding-3-small model, and end up with three vectors each of length 1,536.

    {
        "table_name":
            "dim_passengers",
        ...
    }

    {
        "table_name":
            "dim_tickets",
        ...
    }

    {
        "table_name":
            "dim_ports",
        ...
    }

# dim_passengers
[-0.027, -0.023, 0.047, ...]

# dim_tickets
[-0.068, -0.042, 0.048, ...]

# dim_ports
[-0.028, -0.004, 0.047, ...]

We then store the table metadata-vector pairs in the vector database.

Retrieval

Next, we figure out which table to send to the agent. To do this we convert the input text itself to a vector embedding, and calculate the distance between the input text vector and each of the three table metadata vectors [3]. For simplicity, we select the closest one.

image.png