Author's note: This post was adapted from a presentation at the Recurse Center. If you prefer to explore hands-on first, the interactive app can be found here.
Suppose we have a text-to-SQL agent [1] and the Titanic dataset in our relational database [2]. To the input text "Number of passengers"
, a reasonable response from the agent is something like "SELECT COUNT(*) FROM passengers"
.
If the table name is dim_passengers
, however, the query should be "SELECT COUNT(*) FROM dim_passengers"
. Let’s take this a step further. What if we want only the passengers above the age of 21? How does the agent know the column to filter on is age
, passenger_age
, or something else?
To provide the necessary context to the agent, we’ll need to add a separate step to retrieve that context, and augment the generation by sharing that context alongside the input text. This is commonly referred to as retrieval-augmented generation, or RAG.
First, we load up the vector database with the table metadata. The table metadata for dim_passengers
could look like this (full table metadata and sample rows here).
{
"table_name": "dim_passengers",
"columns": [
{
"column_name": "passenger_id",
"column_type": "int",
"column_description": "Unique identifier for each passenger",
},
{
"column_name": "age",
"column_type": "float",
"column_description": "Age of the passenger in years",
},
...
}
Next we convert the table metadata into vector embeddings, which are numerical representations of text that capture semantic meaning in a high-dimensional space. We use OpenAI’s text-embedding-3-small
model, and end up with three vectors each of length 1,536.
{
"table_name":
"dim_passengers",
...
}
{
"table_name":
"dim_tickets",
...
}
{
"table_name":
"dim_ports",
...
}
# dim_passengers
[-0.027, -0.023, 0.047, ...]
# dim_tickets
[-0.068, -0.042, 0.048, ...]
# dim_ports
[-0.028, -0.004, 0.047, ...]
We then store the table metadata-vector pairs in the vector database.
Next, we figure out which table to send to the agent. To do this we convert the input text itself to a vector embedding, and calculate the distance between the input text vector and each of the three table metadata vectors [3]. For simplicity, we select the closest one.