Knowledge Base
Ground your agent’s answers in your own documents with retrieval-augmented generation
A knowledge base is a bundle of documents (PDF, plain text, markdown, or HTML) that your voice agent can consult during a call. You upload once; the server extracts, chunks, embeds, and indexes the content. Every agent attached to the knowledge base gets a built-in search_knowledge tool that retrieves the most relevant excerpts in real time.
Why use it
The LLM only knows what’s in its prompt. If you need it to answer from product manuals, policy documents, an FAQ, or internal runbooks, inlining everything into the system prompt is expensive and doesn’t scale past a few pages. A knowledge base gives the agent a cheap, fast way to look up exactly the passage it needs, when it needs it.
Create a knowledge base
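The original code tabs did not survive extraction, so here is a minimal Python sketch of the create call. The base URL, the /knowledge-bases endpoint path, and the JSON field names are assumptions for illustration, not the documented API.

```python
import json
import os
import urllib.request

API_BASE = "https://api.example.com/v1"  # hypothetical base URL

def kb_payload(name: str, description: str = "") -> dict:
    # Field names are illustrative; check the real API reference.
    return {"name": name, "description": description}

def create_knowledge_base(name: str, description: str = "") -> dict:
    req = urllib.request.Request(
        f"{API_BASE}/knowledge-bases",
        data=json.dumps(kb_payload(name, description)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # expected to include the new knowledge base's id
```

The returned id is what you pass to the upload and attach calls in the following sections.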
Upload a document
Multipart upload. Max 10 MB per file.
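As a sketch of the multipart upload using only Python's standard library, with the 10 MB limit enforced client-side before the request goes out. The endpoint path and the "file" form-field name are assumptions:

```python
import os
import urllib.request
import uuid

MAX_BYTES = 10 * 1024 * 1024  # server-side limit: 10 MB per file

def within_limit(num_bytes: int) -> bool:
    return num_bytes <= MAX_BYTES

def multipart_body(filename: str, content: bytes, boundary: str) -> bytes:
    # Hand-rolled multipart/form-data body; the "file" field name is illustrative.
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()

def upload_document(kb_id: str, path: str) -> None:
    data = open(path, "rb").read()
    if not within_limit(len(data)):
        raise ValueError(f"{path} exceeds the 10 MB limit")
    boundary = uuid.uuid4().hex
    req = urllib.request.Request(
        f"https://api.example.com/v1/knowledge-bases/{kb_id}/documents",  # hypothetical
        data=multipart_body(os.path.basename(path), data, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    urllib.request.urlopen(req)  # synchronous: blocks while the server extracts and embeds
```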
The response includes a status field that transitions from embedding to ready once every chunk is indexed. Upload is synchronous — expect a few seconds per megabyte of input.
Attach to an agent
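A sketch of the attach call as a PATCH on the agent resource. The endpoint path and the knowledge_base_ids field name are assumptions; the auth header is omitted for brevity:

```python
import json
import urllib.request

def attach_payload(kb_ids: list[str]) -> dict:
    # Field name is illustrative; the real API may differ.
    return {"knowledge_base_ids": kb_ids}

def attach_knowledge_bases(agent_id: str, kb_ids: list[str]) -> dict:
    req = urllib.request.Request(
        f"https://api.example.com/v1/agents/{agent_id}",  # hypothetical endpoint
        data=json.dumps(attach_payload(kb_ids)).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```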
On the next conversation for that agent, search_knowledge is auto-registered as a function tool. The LLM decides when to call it based on the caller’s question; you don’t have to modify the agent prompt.
The tool is scoped to exactly the knowledge bases attached to the agent — it cannot query anything else, regardless of what the worker sends.
Search via the API
You can also run semantic search directly, outside a conversation. Useful for UIs that want to show grounded snippets, or for verifying what the agent would retrieve.
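A sketch of a direct search call, again with an assumed endpoint path and response shape. The top_hits helper just re-ranks by score, which is all the relative cosine scores are good for:

```python
import json
import urllib.request

def top_hits(hits: list[dict], k: int = 3) -> list[dict]:
    # Scores are cosine similarities; use them for ranking only.
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:k]

def search(kb_id: str, query: str, top_k: int = 3) -> list[dict]:
    req = urllib.request.Request(
        f"https://api.example.com/v1/knowledge-bases/{kb_id}/search",  # hypothetical
        data=json.dumps({"query": query, "top_k": top_k}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]  # assumed: each hit has filename, content, score
```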
Each hit includes the source filename, the chunk content, and a cosine-similarity score. Scores are relative — use them for ranking, not as an absolute confidence metric.
How it works
- Extract — the server reads the upload and extracts text. PDFs use per-page parsing with graceful skip-on-error; HTML is stripped to plain text; markdown and plain text pass through.
- Chunk — text is split into overlapping 1000-character windows with 200 characters of overlap. Chunk boundaries prefer paragraph breaks, then sentence ends, then spaces, so each chunk reads as a coherent passage.
- Embed — chunks are embedded in batches with OpenAI text-embedding-3-large (1536 dimensions via Matryoshka truncation).
- Index — embeddings land in Postgres pgvector with a cosine-distance IVFFlat index. Search is ANN (approximate nearest-neighbor), sub-50ms on the indexed path.
- Query — at call time, the search_knowledge tool sends the user’s question to the server. The server embeds the query, runs the ANN search, and returns the top-k chunks with filenames and scores for the LLM to quote.
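The chunking step above can be sketched in Python. The boundary heuristic below (prefer a paragraph break, then a sentence end, then a space, but never cut before the window's midpoint) is an illustrative approximation of the described behavior, not the production code:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows, preferring natural boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer the latest paragraph break, then sentence end, then space.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                if cut > size // 2:  # don't cut too early in the window
                    end = start + cut + len(sep)
                    break
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks overlap
    return chunks
```

With the defaults, adjacent chunks share roughly 200 characters, so a passage that straddles a boundary still appears intact in at least one chunk.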
Tips
- One knowledge base per topic. A “Product Manuals” KB and a “Billing Policies” KB will retrieve more relevantly than a single “Everything” KB, because ANN search ranks within the whole pool.
- Curate your source documents. Out-of-date or contradictory documents will surface; the retriever has no way to know which version is correct.
- Expect a few seconds of latency on first retrieval. The search_knowledge tool adds one embedding round-trip and one DB query to the turn. In our measurements this is typically 200-500ms — noticeable but not disruptive.
- Monitor the transcript. Every search_knowledge call is logged as a role=tool message on the conversation, including the query the LLM used and the chunks returned. If the agent is answering incorrectly, that’s the first place to look.