Speechify Voice Agents Knowledge Bases — RAG Search

A knowledge base is a bundle of documents (PDF, text, markdown, or HTML) an agent can consult during a call. You upload once; the server extracts, chunks, embeds, and indexes the content. Every agent attached to the knowledge base gets a built-in search_knowledge tool that retrieves the most relevant excerpts in real time.

To build one, see Create a knowledge base.

Why use it

An LLM knows only what is in its prompt. You could inline product manuals, policies, FAQs, or runbooks into that prompt, but doing so is expensive and does not scale past a few pages. A knowledge base lets the agent look up exactly the passage it needs, when it needs it.

How it works

Extract

The server reads the upload and extracts its text. PDFs are processed page by page, and any page that fails to extract is skipped; HTML is stripped to plain text; markdown and plain-text files are passed through unchanged.

Chunk

Text is split into overlapping 1000-character windows (200-character overlap), preferring paragraph, then sentence, then word boundaries so each chunk reads as a coherent passage.
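The chunking rule above can be sketched in a few lines of Python. This is an illustrative approximation, not the server's actual implementation: the function name chunk_text, the exact separator strings, and the tie-breaking rule (only accept a boundary that lies past the overlap region) are all assumptions.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows, cutting at the best nearby boundary.

    Hypothetical helper: the separators and the acceptance rule are assumptions.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer a paragraph break, then a sentence end, then a space.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                # Only accept a cut past the overlap region, so that the
                # next window always starts later than this one did.
                if cut > overlap:
                    end = start + cut + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by the overlap so neighbouring chunks share context.
        start = end - overlap
    return chunks
```

With two 600-character paragraphs, the first chunk ends at the paragraph break and the second begins 200 characters earlier, so a sentence that straddles the boundary still appears whole in at least one chunk.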

Embed

Chunks are embedded in batches with OpenAI text-embedding-3-large, shortened from the model's native 3072 dimensions to 1536 via Matryoshka truncation.
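Matryoshka truncation means keeping the leading dimensions of an embedding and rescaling the result back to unit length, so cosine comparisons remain meaningful. The sketch below shows that operation locally; the function name truncate_embedding is hypothetical. OpenAI's embeddings endpoint can also return shortened vectors directly through its dimensions parameter, and this page does not say which of the two approaches the server uses.

```python
import math


def truncate_embedding(vector: list[float], dims: int = 1536) -> list[float]:
    """Keep the first `dims` values and L2-normalize the result.

    Hypothetical helper illustrating Matryoshka-style shortening.
    """
    if dims <= 0 or dims > len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]
```

For example, shortening [3.0, 4.0, 12.0] to two dimensions keeps [3.0, 4.0] and rescales it to approximately [0.6, 0.8].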

Index

Embeddings are stored in Postgres using the pgvector extension, under an IVFFlat index with cosine distance, which gives approximate nearest-neighbor search in under 50 ms.
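To make the storage layout concrete, here is a sketch of what a pgvector schema for this step might look like, held as SQL strings in Python. The table name kb_chunks, its columns, the lists = 100 setting, and the psycopg-style named placeholders are all assumptions for illustration; only the pgvector syntax itself (the vector type, the ivfflat access method, vector_cosine_ops, and the <=> cosine-distance operator) is standard.

```python
# Hypothetical schema: table and column names are illustrative only.
# Requires the pgvector extension (CREATE EXTENSION vector).
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS kb_chunks (
    id bigserial PRIMARY KEY,
    knowledge_base_id text NOT NULL,
    filename text NOT NULL,
    content text NOT NULL,
    embedding vector(1536) NOT NULL
);
"""

# IVFFlat index using cosine distance; `lists` is a tuning knob (assumed value).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS kb_chunks_embedding_idx
ON kb_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

# `<=>` is cosine distance, so 1 - distance is cosine similarity.
SEARCH_SQL = """
SELECT filename, content, 1 - (embedding <=> %(query)s::vector) AS score
FROM kb_chunks
WHERE knowledge_base_id = ANY(%(kb_ids)s)
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

These statements could be run with any Postgres client. IVFFlat trades a little recall for speed by searching only the nearest clusters, which is why the search is described as approximate.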

Query

At call time, search_knowledge embeds the caller’s question, runs the search, and returns the top-k chunks — with filenames and scores — for the LLM to quote.
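The ranking step can be illustrated with an exact, in-memory version of the same idea: score every chunk by cosine similarity against the query embedding and keep the k best. This is a stand-in for explanation only. The real search runs as an approximate index lookup in the database, and the Chunk class, the top_k function, and the result field names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    filename: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_embedding: list[float], chunks: list[Chunk], k: int = 3) -> list[dict]:
    """Return the k most similar chunks with their filenames and scores."""
    scored = [(cosine(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"filename": c.filename, "text": c.text, "score": score}
        for score, c in scored[:k]
    ]


# Toy 2-dimensional embeddings, purely for illustration.
chunks = [
    Chunk("refunds.md", "Refunds are issued within 5 days.", [1.0, 0.0]),
    Chunk("shipping.md", "Orders ship within 24 hours.", [0.0, 1.0]),
    Chunk("faq.md", "Common questions about orders.", [1.0, 1.0]),
]
results = top_k([1.0, 0.1], chunks, k=2)
print([r["filename"] for r in results])  # ['refunds.md', 'faq.md']
```

Returning the filename and score alongside each excerpt is what lets the LLM attribute a quote to its source and discount weak matches.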

The search_knowledge tool

When a knowledge base is attached, search_knowledge is auto-registered on the agent’s next call — no prompt change needed. The LLM decides when to call it, and it’s scoped to exactly the attached knowledge bases. Every call is logged as a role=tool message on the transcript, so you can see the query the LLM used and the chunks it got back.

Good to know

One knowledge base per topic

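Splitting content into a “Product manuals” KB and a “Billing policies” KB gives more relevant results than a single “Everything” KB. Search ranks chunks across the whole pool of attached content, so in a catch-all KB unrelated documents compete for the same top-k slots.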

Curate your sources

Out-of-date or contradictory documents will surface — the retriever has no way to know which version is correct.

Expect a little first-retrieval latency

Each search_knowledge call adds one embedding round-trip and one database query to the turn, typically 200–500 ms in total: noticeable but not disruptive.