Memory
Voice agents are amnesiacs by default — every conversation starts from scratch, even when the same caller is reaching the same agent for the tenth time. Memory turns that around: after each call an extractor writes short, durable facts about the caller, and at the next call the retriever injects the most relevant ones into the system prompt. Support bots stop asking for the caller’s plan, concierge agents remember preferred times, and retention flows can pick up where the last conversation left off.
How it works
- Extract — when a conversation ends (room_finished), the server sends the transcript to a small LLM and asks for 0–5 short third-person facts about the caller with per-fact confidence scores.
- Embed + store — each fact is embedded with the same OpenAI text-embedding-3-large model we use for knowledge bases and stored in Postgres pgvector, scoped by (agent_id, caller_identity).
- Retrieve — at the start of the next conversation with the same caller, the server ranks memories by recency-weighted cosine similarity and renders the top 10 into a plain-text block.
- Inject — the block substitutes for the {{memory}} placeholder in the agent’s system prompt before dispatch. No placeholder in the prompt = no injection (but memory is still extracted and stored; enabling it later starts grounding future calls).
Memory is scoped per agent × caller. Anonymous widget sessions (callers without a stable user_identity) are never recorded and never retrieved from.
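The retrieve step above can be sketched as follows. The exact recency weighting is internal; the exponential half-life used here is an assumption, chosen only to illustrate how similarity and age combine:

```python
"""Illustrative sketch of recency-weighted cosine ranking (not the real
implementation): score = cosine(query, memory) * 0.5 ** (age / half_life)."""
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec: list[float],
         memories: list[tuple[list[float], float, str]],
         half_life_days: float = 30.0,
         top_k: int = 10) -> list[str]:
    """memories: (embedding, age_in_days, fact_text) tuples.
    Older facts decay; the top_k survivors become the injected block."""
    scored = [
        (cosine(query_vec, vec) * 0.5 ** (age / half_life_days), text)
        for vec, age, text in memories
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With this shape, a fresh exact match outranks an identical fact from two half-lives ago, which in turn outranks an unrelated fact of any age.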
Enable it on an agent
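As a sketch, assuming a REST endpoint along the lines of PATCH /v1/agents/{agent_id} — the base URL, path, and auth header are assumptions; only the field names memory_enabled and memory_retention_days come from this page:

```python
"""Hypothetical sketch of enabling memory on an agent; adapt the URL,
path, and auth to the real API reference."""
import json
from urllib import request

API_BASE = "https://api.example.com/v1"   # hypothetical base URL
API_KEY = "YOUR_API_KEY"

def memory_settings(enabled: bool = True, retention_days: int = 90) -> dict:
    """Build the update payload; retention_days=0 means no cap."""
    return {"memory_enabled": enabled, "memory_retention_days": retention_days}

def update_agent(agent_id: str, settings: dict) -> None:
    """PATCH the agent with new memory settings (hypothetical endpoint)."""
    req = request.Request(
        f"{API_BASE}/agents/{agent_id}",
        data=json.dumps(settings).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    request.urlopen(req)  # raises on non-2xx responses

# update_agent("agent_123", memory_settings(retention_days=90))
```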
memory_retention_days caps both retrieval visibility and the nightly cleanup job — 0 means no cap.
Use {{memory}} in the prompt
The retrieved block is rendered as a plain-text numbered list of facts. Reference {{memory}} in your prompt wherever you want the facts grounded.
If memory_enabled is false, the placeholder is stripped — no literal braces reach the LLM.
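The substitution behaviour can be sketched as follows. The real rendering format is internal, so the numbered-list shape here is illustrative only:

```python
"""Sketch of the server-side injection step: render retrieved facts as a
numbered list, substitute for {{memory}}, and strip the placeholder when
memory is disabled so no literal braces reach the LLM."""

def render_memory_block(facts: list[str]) -> str:
    """Render facts as the plain-text numbered list described above."""
    return "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, 1))

def inject(prompt: str, facts: list[str], memory_enabled: bool) -> str:
    """Replace {{memory}} with the block, or with nothing when disabled."""
    block = render_memory_block(facts) if memory_enabled and facts else ""
    return prompt.replace("{{memory}}", block)
```

A call with one stored fact turns "Known caller facts:\n{{memory}}" into "Known caller facts:\n1. Prefers mornings before 9am."; with memory disabled, the placeholder simply disappears.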
Pass a stable caller identity
Memory pivots on caller_identity. The backend captures it from the LiveKit participant identity:
- Console test calls and authenticated SDK starts: identity is user_&lt;firebase-uid&gt; automatically.
- Widget (public session): pass user_identity when starting the session. Use your own stable user ID (e.g. your product’s user_id or a hashed email).
Anonymous sessions (no user_identity) skip memory entirely.
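For the widget case, a stable pseudonymous identity can be derived from an email. This is a sketch, not a required scheme — the user_ prefix mirrors the authenticated form but any stable string works:

```python
"""Hypothetical helper: derive a stable caller_identity from an email so
the same caller retrieves the same memories across sessions."""
import hashlib

def caller_identity_from_email(email: str) -> str:
    """Deterministic: normalising first means casing and stray whitespace
    don't split one caller's memories across two identities."""
    normalized = email.strip().lower()
    return "user_" + hashlib.sha256(normalized.encode()).hexdigest()[:16]
```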
List and delete memories
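As a sketch, the paths and the caller_identity query parameter below are assumptions; what the page does establish is that memories are scoped per (agent_id, caller_identity), so both identifiers appear:

```python
"""Hypothetical URL builders for listing and deleting memories; adapt the
base URL and paths to the real API reference."""
from urllib.parse import urlencode

API_BASE = "https://api.example.com/v1"   # hypothetical base URL

def list_memories_url(agent_id: str, caller_identity: str) -> str:
    """GET this URL to list one caller's memories for one agent."""
    return f"{API_BASE}/agents/{agent_id}/memories?" + urlencode(
        {"caller_identity": caller_identity}
    )

def delete_memory_url(agent_id: str, memory_id: str) -> str:
    """DELETE this URL; the memory becomes unreachable from search
    immediately and is hard-deleted later by the retention job."""
    return f"{API_BASE}/agents/{agent_id}/memories/{memory_id}"
```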
What the extractor keeps and drops
Keeps: preferences, identifiers, commitments, recurring needs, constraints, things a future call will benefit from.
Drops: volatile details (today’s mood, weather), one-off factoids, anything flagged as sensitive at extraction time (health details, credit-card numbers, passwords). The extractor is tuned to emit zero facts rather than invent filler; it’s normal for many calls to produce no memories.
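The keep/drop decision implied above can be sketched as a post-extraction filter. The field names and the 0.6 confidence threshold are assumptions; the real extractor's criteria are internal:

```python
"""Illustrative filter over extracted fact candidates: drop sensitive or
low-confidence facts, and prefer emitting nothing over inventing filler."""

def filter_facts(candidates: list[dict], min_confidence: float = 0.6) -> list[str]:
    kept = []
    for c in candidates:
        if c.get("sensitive"):                 # health details, card numbers, passwords
            continue
        if c.get("confidence", 0.0) < min_confidence:
            continue                           # zero facts beats invented filler
        kept.append(c["text"])
    return kept[:5]                            # at most 5 facts per call
```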
Try it end-to-end
Fastest path to confidence the feature works in your account:
- Create an agent whose prompt contains {{memory}} — e.g. a scheduling assistant with the line “If you have prior knowledge about the caller, use it to personalise — don’t ask them to repeat things.”
- Turn memory_enabled on.
- Place a test call and say something durable: “I can only do mornings before 9am; I’m vegetarian.” End the call.
- Open the conversation detail page — “Memories written this call” should list the facts within ~10s of the call completing.
- Place a second call as the same user and ask something open-ended: “What’s a good time next week?” The agent should surface your morning constraint without being re-asked.
Troubleshooting
If memory doesn’t seem to ground the call, check in order:
- The agent’s prompt contains the {{memory}} placeholder (no placeholder means no injection).
- memory_enabled is on for the agent.
- The caller has a stable identity: anonymous widget sessions skip memory entirely.
- The previous call actually wrote facts: check “Memories written this call” on its conversation detail page (many calls legitimately produce none).
- The earlier facts are still within the memory_retention_days window.
Interaction with Knowledge Base
Both search_knowledge (per-agent KB) and {{memory}} (per-caller) can be active on the same agent. They address different problems and compose cleanly:
- Knowledge base grounds domain facts that apply to every caller (product manuals, policies). Retrieved via the search_knowledge tool the LLM calls when it needs context.
- Memory grounds durable facts about the current caller. Retrieved server-side at session start and injected into the system prompt — no LLM tool call.
The agent’s prompt handles the memory block; the LLM decides when to reach for the KB. Nothing stops you from using both — many support agents will.
Tips
- Audit what’s written. Every conversation detail page shows “Memories written this call” so you can see the facts before they ever ground another call.
- Use short prompts around {{memory}}. The injected block already speaks for itself — don’t wrap it in “Here are things I remember about you:” framing. The LLM handles it well.
- Retention is non-negotiable for privacy. The default of 90 days balances useful recall against the fact that most callers won’t remember what they told a bot six months ago. Enterprise tenants that need longer windows should raise it explicitly.
- Deletion is immediate. Memory deletes are soft by default (unreachable from search instantly) and hard-deleted by the retention job; no need for a separate purge request.