Memory

Let your voice agent remember durable facts about each caller across conversations

Voice agents are amnesiacs by default — every conversation starts from scratch, even when the same caller is reaching the same agent for the tenth time. Memory turns that around: after each call an extractor writes short, durable facts about the caller, and at the next call the retriever injects the most relevant ones into the system prompt. Support bots stop asking for the caller’s plan, concierge agents remember preferred times, and retention flows can pick up where the last conversation left off.

How it works

  1. Extract — when a conversation ends (room_finished), the server sends the transcript to a small LLM and asks for 0-5 short third-person facts about the caller with per-fact confidence scores.
  2. Embed + store — each fact is embedded with the same OpenAI text-embedding-3-large model we use for knowledge bases and stored in Postgres pgvector, scoped by (agent_id, caller_identity).
  3. Retrieve — at the start of the next conversation with the same caller, the server ranks memories by recency-weighted cosine similarity and renders the top 10 into a plain-text block.
  4. Inject — the block substitutes for the {{memory}} placeholder in the agent’s system prompt before dispatch. No placeholder in the prompt = no injection (but memory is still extracted and stored; enabling it later starts grounding future calls).

Memory is scoped per agent × caller. Anonymous widget sessions (callers without a stable user_identity) are never recorded and never retrieved from.
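The ranking in step 3 can be sketched as follows. This is an illustration, not the server's implementation: the half-life constant, dict field names, and helper names are all assumptions.

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30  # assumed decay constant, not the server's actual value


def cosine(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_memories(query_vec, memories, now=None, top_k=10):
    """Recency-weighted cosine similarity: newer facts decay less."""
    now = now or datetime.now(timezone.utc)
    scored = []
    for m in memories:  # m = {"fact": str, "vec": [...], "created_at": datetime}
        age_days = (now - m["created_at"]).total_seconds() / 86400
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scored.append((cosine(query_vec, m["vec"]) * decay, m))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```

With equal similarity, a week-old fact outranks a year-old one, which is why stale preferences fade out of the injected block over time.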

Enable it on an agent

from speechify import Speechify

client = Speechify()
agent = client.tts.agents.update(
    id="agent_01H...",
    memory_enabled=True,
    memory_retention_days=90,
)

memory_retention_days bounds both retrieval visibility and the nightly cleanup job: memories older than the window are never retrieved and are purged by the job. 0 means no cap.

Use {{memory}} in the prompt

The retrieved block is a numbered list that looks like this:

Known facts about this caller:
1. Caller prefers weekday mornings for support calls.
2. Caller has a Premium plan renewal due next month.
3. Caller asked about family-sharing last call.

Reference it in your prompt wherever you want the facts grounded:

You are a support agent for Acme. Speak concisely.
{{memory}}
Greet the caller, confirm what they need, and escalate only if
you cannot resolve in one turn.

If memory_enabled is false, the placeholder is stripped — no literal braces reach the LLM.
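The substitution behavior described above amounts to something like this sketch (the function name and the empty-facts handling are assumptions, not the platform's actual code):

```python
def render_prompt(prompt_template, facts, memory_enabled):
    """Replace {{memory}} with the retrieved facts, or strip it when disabled.

    `facts` is an ordered list of short fact strings. Assumed behavior: an
    enabled agent with zero retrieved facts also gets the placeholder stripped.
    """
    if not memory_enabled or not facts:
        # No injection: strip the placeholder so no literal braces reach the LLM.
        return prompt_template.replace("{{memory}}", "").strip()
    block = "Known facts about this caller:\n" + "\n".join(
        f"{i}. {fact}" for i, fact in enumerate(facts, start=1)
    )
    return prompt_template.replace("{{memory}}", block)
```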

Pass a stable caller identity

Memory pivots on caller_identity. The backend captures it from the LiveKit participant identity:

  • Console test calls and authenticated SDK starts: identity is user_<firebase-uid> automatically.
  • Widget (public session): pass user_identity when starting the session. Use your own stable user ID (e.g. your product’s user_id or a hashed email).
import { connectAgent } from "@speechify/agents-js";

await connectAgent({
  agentId: "agent_01H...",
  apiKey: tempSessionKey,
  userIdentity: "acme_user_42", // → caller_identity on every memory
});

Anonymous sessions (no user_identity) skip memory entirely.
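If you don't want to expose raw user IDs to the widget, one option (an illustration, not a platform requirement) is a keyed hash of the email, computed server-side. The secret and prefix below are placeholders:

```python
import hashlib
import hmac

IDENTITY_SECRET = b"rotate-me"  # your own server-side secret, illustrative


def caller_identity_for(email: str) -> str:
    """Derive a stable, non-reversible user_identity from an email address."""
    normalized = email.strip().lower().encode()
    digest = hmac.new(IDENTITY_SECRET, normalized, hashlib.sha256)
    return "acme_" + digest.hexdigest()[:16]
```

The same email always yields the same identity, so the agent sees one continuous caller across sessions without ever receiving the address itself.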

List and delete memories

# all memories on an agent (newest first, capped at 500)
result = client.tts.agents.list_memories(id="agent_01H...")
for m in result.memories:
    print(m.caller_identity, m.fact, m.confidence)

# privacy: delete everything for one caller on one agent
client.tts.agents.delete_memories_by_caller(
    id="agent_01H...",
    caller_identity="acme_user_42",
)

What the extractor keeps and drops

Keeps: preferences, identifiers, commitments, recurring needs, constraints, things a future call will benefit from.

Drops: volatile details (today’s mood, weather), one-off factoids, anything flagged as sensitive at extraction time (health details, credit-card numbers, passwords). The extractor is tuned to emit zero facts rather than invent filler; it’s normal for many calls to produce no memories.
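As a mental model, the keep/drop pass looks roughly like this sketch. The field names, the sensitive-term list, and the per-call cap variable are assumptions, not the production filter:

```python
SENSITIVE_MARKERS = ("password", "credit card", "card number", "diagnos")  # illustrative


def filter_facts(candidates, max_facts=5):
    """Keep at most a handful of durable, non-sensitive facts; zero is fine."""
    kept = []
    for c in candidates:  # c = {"fact": str, "confidence": float}
        text = c["fact"].lower()
        if any(marker in text for marker in SENSITIVE_MARKERS):
            continue  # flagged sensitive at extraction time: never stored
        kept.append(c)
    kept.sort(key=lambda c: c["confidence"], reverse=True)
    return kept[:max_facts]  # an empty result is the expected output for mundane calls
```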

Try it end-to-end

The fastest way to confirm the feature works in your account:

  1. Create an agent whose prompt contains {{memory}} — e.g. a scheduling assistant with the line “If you have prior knowledge about the caller, use it to personalise — don’t ask them to repeat things.”
  2. Turn memory_enabled on.
  3. Place a test call and say something durable: “I can only do mornings before 9am; I’m vegetarian.” End the call.
  4. Open the conversation detail page — “Memories written this call” should list the facts within ~10s of the call completing.
  5. Place a second call as the same user and ask something open-ended: “What’s a good time next week?” The agent should surface your morning constraint without being re-asked.

Troubleshooting

If memory doesn’t seem to ground the call:

| Symptom | Check |
| --- | --- |
| Second call asks the same question again | Is {{memory}} literally in the agent prompt? Missing placeholder = facts are stored but never injected. |
| “Memories written this call” section never appears | Was the caller authenticated? Anonymous widget sessions (no user_identity passed) are never recorded. Check the conversation row’s caller_identity — empty means anonymous. |
| Facts appear in the admin list but aren’t surfacing in-call | The confidence floor at retrieval is 0.5. Facts with confidence < 0.5 render in the admin tab but don’t enter the {{memory}} block. |
| Second call ignores facts from the first | Same caller_identity on both? Memory pivots on (agent_id, caller_identity). A widget embed that passes a different userIdentity counts as a different caller. |
| Extractor writes zero facts | Normal. The extractor is tuned to emit zero rather than invent filler — mundane calls often produce no memories. |

Interaction with Knowledge Base

Both search_knowledge (per-agent KB) and {{memory}} (per-caller) can be active on the same agent. They address different problems and compose cleanly:

  • Knowledge base grounds domain facts that apply to every caller (product manuals, policies). Retrieved via the search_knowledge tool the LLM calls when it needs context.
  • Memory grounds durable facts about the current caller. Retrieved server-side at session start and injected into the system prompt — no LLM tool call.

The agent’s prompt handles the memory block; the LLM decides when to reach for the KB. Nothing stops you from using both — many support agents will.

Tips

  • Audit what’s written. Every conversation detail page shows “Memories written this call” so you can see the facts before they ever ground another call.
  • Use short prompts around {{memory}}. The injected block already speaks for itself — don’t wrap it in “Here are things I remember about you:” framing. The LLM handles it well.
  • Retention is non-negotiable for privacy. Default 90 days balances useful recall with the fact that most callers won’t remember what they told a bot 6 months ago. Enterprise tenants needing longer windows should raise it explicitly.
  • Deletion is immediate. Memory deletes are soft by default (unreachable from search instantly) and hard-deleted by the retention job; no need for a separate purge request.