Speechify Voice Agents: Memory (Recall Across Calls)

By default, an agent starts every call from scratch, even with a repeat caller. Memory changes that: after each call, an extractor saves short, durable facts about the caller, and on the next call the most relevant ones are injected into the system prompt. Support bots stop re-asking for the caller’s plan, concierge agents recall preferred times, and retention flows resume where they left off.

To turn it on, see Add memory.

How it works

Extract

When a call ends, the server sends the transcript to a small LLM, which returns 0–5 short third-person facts about the caller, each with a confidence score.
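A minimal sketch of the server-side parsing step, assuming the extractor replies with JSON of the form `{"facts": [{"text": ..., "confidence": ...}]}` and that confidence lies between 0 and 1. The reply format, the `Fact` type, and `parse_extractor_output` are hypothetical; only the 0–5 cap and the per-fact confidence score come from the description above.

```python
import json
from dataclasses import dataclass

MAX_FACTS = 5  # the extractor returns between 0 and 5 facts per call


@dataclass(frozen=True)
class Fact:
    text: str          # short third-person statement, e.g. "Caller is on the Pro plan."
    confidence: float  # assumed to lie in [0, 1]


def parse_extractor_output(raw: str) -> list[Fact]:
    """Parse a hypothetical JSON reply of the form {"facts": [{"text": ..., "confidence": ...}]}."""
    payload = json.loads(raw)
    facts = []
    for item in payload.get("facts", []):
        text = str(item.get("text", "")).strip()
        confidence = float(item.get("confidence", 0.0))
        # Skip blank facts and out-of-range scores instead of storing filler.
        if text and 0.0 <= confidence <= 1.0:
            facts.append(Fact(text, confidence))
    return facts[:MAX_FACTS]  # enforce the 0-5 cap even if the model over-produces
```

An empty `facts` list is a normal result, since many calls produce nothing worth keeping.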

Embed and store

Each fact is embedded (with OpenAI text-embedding-3-large, the same model knowledge bases use) and stored in Postgres with pgvector, scoped to (agent_id, caller_identity).
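The scoping can be pictured with an in-memory stand-in for the pgvector table. This is a sketch, not the real schema: `StoredFact` and `MemoryStore` are hypothetical names, and the embedding is passed in rather than computed, so no OpenAI call is made. What it illustrates is that every read and write is keyed on the `(agent_id, caller_identity)` pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoredFact:
    text: str
    confidence: float
    embedding: list[float]  # text-embedding-3-large vectors have 3072 dimensions by default
    created_at: datetime


class MemoryStore:
    """In-memory stand-in for the pgvector table, partitioned by (agent_id, caller_identity)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], list[StoredFact]] = defaultdict(list)

    def add(self, agent_id: str, caller_identity: str, fact: StoredFact) -> None:
        self._rows[(agent_id, caller_identity)].append(fact)

    def facts_for(self, agent_id: str, caller_identity: str) -> list[StoredFact]:
        # Lookups never cross scopes: a different agent or caller gets nothing back.
        return list(self._rows.get((agent_id, caller_identity), []))
```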

Retrieve

When the same caller next calls the same agent, the server ranks that caller's stored facts by recency-weighted similarity and renders the top 10 as plain text.
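The exact recency weighting is not documented. The sketch below assumes one plausible form, cosine similarity multiplied by an exponential decay with a hypothetical 30-day half-life, and assumes the query vector is some embedding of the incoming call's context. `Candidate`, `score`, and `render_top_facts` are illustrative names, and rendering each fact as a `- ` line is also an assumption; only the top-10 cut and the plain-text output come from the description above.

```python
import math
from dataclasses import dataclass
from datetime import datetime

TOP_K = 10             # the server renders the top 10 facts
HALF_LIFE_DAYS = 30.0  # hypothetical: the real recency weighting is not documented


@dataclass
class Candidate:
    text: str
    embedding: list[float]
    created_at: datetime


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query: list[float], fact: Candidate, now: datetime) -> float:
    age_days = (now - fact.created_at).total_seconds() / 86400
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # weight halves every HALF_LIFE_DAYS
    return cosine(query, fact.embedding) * decay


def render_top_facts(query: list[float], facts: list[Candidate], now: datetime) -> str:
    ranked = sorted(facts, key=lambda f: score(query, f, now), reverse=True)
    return "\n".join(f"- {f.text}" for f in ranked[:TOP_K])  # one plain-text line per fact
```

Multiplying by a decay factor lets a strongly relevant older fact still outrank a weakly relevant new one, while two equally relevant facts come out newest first.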

Inject

That block replaces the {{memory}} placeholder in the system prompt before dispatch. If the prompt has no placeholder, nothing is injected, but facts are still extracted and stored, so adding {{memory}} later grounds future calls in facts already collected.
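A minimal sketch of the injection step, assuming plain string substitution. `inject_memory` is a hypothetical helper, and substituting an empty string when a caller has no facts is an assumption about how the server behaves.

```python
MEMORY_PLACEHOLDER = "{{memory}}"


def inject_memory(system_prompt: str, memory_block: str) -> str:
    """Replace the {{memory}} placeholder; prompts without it are returned unchanged."""
    if MEMORY_PLACEHOLDER not in system_prompt:
        return system_prompt  # no placeholder, no injection (extraction still runs separately)
    return system_prompt.replace(MEMORY_PLACEHOLDER, memory_block)
```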

Scope

Memory is keyed on the agent × caller pair. Anonymous widget sessions (callers without a stable user_identity) are never recorded or retrieved.
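A sketch of the scope check, assuming a missing or blank `user_identity` marks a session as anonymous. `memory_scope` is a hypothetical helper; it returns the key to read and write under, or `None` when memory should be skipped entirely.

```python
from typing import Optional


def memory_scope(agent_id: str, user_identity: Optional[str]) -> Optional[tuple[str, str]]:
    """Return the (agent, caller) key memory is read and written under, or None if anonymous."""
    if user_identity is None or not user_identity.strip():
        return None  # anonymous widget session: nothing is recorded or retrieved
    return (agent_id, user_identity)
```

Code calling this helper would skip both extraction and retrieval whenever it returns `None`.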

What it keeps and drops

Keeps: preferences, identifiers, commitments, recurring needs, and constraints; in short, anything a future call would benefit from.

Drops: volatile details (mood, weather), one-off facts, and anything sensitive (health, card numbers, passwords). The extractor is tuned to emit zero facts rather than invent filler, so many calls produce none.

Memory vs. knowledge base

Both can run on the same agent; they solve different problems.

	Knowledge base	Memory
Grounds	Domain facts for every caller (manuals, policies)	Durable facts about the current caller
Retrieval	`search_knowledge` tool, called by the LLM when it needs context	Server-side at session start, injected into the prompt

Memory arrives automatically through the system prompt; the knowledge base is consulted only when the LLM decides to call `search_knowledge`. Many support agents use both.

Privacy and retention

memory_retention_days sets how long facts stay visible to retrieval and how old they must be before the nightly cleanup job removes them (0 means no cap; the default is 90 days). Deleting a fact soft-deletes it immediately, making it unreachable from retrieval; the retention job later hard-deletes it.
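The retention rules can be sketched as two checks on each row. `MemoryRow`, `is_retrievable`, and `nightly_cleanup` are hypothetical names, and purging soft-deleted rows on the job's next run (rather than after they age out) is an assumption about how the retention job treats them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_RETENTION_DAYS = 90  # documented default for memory_retention_days


@dataclass
class MemoryRow:
    text: str
    created_at: datetime
    deleted_at: Optional[datetime] = None  # set by a soft delete


def is_retrievable(row: MemoryRow, now: datetime,
                   retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    if row.deleted_at is not None:
        return False  # soft-deleted rows are unreachable right away
    if retention_days == 0:
        return True  # 0 means no age cap
    return now - row.created_at <= timedelta(days=retention_days)


def nightly_cleanup(rows: list[MemoryRow], now: datetime,
                    retention_days: int = DEFAULT_RETENTION_DAYS) -> list[MemoryRow]:
    """Simulate the hard delete: keep only rows that retrieval could still see."""
    return [row for row in rows if is_retrievable(row, now, retention_days)]
```

Sharing one predicate between retrieval and cleanup keeps the two in agreement: the job never removes a row that retrieval could still return.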