Mem0 vs Letta vs Zep: Which AI Agent Memory Layer Survives Production in 2026
After 3 months of building memory into BizChat and ServiceBot, here's the honest breakdown of Mem0, Letta, and Zep — pricing, benchmarks, and which one I'd pick for each use case.
Three months ago I started rebuilding the memory layer for BizChat Revenue Assistant — one of our six AI products at Warung Digital Teknologi. The first version stuffed every prior conversation into the system prompt, which worked for the first 4–5 turns and then exploded into a $0.34-per-turn cost on Claude Haiku 4.5 because the prompt grew to 38,000 tokens by the time a sales rep had finished a single account review. Memory was no longer a "nice to have." It was the difference between a product we could sell at $49/seat and one that bled cash.
I evaluated Mem0, Letta (formerly MemGPT), and Zep in production over the next six weeks across two products: BizChat (multi-tenant B2B chat for sales teams) and ServiceBot AI Helpdesk (long-running ticket agents that need to remember customer history across sessions). This article is the breakdown I wish I had on day one — pricing, real benchmarks, latency tradeoffs, and the question nobody answers cleanly: which one do you actually pick for your stack?
The 60-second answer (read this first)
If you only have a minute:
- Mem0: pick this for personalization at scale. Three-tier scopes (user/session/agent), managed cloud handles infra, fastest path from npm install to "my agent remembers preferences." Free tier covers 10K memories.
- Zep: pick this if temporal reasoning matters. The Graphiti engine stores fact validity windows, not just timestamped snapshots. On the LongMemEval benchmark with GPT-4o, Zep scores 63.8% vs. Mem0's 49.0%, a 15-point gap that you feel the moment a user says "I used to live in London but I moved to Tokyo."
- Letta: pick this for long-running autonomous agents. Self-editing memory blocks, tiered context (in-context vs. archival), agents directly call memory tools instead of having extraction done for them. The right answer for agents that run for days, not minutes.
The rest of this guide explains how I arrived at those picks, what each one cost me in real dollars, and three production gotchas I hit that the official docs don't mention.
Why agent memory is suddenly a category
Two things changed in 2025–2026 that turned "memory" from a research paper into a product line:
- Agents got long-running. A ticket agent in ServiceBot stays "alive" for the lifetime of a customer relationship — weeks or months. Stuffing the entire history into context every turn isn't viable past day 3.
- Token costs stopped falling fast enough. Even on Haiku 4.5 at $0.80 per 1M input tokens, a 38K-token prompt sent 100 times per day costs about $3 per user per day on input alone, or roughly $90 over a 30-day month. That is more than a $49 seat brings in.
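The arithmetic behind that second point is easy to rerun with your own numbers. A minimal sketch, using the prompt size, call volume, and price quoted above; the 30-day month and the function name are assumptions made for illustration, and output tokens and prompt caching are ignored:

```python
def monthly_input_cost(prompt_tokens: int, turns_per_day: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Input-token spend per user per month, ignoring output tokens and caching."""
    tokens_per_month = prompt_tokens * turns_per_day * days
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# Figures quoted in the article; the 30-day month is an assumption.
stuffed = monthly_input_cost(38_000, 100, 0.80)
# Same traffic with a 3,000-token prompt, for comparison.
trimmed = monthly_input_cost(3_000, 100, 0.80)

print(f"full-history prompt:  ${stuffed:.2f}/user/month")
print(f"memory-backed prompt: ${trimmed:.2f}/user/month")
```

Because cost scales linearly with prompt size, capping the prompt is the only lever that moves this number without cutting usage.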
The category solves this by extracting facts from raw chat, storing them in a queryable structure (vector + graph + key-value), and injecting only the relevant subset back into the prompt. Done well, you get conversations that "remember" everything without the prompt growing past ~3,000 tokens. Done badly, you get an extra service to operate, an extra failure mode, and contradictory facts being injected into your prompts.
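The "inject only the relevant subset" step can be sketched in a few lines. This is a toy under stated assumptions: word overlap stands in for vector similarity, a word count stands in for a real tokenizer, and every function name here is hypothetical rather than taken from any of the three products.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())


def select_facts(query: str, facts: list[str], budget_tokens: int) -> list[str]:
    """Pick the most query-relevant facts that fit inside the token budget."""
    q = tokenize(query)
    scored = sorted(facts, key=lambda f: len(q & tokenize(f)), reverse=True)
    chosen, used = [], 0
    for fact in scored:
        if not q & tokenize(fact):
            break  # remaining facts share no words with the query
        cost = estimate_tokens(fact)
        if used + cost <= budget_tokens:
            chosen.append(fact)
            used += cost
    return chosen


def build_prompt(query: str, facts: list[str], budget_tokens: int = 3000) -> str:
    memory = "\n".join(f"- {f}" for f in select_facts(query, facts, budget_tokens))
    return f"Known facts about this user:\n{memory}\n\nUser: {query}"


facts = [
    "user prefers async standups",
    "user's company is Acme Corp",
    "user's favourite editor is Vim",
]
prompt = build_prompt("Which company does the user work for?", facts)
print(prompt)
```

The budget is what keeps the prompt flat as history grows: the fact list can reach thousands of entries while the injected block stays the same size.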
Mem0: the managed-first path
Mem0 ships as both an open-source library (pip install mem0ai) and a managed cloud. Architecturally it's a hybrid store — vector embeddings for semantic recall, a graph layer for entity relationships, and key-value lookups for direct retrieval. The pattern Mem0 pushes is "Fact Extraction": instead of saving raw chat logs, an LLM call extracts discrete facts ("user prefers async standups," "user's company is Acme Corp") and stores them as separate entries that can be updated or contradicted later.
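To make "updated or contradicted later" concrete, here is a toy fact store. It assumes the LLM extraction step has already turned chat into (subject, attribute, value) triples; that schema, the class name, and the example values are illustrative assumptions, not Mem0's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    """Keeps one current value per (subject, attribute), plus a change log."""
    current: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def upsert(self, subject: str, attribute: str, value: str) -> str:
        key = (subject, attribute)
        old = self.current.get(key)
        if old == value:
            return "unchanged"
        self.current[key] = value
        self.history.append((key, old, value))
        return "added" if old is None else "updated"


store = FactStore()
store.upsert("user", "standup_style", "async")
store.upsert("user", "company", "Acme Corp")
# A later conversation contradicts the earlier fact, so it replaces it.
status = store.upsert("user", "company", "Globex")
print(status, store.current[("user", "company")])
```

Storing facts as separate keyed entries is what makes the replacement possible; a raw chat log would simply contain both statements.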
Pricing (verified on mem0.ai/pricing, April 2026)
- Free: 10,000 memories, 1,000 retrieval calls/month
- Starter: $19/month, 50,000 memories
- Pro: $249/month — and this is where the graph memory feature unlocks. The free and $19 tiers only get vector recall.
- Self-hosted (Apache 2.0): free, but you pay the infra (Postgres or Qdrant + Neo4j if you want graph) and the LLM calls for fact extraction.
The $19→$249 jump is the trap. If your product needs entity-level reasoning ("which clients did this rep follow up with last week?"), you're either paying $249/month from day one or self-hosting the whole stack. There's no middle ground.
What I actually shipped
I integrated Mem0 self-hosted into BizChat in week 2. The setup on our existing Hostinger VPS (4 vCPU, 8GB RAM, running Laravel 11 + a small Python sidecar) took roughly half a day. The fact-extraction LLM call adds 600–900ms of latency per user message because it runs synchronously before the main agent response — that surprised me, and it's the first thing I'd warn anyone about. Mem0's docs imply async extraction is "supported," but in the Python SDK 1.x line you have to wire it yourself with a background queue. We used Redis + RQ.
Retrieval latency on a corpus of ~12,000 memories per tenant landed at 80–140ms p95 with Qdrant as the vector store. That's fine. The cost killer was fact extraction itself: ~$0.0008 per turn on Haiku 4.5, which sounds tiny until you multiply it out: 100K turns/month would be $80. Across our 50-tenant deployment we were looking at $40/month (about 50K turns at that rate) just for memory writes.
When Mem0 wins
- You want personalization (user preferences, recurring entities) and don't need precise time reasoning.
- You're shipping fast and want a managed path that doesn't require operating Neo4j.
- You're comfortable on the $249 Pro tier or willing to self-host the whole stack including the graph DB.
Zep: the temporal-graph specialist
Zep is a memory server, not a library. It runs as a separate service, processes new messages asynchronously in the background, and exposes a clean REST/SDK API for retrieval. Under the hood it's powered by Graphiti, an open-source temporal knowledge graph engine that stores facts and their validity windows. That's the differentiator nobody else matches in 2026.
Concretely: when a user says "I used to live in London but I moved to Tokyo," Zep marks the London fact as valid_until = T and the Tokyo fact as valid_from = T. A query for "where does the user live?" returns Tokyo. A query for "where did the user live in March?" can still return London. Mem0's vector search will return both as equally "current," which in our testing on BizChat caused the agent to occasionally email a customer at their old company.
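The validity-window idea fits in a small data structure. The sketch below is a simplified illustration, not Graphiti's schema or API: the class names, the "close the open window on every new assertion" rule, and the dates are all assumptions chosen to mirror the London/Tokyo example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    attribute: str
    value: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "still true"


class TemporalStore:
    def __init__(self):
        self.facts: list[TemporalFact] = []

    def assert_fact(self, subject: str, attribute: str, value: str, when: date) -> None:
        # Close the window on whatever was previously current.
        for f in self.facts:
            if (f.subject, f.attribute) == (subject, attribute) and f.valid_until is None:
                f.valid_until = when
        self.facts.append(TemporalFact(subject, attribute, value, when))

    def value_at(self, subject: str, attribute: str, when: date) -> Optional[str]:
        for f in self.facts:
            if (f.subject, f.attribute) != (subject, attribute):
                continue
            if f.valid_from <= when and (f.valid_until is None or when < f.valid_until):
                return f.value
        return None


store = TemporalStore()
store.assert_fact("user", "city", "London", date(2024, 1, 10))
store.assert_fact("user", "city", "Tokyo", date(2025, 6, 1))

print(store.value_at("user", "city", date(2025, 3, 15)))  # inside the London window
print(store.value_at("user", "city", date(2025, 12, 1)))  # after the move
```

The old fact is never deleted, only closed, which is why both "where now?" and "where in March?" have answers.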
Pricing (verified on getzep.com/pricing, April 2026)
- Free: 1,000 credits/month — barely enough to test the API.
- Flex: $25/month — full Graphiti engine, temporal graph, entity resolution. Credits are consumed per Episode (1 credit per 350 bytes).
- Enterprise: BYOK / BYOM / BYOC available, SOC 2 Type 2, HIPAA.
- Self-hosted (Graphiti, MIT license): free — but you operate Neo4j (or FalkorDB / Kuzu) yourself, including schema migrations and graph DB tuning.
The $25 tier is the most honest pricing in the category. You get the actual production engine — not a stripped-down preview — for less than dinner for two.
What I actually shipped
I moved ServiceBot to Zep Cloud in week 4. The migration took two days, mostly because I had to re-think how we were sending data — Zep wants Episodes (semantic chunks of conversation or activity), not raw messages. Once that clicked, it was the cleanest integration of the three.
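A rough sketch of what "Episodes, not raw messages" meant for budgeting: group related messages into one body, then estimate credits from its size. The grouping rule, the round-up behaviour, and the UTF-8 byte measurement are assumptions; only the 350-bytes-per-credit figure comes from the pricing above. No Zep SDK calls are shown.

```python
import math

BYTES_PER_CREDIT = 350  # figure quoted in the pricing section


def to_episode(messages: list[dict]) -> str:
    """Join a run of related messages into one episode body."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def estimate_credits(episode: str) -> int:
    # Assumption: partial blocks round up, and size is measured in UTF-8 bytes.
    size = len(episode.encode("utf-8"))
    return max(1, math.ceil(size / BYTES_PER_CREDIT))


ticket = [
    {"role": "customer", "content": "My invoice for March is wrong."},
    {"role": "agent", "content": "I have reissued it with the corrected total."},
]
episode = to_episode(ticket)
credits = estimate_credits(episode)
print(len(episode.encode("utf-8")), "bytes ->", credits, "credit(s)")
```

If rounding works this way, sending each message as its own Episode would cost one credit apiece, so batching a short exchange into a single Episode is cheaper.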
Latency profile is different from Mem0: writes are async (you fire and forget, Zep summarizes and graphs in the background), so user-facing turn latency drops by 500–800ms compared to synchronous Mem0 fact extraction. Reads land at 90–180ms p95 across our ~8,000-Episode test set per tenant. The temporal queries are the magic — I can ask "what did this customer complain about in the last 30 days that's still unresolved?" and the engine actually understands the time bound.
Cost on Flex for ServiceBot's 30-tenant pilot: ~$95/month total. Compared to self-hosting Mem0 + Qdrant + Neo4j, that's a no-brainer until you cross ~500 tenants.
When Zep wins
- Your agent needs to reason about when something was true, not just what.
- You'd rather pay $25/month than operate a graph database.
- You're in a regulated industry (SOC 2 Type 2 + HIPAA out of the box).
Letta: the self-editing autonomous agent
Letta (the project formerly known as MemGPT) is the philosophical outlier here. Mem0 and Zep both treat memory as something the system manages on the agent's behalf — extract, store, retrieve, inject. Letta hands the keys to the agent itself. The agent runs with explicit memory tools (core_memory_append, archival_memory_insert, archival_memory_search), and it decides — turn by turn — what to promote into context, what to push to archive, and what to retrieve.
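To show the shape of agent-managed memory, here is a toy with a size-capped core and an unbounded archive. The method names are borrowed from the tools listed above, but the class, its signatures, the character-based cap, and the substring search are assumptions for illustration and are not Letta's implementation. The scripted calls at the bottom stand in for decisions a real agent would make itself.

```python
class ToyAgentMemory:
    """Size-capped in-context core memory plus unbounded archival storage."""

    def __init__(self, core_limit_chars: int = 60):
        self.core_limit_chars = core_limit_chars
        self.core: list[str] = []
        self.archive: list[str] = []

    def core_memory_append(self, text: str) -> bool:
        used = sum(len(t) for t in self.core)
        if used + len(text) > self.core_limit_chars:
            return False  # caller must free space first
        self.core.append(text)
        return True

    def archival_memory_insert(self, text: str) -> None:
        self.archive.append(text)

    def archival_memory_search(self, query: str) -> list[str]:
        # Substring match stands in for embedding search.
        return [t for t in self.archive if query.lower() in t.lower()]


memory = ToyAgentMemory()
memory.core_memory_append("Source A summarised: pricing tiers")
new_note = "Source B summarised: latency figures and caveats"
if not memory.core_memory_append(new_note):
    # The "agent" chooses to demote the oldest note, then retries.
    memory.archival_memory_insert(memory.core.pop(0))
    memory.core_memory_append(new_note)

print(memory.core)
print(memory.archival_memory_search("pricing"))
```

The difference from the system-managed model is who runs the `if`: here it is the caller, which in Letta's design is the agent.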
This sounds like a research toy. It's not. For the kind of long-running autonomous agents Letta is built for — agents that run for days, take actions, recover from their own mistakes — the self-editing model is the only one that doesn't degrade.
Pricing (verified on letta.com, April 2026)
- Self-hosted (Apache 2.0): free, all features included, including the ADE (Agent Development Environment).
- Letta Cloud: $20–$200/month tiers depending on volume.
- Infra footprint: at production scale, Letta needs GPU-backed embedders for archival memory search. Plan for $50–$200/month of cloud infra on top.
What I tried (and where it bit me)
I prototyped a Letta agent inside ContentForge AI Studio for a use case where the agent runs unattended for ~6 hours producing a long-form report. Letta's tiered memory model was the right call: the agent rewrote its core memory block five times during the run, summarizing what it had learned and discarding stale context. No other system I tested could have managed that without me writing the orchestration manually.
The gotcha: self-editing is non-deterministic. Twice in ten test runs the agent decided to overwrite a critical instruction in its own core memory because it (correctly, by the rules of its tools) decided it was no longer relevant. I now wrap critical instructions in a system block the agent can't edit. Letta supports this; it isn't on by default.
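The guard amounts to a write check on the block. A minimal sketch of the idea, with a hypothetical `MemoryBlock` class and `read_only` flag; check Letta's own documentation for the actual setting, since this does not use its API:

```python
class MemoryBlock:
    """A labelled chunk of agent memory that may be locked against edits."""

    def __init__(self, label: str, value: str, read_only: bool = False):
        self.label = label
        self.value = value
        self.read_only = read_only

    def rewrite(self, new_value: str) -> None:
        if self.read_only:
            raise PermissionError(f"block '{self.label}' is read-only")
        self.value = new_value


rules = MemoryBlock("rules", "Never publish without citing sources.", read_only=True)
notes = MemoryBlock("notes", "Outline drafted.")

notes.rewrite("Outline drafted; section 2 in progress.")  # allowed

try:
    rules.rewrite("")  # the agent tries to discard its own instruction
    blocked = False
except PermissionError:
    blocked = True

print("edit blocked:", blocked)
```

Raising an error rather than silently ignoring the write gives the agent a signal it can react to, and gives you a log line to count.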
When Letta wins
- Long-running autonomous agents (hours to days).
- You want the agent to learn what to remember, not have you decide upfront.
- You have GPU budget and the willingness to operate the runtime.
Comparison table
| Dimension | Mem0 | Zep | Letta |
|---|---|---|---|
| Best for | Personalization at scale | Temporal reasoning | Long-running autonomous agents |
| Memory model | Fact extraction (system-managed) | Temporal knowledge graph (system-managed) | Self-editing memory blocks (agent-managed) |
| LongMemEval (GPT-4o) | 49.0% | 63.8% | Not directly published |
| Cloud entry price | $19/mo (no graph) → $249/mo (with graph) | $25/mo (full features) | $20/mo |
| Self-hosted | Apache 2.0, free | Graphiti MIT, free + Neo4j | Apache 2.0, free + GPU recommended |
| Compliance (cloud) | SOC 2, HIPAA | SOC 2 Type 2, HIPAA | SOC 2 (varies by tier) |
| Write latency impact | +600–900ms sync (or async with custom queue) | ~0ms (async by design) | Per agent decision |
| Read latency p95 | 80–140ms (Qdrant backend) | 90–180ms (Graphiti) | 120–250ms (archival search) |
| License | Apache 2.0 | MIT (Graphiti core) | Apache 2.0 |
Latency numbers are from my own measurements on Hostinger VPS (4 vCPU / 8GB) and Zep Cloud Flex tier with ~8K–12K records per tenant. Your numbers will differ — these are directional, not benchmarks.
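If you want comparable numbers from your own stack, a p95 measurement is a few lines of standard-library Python. This is a sketch, not the harness behind the figures above: `fake_retrieval` is a placeholder for whatever search call you are timing, and the run count is arbitrary.

```python
import statistics
import time


def p95_latency_ms(call, runs: int = 200) -> float:
    """Time `call` repeatedly and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]


def fake_retrieval() -> None:
    # Placeholder for a real memory search call.
    sum(range(1000))


p95 = p95_latency_ms(fake_retrieval)
print(f"p95: {p95:.3f} ms")
```

Measure from the machine that serves user traffic, since network hops to a managed service are part of the number you care about.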
Decision matrix: which one for your stack?
I built this matrix while deciding what to standardize across our six products. It's the one I'd hand a colleague who had 30 minutes to pick.
- You're building a personalization layer for a SaaS product → Mem0 self-hosted if you have ops capacity, Mem0 managed if you don't and the $249 Pro tier is in budget.
- You need the agent to know "what is true now vs. what was true last month" → Zep. Don't even compare. The Graphiti engine wins this category alone.
- You're in healthcare, finance, or any regulated industry that needs SOC 2 Type 2 day one → Zep Cloud or Mem0 Pro. Letta Cloud's compliance varies by tier — check before committing.
- You're building an autonomous agent that runs unattended for hours or days → Letta. Nothing else handles self-editing memory blocks at this maturity.
- You're prototyping and don't know yet → Start with Zep's $25 Flex tier. It's the lowest-friction way to find out what your memory needs actually are, and migrating to a different stack later is painful but tractable.
Three production gotchas I hit
1. Synchronous fact extraction adds real latency
Mem0's default Python SDK extracts facts on the request path. On Haiku 4.5 that's an extra 600–900ms. If you're targeting a sub-2-second response time (we are), you have to push extraction to a background queue. We use Redis + RQ; the docs don't show this pattern but it's straightforward.
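The pattern looks roughly like this. To keep the example runnable without any services, the standard library's `queue.Queue` and a worker thread stand in for Redis + RQ, and `extract_facts` is a placeholder for the slow LLM extraction call; none of these names come from Mem0's SDK.

```python
import queue
import threading
import time

extraction_queue: queue.Queue = queue.Queue()
stored_facts: list[str] = []


def extract_facts(message: str) -> list[str]:
    # Placeholder for the slow LLM extraction call.
    time.sleep(0.05)
    return [f"fact from: {message}"]


def worker() -> None:
    while True:
        message = extraction_queue.get()
        stored_facts.extend(extract_facts(message))
        extraction_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_turn(message: str) -> str:
    extraction_queue.put(message)  # returns immediately
    return f"agent reply to: {message}"


start = time.perf_counter()
reply = handle_turn("We moved our renewal date to July")
turn_seconds = time.perf_counter() - start

extraction_queue.join()  # only for the demo: wait for the worker to finish
print(reply, f"({turn_seconds * 1000:.2f} ms on the request path)")
print(stored_facts)
```

The trade-off is that facts from the current turn are not retrievable until the worker catches up, so the very next message may be answered without them.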
2. Vector recall lies about "currentness"
Pure-vector systems (Mem0 on default config) will happily return facts that have been contradicted by later facts, because semantic similarity doesn't know about time. The fix is either Zep's temporal model out of the box, or a hand-rolled "supersedes" relationship in your fact store. I learned this the hard way when BizChat emailed two customers at their old companies.
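A hand-rolled "supersedes" link can be as small as one extra field plus a filter applied after retrieval. A minimal sketch, assuming the write path already knows which older fact a new one replaces; the `Fact` shape and the example data are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    id: int
    text: str
    supersedes: Optional[int] = None  # id of the fact this one replaces


def current_only(candidates: list[Fact], all_facts: list[Fact]) -> list[Fact]:
    """Drop any retrieved fact that a later fact has superseded."""
    superseded_ids = {f.supersedes for f in all_facts if f.supersedes is not None}
    return [f for f in candidates if f.id not in superseded_ids]


facts = [
    Fact(1, "Contact works at Acme Corp"),
    Fact(2, "Contact prefers email over phone"),
    Fact(3, "Contact works at Globex", supersedes=1),
]
# Pretend vector search returned both employer facts as similar hits.
hits = [facts[0], facts[2]]
filtered = current_only(hits, facts)
print([f.text for f in filtered])
```

This only answers "what is true now"; unlike validity windows, it cannot answer questions about a past date.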
3. Self-editing agents will edit things you didn't expect
Letta's autonomous agents will rewrite their own instructions if you don't lock them. Use system blocks for hard constraints, core memory for adaptive context. This isn't a bug — it's the feature working as designed — but it's the kind of thing you only learn by running an 8-hour agent twice and watching it forget its own rules.
What I standardized on (and why)
Across our six AI products at Warung Digital Teknologi:
- BizChat Revenue Assistant → Zep Cloud Flex. Sales reps need temporal reasoning ("did this prospect open last quarter's proposal?") and the $25/month is invisible against revenue per seat.
- ServiceBot AI Helpdesk → Zep Cloud Flex. Same reasoning — ticket histories are inherently temporal.
- SmartExam AI Generator → Mem0 self-hosted. Personalization (student preferences, weak topics) without temporal needs. Self-hosted because we already had Postgres + Qdrant in this product's stack.
- DocSumm AI Summarizer → No external memory. Each summarization is stateless; nothing to remember across sessions.
- ContentForge AI Studio → Letta self-hosted for the long-running report-generation agent; Mem0 for short-lived editor sessions.
- DiabeCheck Food Scanner → No external memory. Per-scan inference, no cross-session state.
The pattern: two of the six products don't need a third-party memory layer at all, two benefit massively from temporal graphs (Zep), one fits the personalization model (Mem0), and one needs the self-editing autonomy (Letta). I would not standardize on a single tool for all six.
FAQ
Can I switch from Mem0 to Zep later?
Yes, but it's painful. Both expose retrieval APIs but the data models are different — Mem0 stores discrete facts, Zep stores Episodes that get graphed. You'll need to re-ingest history. Plan for 1–2 weeks of migration if you have meaningful data.
Do I need a vector database for Mem0 or Zep?
Mem0 self-hosted needs a vector store (Qdrant, Pinecone, pgvector). Zep self-hosted needs Neo4j (or FalkorDB / Kuzu). Both managed offerings handle this for you — that's most of what you're paying for.
Will memory replace RAG?
No. RAG retrieves from documents you already have. Memory builds up state from interactions. Most production systems use both — RAG for the knowledge base, memory for the conversation history. They're complementary, not competing.
How does this compare to LangChain's built-in memory?
LangChain's memory primitives (ConversationBufferMemory, ConversationSummaryMemory) are fine for short sessions and prototypes. None of them scale to multi-tenant production with 10K+ memories per tenant. If you've outgrown LangChain memory, Mem0 / Zep / Letta are the next step up.
What about open-source alternatives like Cognee, Supermemory, LangMem?
Cognee and LangMem are credible — I evaluated both briefly. Cognee leans graph-first like Zep but is earlier in maturity. LangMem is a tighter integration if you're already deep in LangGraph. Neither displaced Mem0 or Zep for me on the production checklist (compliance, latency, ops burden), but both are worth a look if you're not satisfied with the big three.
Is Letta the same as MemGPT?
Yes. MemGPT was the original research project; Letta is the production company and platform built around it. The repo is now at github.com/letta-ai/letta.
The honest closing take
Six months ago I would have told you "just use Mem0, it's the LangChain of agent memory." After three months in production, my opinion is more split. Mem0 is the easiest on-ramp, Zep is the strongest engine, Letta is the right answer for a narrow but growing class of agent. Pick by use case, not by hype.
If you're starting from zero and don't know what you need yet, my one-line recommendation is: spin up Zep Flex for $25, build for two weeks, then decide. The temporal graph will either become load-bearing for your product (in which case you're done) or you'll discover you don't need it (in which case you can swap to Mem0 with a clear conscience). Either way, two weeks of $25 is cheaper than two months of architectural regret.