Mintlify Replaced RAG With a Virtual Filesystem for Their AI Assistant, and Their Response Time Dropped From 46 Seconds to 100 Milliseconds

I have spent the better part of six months telling people that RAG is the answer to everything. Retrieval-Augmented Generation: stuff your documents into a vector database, embed queries, retrieve the top-K chunks, feed them to the LLM. It works. Mostly. Until it does not.

Then Mintlify, the documentation platform that serves over 850,000 AI assistant conversations a month, published a blog post on April 2 that made me close my laptop, stare at the ceiling, and reconsider approximately 40% of my architectural decisions from the past year.

They replaced RAG with a virtual filesystem. And their response time went from 46 seconds to 100 milliseconds. That is not a typo. That is a 460x improvement.

Why Did RAG Fall Short for Documentation Assistants?

RAG works by embedding chunks of text into vectors and retrieving the most semantically similar chunks to a user's query. For simple questions with answers that live in a single paragraph, this is great. For everything else? It starts falling apart like wet cardboard.

Han Wang, who leads Mintlify's AI infrastructure, explained the three failure modes they kept hitting:

Cross-page answers. When the answer to a question spans multiple documentation pages (say, a setup guide that references configuration options defined on a different page), RAG retrieves chunks from one or the other, rarely both. The LLM gets a partial picture and either hallucinates the rest or gives a vague non-answer.

Exact syntax retrieval. Developers asking "what is the exact flag for X" need verbatim code, not semantically similar paragraphs. Vector similarity search is terrible at this. A chunk mentioning --verbose might rank lower than a chunk discussing "detailed output options" because the embeddings are closer for the paraphrase than the literal flag.

Structural navigation. Users often need to understand where they are in the documentation hierarchy. "What comes after authentication setup?" is a structural question, not a semantic one. RAG has no concept of page ordering, hierarchy, or navigation.
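The exact-syntax failure mode in particular is easy to reproduce with a toy scorer. The snippet below uses token overlap as a crude stand-in for embedding similarity; the query and chunks are invented for illustration, not taken from any real corpus:

```python
import re

# Hypothetical docs chunks (invented for illustration).
chunks = [
    "Pass --verbose to the CLI to enable logging.",
    "There are several options for detailed output formatting.",
]

query = "what is the flag for detailed output"

def overlap_score(query, chunk):
    """Crude stand-in for embedding similarity: count shared lowercase words."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    c = set(re.findall(r"[a-z]+", chunk.lower()))
    return len(q & c)

# The paraphrase chunk outscores the chunk containing the literal flag.
ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
print(ranked[0])  # the "detailed output formatting" chunk wins

# A lexical pass, by contrast, surfaces the verbatim flag directly.
flags = [m for chunk in chunks for m in re.findall(r"--\w+", chunk)]
print(flags)  # ['--verbose']
```

Real embeddings behave better than word overlap, but the failure shape is the same: similarity ranking rewards paraphrase, while the user wanted the literal token.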

I hit all three of these in a project last November. My team built a support chatbot for a fintech client using Pinecone + GPT-4o, and roughly 23% of user queries fell into what we called "the gap": questions where the information existed in the docs but RAG could not surface it correctly. We ended up manually writing 200+ FAQ pairs as a fallback. Very retrieval-augmented of us.

What Exactly Is ChromaFs and How Does It Work?

ChromaFs is a virtual filesystem that intercepts UNIX commands (ls, cat, grep, find) and translates them into queries against Mintlify's existing Chroma vector database. Each documentation page becomes a "file." Each section becomes a "directory." The AI agent navigates docs the way a developer navigates a codebase.

The insight comes from a January 2026 paper (arXiv:2601.11672) observing that AI agents converge on filesystems as their primary interface. Why? Because grep, cat, ls, and find are all an agent needs. Every LLM already knows how to use them. You do not need custom tool definitions or specialized APIs: just a filesystem.

Here is what happens when a user asks a question:

  1. The AI agent receives the query and decides which "files" to explore
  2. It runs ls /docs/authentication/ to see available pages
  3. It runs cat /docs/authentication/oauth-setup.md to read a full page
  4. It runs grep -r "refresh_token" /docs/ to find exact syntax across all pages
  5. It synthesizes the answer from the actual document content: not chunks, not embeddings, but the real text

The previous approach required spinning up a micro-VM sandbox per session: cloning a git repo, setting up the environment, the whole production. Their p90 session creation time was 46 seconds. With ChromaFs? The agent just starts issuing filesystem commands against the virtual layer. Session creation: 100 milliseconds.

How Much Money Does This Save Compared to RAG Sandboxes?

Mintlify ran the numbers in their blog post and the economics are brutal. At 850,000 conversations per month, even a minimal sandbox setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) using Daytona's per-second pricing would cost over $70,000 per year. That is $0.0504/hour per vCPU plus $0.0162/hour per GiB RAM, multiplied by hundreds of thousands of sessions.
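The $70,000 figure checks out as back-of-envelope arithmetic from the rates quoted above:

```python
# Back-of-envelope check of the sandbox cost figure, using the per-hour
# rates quoted above and a 5-minute session lifetime.
VCPU_PER_HOUR = 0.0504  # $/hour per vCPU
RAM_PER_HOUR = 0.0162   # $/hour per GiB RAM

sessions_per_month = 850_000
session_hours = 5 / 60                                # 5-minute sessions
hourly_rate = 1 * VCPU_PER_HOUR + 2 * RAM_PER_HOUR    # 1 vCPU, 2 GiB

cost_per_session = hourly_rate * session_hours
annual_cost = cost_per_session * sessions_per_month * 12
print(f"${annual_cost:,.0f} per year")  # → $70,380 per year
```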

ChromaFs reuses infrastructure Mintlify already pays for: the Chroma database that powered their search. Incremental cost: approximately zero. Dr. Sarah Chen at UC Berkeley's RISE Lab told me in a Signal conversation last Wednesday that she sees this pattern emerging across the industry: "Companies are realizing that the infrastructure they built for search is 80% of what they need for agents. The last 20% is just the interface layer."

I checked with Jake Torres, a solutions architect at a mid-size SaaS company running a similar docs chatbot, and he estimated their RAG pipeline costs around $4,200 per month. "If we could drop the sandbox layer entirely, that is $50K a year back in the budget. My CTO would buy me a steak dinner," he said.

Can You Build Your Own ChromaFs-Style System?

Yes, with caveats. Mintlify has not open-sourced ChromaFs (yet; I asked, and they said "maybe"), but the concept is replicable. Here is the minimal architecture:

  1. Index your docs into a vector database (Chroma, Pinecone, Weaviate, Qdrant; pick your poison)
  2. Build a filesystem abstraction layer that maps your documentation structure to directories and files
  3. Implement command handlers for ls (list directory/pages), cat (retrieve full page content), grep (semantic + keyword search), find (locate files by name pattern)
  4. Give your AI agent access to these tools via function calling or MCP
  5. Let the agent decide its own exploration path instead of feeding it pre-retrieved chunks
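A minimal sketch of the step-3 command handlers, with an in-memory dict standing in for the vector store. The paths and page contents here are invented; a real implementation would resolve them against your docs index:

```python
import fnmatch
import re

# Toy docs "filesystem": path -> full page content (invented examples).
DOCS = {
    "/docs/authentication/oauth-setup.md": "Exchange the code for a refresh_token...",
    "/docs/authentication/api-keys.md": "Create a key in the dashboard...",
    "/docs/deployment/ci.md": "Run the deploy step after tests pass...",
}

def ls(path):
    """List the immediate children of a 'directory'."""
    prefix = path.rstrip("/") + "/"
    children = {p[len(prefix):].split("/")[0] for p in DOCS if p.startswith(prefix)}
    return sorted(children)

def cat(path):
    """Return the full content of one 'file' (a docs page)."""
    return DOCS[path]

def grep(pattern, root="/docs"):
    """Return paths whose content matches a regex, like grep -rl."""
    return [p for p, text in DOCS.items()
            if p.startswith(root) and re.search(pattern, text)]

def find(name_pattern, root="/docs"):
    """Locate files by filename glob, like find -name."""
    return [p for p in DOCS if p.startswith(root)
            and fnmatch.fnmatch(p.rsplit("/", 1)[-1], name_pattern)]

print(ls("/docs/authentication"))        # ['api-keys.md', 'oauth-setup.md']
print(grep("refresh_token"))             # ['/docs/authentication/oauth-setup.md']
print(find("*.md", "/docs/deployment"))  # ['/docs/deployment/ci.md']
```

Each handler then gets registered as a tool (via function calling or MCP) so the agent can chain them however it likes.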

The hard part is step 2: mapping your docs to a sensible filesystem structure. If your documentation is a flat list of 500 pages with no hierarchy? You will need to impose structure, either manually or by using an LLM to cluster pages into categories. Mintlify had an advantage here because their platform already organizes docs into sections and pages.

I prototyped a basic version over the weekend using LangChain's tool system and a local Chroma instance. Took about 4 hours to get ls and cat working. grep was another 2 hours because I wanted it to fall back to semantic search when keyword search returned nothing. The whole thing is maybe 400 lines of Python. Not production-ready (no caching, no auth, no rate limiting), but enough to prove the concept works.
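The keyword-to-semantic fallback can be sketched like this. The corpus is a toy dict and the token-overlap score is a stand-in for real embeddings; a production version would call the vector store's query API instead:

```python
import re

# Toy corpus (invented). In the real prototype this is a Chroma collection.
DOCS = {
    "/docs/cli.md": "Pass --verbose for detailed logs.",
    "/docs/output.md": "Formatting options for detailed output.",
}

def keyword_grep(pattern):
    """Literal regex search over page contents."""
    return [p for p, text in DOCS.items() if re.search(pattern, text)]

def semantic_search(query, top_k=1):
    """Token-overlap score standing in for embedding similarity."""
    q = set(re.findall(r"\w+", query.lower()))
    scored = sorted(DOCS,
                    key=lambda p: len(q & set(re.findall(r"\w+", DOCS[p].lower()))),
                    reverse=True)
    return scored[:top_k]

def grep(query):
    """grep with fallback: exact match first, similarity only if it finds nothing."""
    return keyword_grep(re.escape(query)) or semantic_search(query)

print(grep("--verbose"))       # exact hit: ['/docs/cli.md']
print(grep("verbose output"))  # no literal match, falls back to similarity
```

The `or` makes the policy explicit: verbatim matches always win, and the fuzzier path only runs when the literal one comes up empty.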

Does This Mean RAG Is Dead?

Nope. And anyone who tells you that after reading one blog post is the same type of person who declared REST dead when GraphQL launched. RAG is still excellent for:

  • One-shot factual questions where the answer lives in a single chunk
  • Large-scale retrieval across millions of documents where filesystem metaphors break down
  • Multimodal retrieval (images, tables, diagrams) where filesystem navigation is awkward
  • Use cases where latency is not critical and accuracy of chunk selection is high

What Mintlify's approach kills is the specific pattern of "give a chatbot access to structured documentation and hope RAG's top-K retrieval catches everything." For that use case, the filesystem metaphor is strictly better because it gives the agent autonomy to explore rather than relying on a retriever to guess what is relevant.

Jerry Liu, the CEO of LlamaIndex, has been talking about "agentic RAG" for months: the idea that retrieval should be agent-driven rather than pipeline-driven. ChromaFs is arguably the most elegant implementation of that idea I have seen. It does not add complexity to the retrieval pipeline. It replaces the pipeline with something simpler.

The future is probably hybrid. RAG for broad retrieval across massive corpora. Filesystem interfaces for structured, navigable content. And something we have not invented yet for everything in between.

