LiteLLM vs Portkey vs OpenRouter: LLM Gateway Cost Control for Production AI in 2026
Hands-on comparison of LiteLLM, Portkey, and OpenRouter from running six AI products in production. Pricing, observability, guardrails, and the cost-bracket framework I use to pick between them.
If you ship more than one AI product on Claude, GPT, and Gemini at the same time, you eventually face the same fork in the road I hit at the start of 2026: keep wiring each app directly to provider SDKs and live with messy cost tracking, or put an LLM gateway in front and centralize routing, fallback, and observability. After running six AI-powered products through this rewiring exercise (SmartExam AI Generator, DiabeCheck Food Scanner, BizChat Revenue Assistant, DocSumm AI Summarizer, ServiceBot AI Helpdesk, and ContentForge AI Studio), I tested all three of the gateways everyone keeps recommending: LiteLLM, Portkey, and OpenRouter.
This is the comparison I wish someone had handed me three months ago. No vendor cheerleading — just what each one is actually good at, what it breaks on in production, and the spreadsheet math that decides which one wins for which monthly spend bracket.
The Three Bets: What Each Gateway Is Actually Optimizing For
The mistake most teams make is treating LLM gateways as interchangeable. They are not. Each one is making a different bet about how you should run AI in production:
- OpenRouter bets you want zero infrastructure and one API key for 200+ models — accept their hosted routing in exchange for never having to think about deployment.
- LiteLLM bets you want full control. Self-host the proxy, configure your own keys, route however you want, and pay only for the box you run it on.
- Portkey bets your real bottleneck is not the proxy itself but observability, governance, and guardrails — the production-safety layer most teams forget until something goes wrong.
I will say this upfront: there is no universal winner. The right answer depends on your monthly LLM spend, your DevOps capacity, and how badly you need audit trails. The body of this article is the framework I use to pick between them for each product line.
OpenRouter: The Zero-Setup Option
OpenRouter is the gateway you reach for when speed-to-first-call matters more than anything else. Sign up, fund credits, get one API key, and you have access to 200+ models from Anthropic, OpenAI, Google, Mistral, Meta, DeepSeek, and a long tail of open-weights providers — all through an OpenAI-compatible interface.
When I shipped the first version of ContentForge AI Studio, I used OpenRouter for the multi-model generation feature precisely because I did not want to negotiate with five different billing portals. One API key, one invoice, one place to set spending limits.
What OpenRouter Charges
The pricing model is the cleanest of the three: zero markup on inference. You pay the underlying provider rates as if you went direct. OpenRouter makes its money on a 5.5% fee when you purchase credits — so $1,000 of credits costs you $1,055.
For low-to-mid spend, this is a great deal. You get aggregated billing, fallbacks if a provider has an outage, and access to models that are otherwise tedious to provision (DeepSeek, certain regional Gemini variants). The catch is the part the marketing pages do not advertise: limited observability, no per-user budgets, no semantic caching, no guardrails, and no custom retry policies.
Where I Hit Its Limits
For ContentForge, where I wanted to know which prompt template was burning tokens and which user segment was over budget, OpenRouter's dashboard was not deep enough. You see total spend per model and rough latency, but if you want token attribution down to specific endpoints — say, "what did the trial users cost us last week on the Sonnet pipeline" — you are exporting CSVs and joining in your own database. At small scale that is fine. At any meaningful volume, it becomes a job.
LiteLLM: The Self-Hosted Workhorse
LiteLLM is the open-source proxy that took over my production stack for everything except the lightest workloads. It exposes 100+ providers behind one OpenAI-compatible API, supports cost tracking, virtual keys, guardrails, load balancing, fallbacks, and rate limits — and it costs zero dollars in license fees because it is Apache 2.0.
The current stable release as of March 2026 is v1.81.14, and the proxy itself is load-tested to 1,000 requests per second on a single instance, with SOC-2 Type 2 and ISO 27001 certifications for enterprise deployments. In other words, it is no longer a hobby project — it is the kind of infrastructure you are comfortable putting between your customers and a $40,000/month inference bill.
What It Actually Costs to Run
The dollar cost of LiteLLM is not zero — that is the tradeoff that catches teams off guard. I run it on a Hostinger VPS with 4 vCPU and 8 GB RAM at roughly $14/month, plus a managed PostgreSQL instance for virtual key storage and request logs at another $12/month. Add monitoring, log aggregation (I use Loki), and the engineer time to maintain it, and the real all-in cost lands around $40-$60/month for a small ops team running it lean.
That number sounds high until you compare it to gateway markup at high volume. A 2% markup on $20,000/month of inference is $400/month. A 5.5% fee on the same volume through OpenRouter is $1,100. At that point the LiteLLM operational cost is effectively free.
Why I Standardized on It for the Heavy Workloads
Three things pushed me toward LiteLLM for the production workloads on SmartExam and DocSumm:
- Virtual keys with budgets per project. Each AI product gets its own virtual key with a hard monthly cap. I learned this the hard way after a buggy retry loop on an early version of BizChat ate $87 of Claude tokens in one afternoon. Hard caps at the gateway level mean a runaway script tops out at a known number, not at "however much credit was on the master account."
- Realtime guardrails. LiteLLM's unified guardrail path supports Presidio for PII redaction, Bedrock guardrails, OpenAI Moderation, and custom-code guardrails — and as of 2024 the proxy guardrail layer is free, no enterprise license required. For the helpdesk product (ServiceBot), running input through Presidio before it ever hits Anthropic is non-negotiable for the regulated clients we work with.
- No vendor lock-in on observability. LiteLLM emits structured logs that flow into Langfuse, Helicone, or your own database. I write to a local PostgreSQL table and join it against my product analytics — something Portkey does not let me do as cleanly without paying for the per-log tier.
The Honest Downsides
LiteLLM is not effortless. You are running a Docker container, a Postgres instance, and a config file with provider routing rules. Upgrades mean reading the changelog and testing in staging. If your team has zero DevOps capacity — for instance, if you are a two-person solo SaaS — you will lose more time fighting deployment than you save in markup.
Portkey: The Observability-First Gateway
Portkey is the third option, and the one I use specifically when the question is "how do I prove to a customer or auditor what happened in this AI conversation last Tuesday."
It is positioned as the enterprise gateway, but the pitch I find honest is narrower: Portkey treats the gateway as the place where every request is logged, traced, attributed, and guard-railed by default. You see latency distributions, cost breakdowns by feature/user/model, error rates, guardrail violations, and cache hit rates in a real-time dashboard, with no extra integration work.
Pricing That Scales With Volume
Here is the part to read carefully, because it is the main tradeoff. Portkey has a free tier for prototyping, a $49/month production tier, and per-log pricing that climbs with throughput:
- 500K requests/month: ~$36 add-on (≈ $85 total)
- 1M requests/month: ~$81 add-on (≈ $130 total)
- 2M requests/month: ~$171 add-on (≈ $220 total)
For low-volume production apps with strong observability needs, this is a great deal. For high-volume apps where you are sending tens of millions of requests, you start to feel the per-log pricing. The escape valve is that as of March 2026, Portkey open-sourced their entire gateway under Apache 2.0. You can self-host the routing and guardrails for free; you only pay for the hosted observability platform if you want it managed.
What I Use Portkey For
I run Portkey in front of BizChat Revenue Assistant specifically because the customer in that flow is a paying B2B account, and "show me the prompt and the response that produced this revenue forecast" is a thing they ask for. The ability to pull a single request ID and see prompt, completion, latency, model, cost, and any guardrail decisions in one view turns a multi-hour audit response into a five-minute one. Whether that is worth $130/month is a per-product decision.
Side-by-Side: The Decision Matrix
| Capability | OpenRouter | LiteLLM | Portkey |
|---|---|---|---|
| Models supported | 200+ | 100+ | 200+ (via providers) |
| Self-hostable | No | Yes (Apache 2.0) | Yes since Mar 2026 (Apache 2.0) |
| Pricing model | 5.5% credit fee, zero markup | Free OSS, pay infra | Free OSS gateway + per-log tier |
| Per-user budgets | Limited | Yes (virtual keys) | Yes |
| Semantic caching | No | Plugin via Redis | Built-in (Pro+) |
| PII redaction / guardrails | No | Yes (Presidio, Bedrock, custom) | Yes (built-in) |
| Real-time dashboard | Basic | Via Langfuse/Helicone | Built-in |
| Setup time | ~5 minutes | ~2-4 hours | ~30 minutes |
| DevOps overhead | None | Medium | Low (managed) / Medium (self-host) |
The Cost-Bracket Decision Framework
I have run all three through real workloads now, and the framework I land on is simpler than the marketing pages suggest. It comes down to monthly LLM spend:
Under $2,000/month in inference
Use OpenRouter or Portkey Free. The 5.5% fee or the free Portkey tier is cheaper than the time and infra cost of running LiteLLM yourself. This is where every prototype, side project, and first-version SaaS should live. I started ContentForge here.
$2,000-$10,000/month in inference
All three are viable. Pick based on what you actually need:
- Need observability for compliance or audits? Portkey. The $130/month for 1M requests pays for itself the first time legal or a customer asks for an audit trail.
- Have a backend engineer with 4-6 hours of capacity? LiteLLM self-hosted. You will save the markup and own the data.
- Need it running by tomorrow? OpenRouter. Move on to the next problem.
Over $10,000/month in inference
LiteLLM becomes the clear cost winner. The 5.5% markup on OpenRouter is over $550/month and only goes up. Even Portkey's per-log tier starts to feel chunky if you are pushing high request volume. At this point you almost always have the DevOps capacity to run a proxy properly, and the savings fund the engineering time.
The hybrid pattern I use across my own products: LiteLLM as the primary self-hosted proxy, with Portkey or Langfuse hooked up as the observability sink for the products that need it. This lets you keep the cost ceiling low while still getting the dashboards where they matter.
What These Comparison Articles Usually Get Wrong
Most "best LLM gateway" listicles I read in early 2026 missed three things that actually matter in production. Calling them out:
- Latency overhead is real but often invisible. Adding any proxy adds 10-40ms to your tail latency. For chat UX this is invisible. For low-latency real-time use cases (voice agents, streaming code completions), test before you commit. I measured 18ms median added by LiteLLM running on the same VPS region as the app, and 31ms median through OpenRouter from a Singapore region.
- Fallback chains are only as good as your testing. Every gateway sells "automatic fallback if Anthropic goes down." None of them tell you that if your prompt uses Claude-specific tags like
<thinking>, the fallback to GPT will produce subtly broken output. Test your fallback chain with chaos engineering, not vendor demos. - The lock-in is the integration, not the API. All three are OpenAI-compatible. Switching the proxy URL is a 10-minute job. Switching the dashboards, the budget rules, the per-key configurations, the guardrails — that is a project. Pick based on where you want that operational sunk cost to live.
FAQ
Can I use more than one gateway at the same time?
Yes — and I do. LiteLLM in front of the heavy workloads, OpenRouter for prototyping, and Portkey wrapping the audit-sensitive product. The OpenAI-compatible interface means each app just points at a different base URL.
Does putting a gateway in front break streaming or tool use?
All three handle SSE streaming and OpenAI-style tool calls correctly in 2026. Anthropic's native tool format and extended thinking are passed through with minor caveats — LiteLLM exposes them most transparently in my testing.
Is OpenRouter's 5.5% credit fee really the only cost?
Mostly. Watch for the small markup on certain BYOK (bring-your-own-key) models, and the fact that some providers route through their cheaper "free tier" endpoints with stricter rate limits unless you explicitly opt out.
Is Portkey's open-source gateway feature-equivalent to the managed version?
The routing, guardrails, and config primitives are the same. The hosted dashboard, the observability storage, and the prebuilt analytics views are the part you pay for if you want them managed.
What about Helicone, Langfuse, and the others?
They are observability tools, not full gateways. I run Langfuse alongside LiteLLM as the analytics layer; it does that one job better than any embedded dashboard. Pick a gateway first, then pick the observability that fits.
My Recommendation
If you are starting today and have not picked a gateway yet, here is the path I would take:
Start with OpenRouter in week one. You need to ship; the 5.5% fee is the cheapest possible price for not thinking about infrastructure. Validate the product. When you cross roughly $2,000/month in inference or your customers start asking "where do these answers come from," reassess.
From there, go to LiteLLM self-hosted if you have backend engineering capacity, or Portkey if observability and audit trails are your real bottleneck. You will probably end up running both, the way I do, with each one in front of the workload it is best at.
The wrong move is to overthink this in week one. The right move is to pick the cheapest viable option, ship the product, and migrate the gateway later when the volume justifies it. None of these three lock you in deeply enough to make migration painful — that part of the market has matured nicely. Use that to your advantage.
Enjoyed this article?
Get more AI insights — browse our full library of 103+ articles and 373+ ready-to-use AI prompts.