OpenRouter vs LiteLLM vs Portkey: Best LLM Gateway in 2026
Your production AI application probably uses more than one model. Claude for reasoning, GPT-4o for function calling, Gemini Flash for cheap classification, Llama for self-hosted fallback. Without a gateway layer, you're maintaining separate API clients, handling failovers manually, and running blind on costs until the invoice hits.
LLM gateways solve this by sitting between your application and model providers. One API format, automatic routing, fallback chains, cost tracking. Three platforms dominate: OpenRouter (the marketplace), LiteLLM (the open-source proxy), and Portkey (the observability-first gateway).
Each makes a fundamentally different bet. OpenRouter bets you want access to every model through a single API key with zero infrastructure. LiteLLM bets you want full control, self-hosted, with zero vendor lock-in. Portkey bets you need enterprise-grade observability and reliability features more than you need another API proxy. Here's how they compare.
Quick Comparison
| Feature | OpenRouter | LiteLLM | Portkey |
|---|---|---|---|
| Type | Hosted marketplace | Self-hosted proxy (OSS) | Managed gateway |
| Models | 300+ (all providers) | 100+ providers supported | 250+ (all providers) |
| Pricing model | 5.5% credit purchase fee | Free (MIT license) | Per recorded log |
| Token markup | None (pass-through) | None | None |
| Free tier | ✅ Some free models | ✅ Unlimited (self-hosted) | ✅ 10K logs/month |
| Observability | Basic usage stats | DIY (Prometheus/Grafana) | ✅ Built-in (logs, traces, analytics) |
| Fallback/retry | ✅ Provider routing | ✅ Configurable | ✅ Advanced (circuit breakers) |
| Rate limiting | Per-model limits | ✅ Per-key budgets | ✅ Per-key/user |
| Semantic caching | ❌ | ❌ (external) | ✅ Built-in |
| Guardrails | ❌ | ✅ (Enterprise) | ✅ Built-in |
| BYOK | ✅ (5% fee) | ✅ (native) | ✅ |
| Self-hostable | ❌ | ✅ | ❌ (edge deployment) |
| Compliance | — | DIY | SOC2, ISO 27001, HIPAA |
| Best for | Developers, prototyping | DevOps teams, self-hosters | Enterprise, production AI |
OpenRouter: The Model Marketplace
OpenRouter is the simplest answer to "I need one API key for every model." Sign up, add credits, and call 300+ models — Claude, GPT-4o, Gemini, Llama, DeepSeek, Mistral, and hundreds more — through a single OpenAI-compatible endpoint. No infrastructure, no provider accounts, no configuration.
Think of OpenRouter as the Stripe of LLMs: it handles provider credentials, routing, and billing while you focus on building. It's what most developers use before they need a gateway — and many keep using it long after.
How OpenRouter Pricing Works
OpenRouter's pricing model is unusually transparent:
- No per-token markup. You pay the exact same token rate as going directly to each provider. Claude Sonnet at $3/$15 per million tokens on Anthropic costs $3/$15 on OpenRouter.
- 5.5% credit purchase fee. When you buy credits, OpenRouter takes 5.5% (minimum fee of $0.80). This is the entire business model. Buy $100 in credits, get $94.50 in model usage.
- BYOK (Bring Your Own Key): Use your own provider API keys through OpenRouter's routing. 5% fee on underlying provider cost, deducted from your credit balance.
- Free models available. Several models (Llama variants, Mistral, some fine-tunes) are available at zero cost, funded by OpenRouter or community providers.
What does 5.5% actually mean? For a team spending $1,000/month on LLM APIs, OpenRouter adds $55 in fees. That's less than one hour of engineer time — and less than you'd spend building and maintaining direct integrations with multiple providers.
OpenRouter Strengths
Zero friction. One API key, one endpoint, 300+ models. No provider accounts, no credential management, no rate limit juggling. For developers who want to test Claude vs GPT-4o vs Gemini on the same dataset, OpenRouter eliminates the setup entirely. It's why OpenRouter has become the default API for tools like Cursor, Windsurf, and Cline.
Provider routing. OpenRouter automatically routes requests to the best available provider for each model. If one provider is down or slow, it fails over transparently. For models available from multiple providers (like Llama via Together AI, Fireworks, or Lepton), OpenRouter picks the cheapest or fastest endpoint based on your preferences.
Model discovery. The OpenRouter catalog is the best single-page view of the LLM market. Sort by price, context window, speed, or modality. Compare Claude 4 Sonnet vs GPT-4o vs Gemini 2.0 Flash side-by-side with real pricing. For teams evaluating models for different use cases, this visibility is invaluable.
OpenAI-compatible. Drop-in replacement for the OpenAI SDK. Change base_url to https://openrouter.ai/api/v1, swap your API key, and your existing code works. Every model responds in the same format, regardless of provider.
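Here's a minimal sketch of that swap, assuming the openai Python SDK (v1+) and an OpenRouter key in OPENROUTER_API_KEY. The model ID follows OpenRouter's provider/model naming; the provider-preference block is an assumption about the request schema, so check OpenRouter's docs before relying on it.

```python
import os
from openai import OpenAI

# Point the standard OpenAI SDK at OpenRouter instead of api.openai.com.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # OpenRouter model IDs are "provider/model"
    messages=[{"role": "user", "content": "Classify this ticket: 'My invoice total looks wrong.'"}],
    # Optional provider-routing hint (field names are an assumption, not verified):
    extra_body={"provider": {"allow_fallbacks": True}},
)
print(response.choices[0].message.content)
```

Switching models is a one-string change, and the response shape stays the same regardless of which provider served the request.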
Free models for development. Several models are free (rate-limited), making OpenRouter the cheapest way to prototype. Combined with the best free AI APIs, you can build entire prototypes without spending anything.
OpenRouter Weaknesses
No observability. OpenRouter shows basic usage statistics, but there's no request-level logging, latency tracing, prompt analytics, or cost attribution by feature/user. For production applications, you're flying blind unless you build your own monitoring layer on top.
No guardrails or governance. No content filtering, no prompt management, no output validation. OpenRouter routes your request and returns the response — nothing in between. For applications that need input/output guardrails, you'll need a separate layer.
5.5% adds up at scale. At $10,000/month in LLM spend, you're paying $550 in OpenRouter fees. At $100,000/month, $5,500. For high-volume production applications, direct provider integrations (or a self-hosted proxy like LiteLLM) become more cost-effective.
No SLA. OpenRouter doesn't offer enterprise SLAs. If OpenRouter goes down, your application goes down — unless you've built failover to direct provider APIs. For mission-critical applications, this is a non-trivial risk.
Limited caching. No semantic caching or prompt caching passthrough (though individual providers like Anthropic and OpenAI handle their own caching). For workloads with repetitive prompts — common in agent frameworks and prompt caching strategies — you're leaving money on the table.
Best For
Individual developers, startups, and small teams who need access to many models without infrastructure overhead. Prototyping and model evaluation. AI-powered tools and coding assistants that need multi-model access. Projects where convenience matters more than observability or cost optimization at scale.
LiteLLM: The Self-Hosted Proxy
LiteLLM takes the opposite approach from OpenRouter: instead of a hosted service, it gives you an open-source proxy server that you deploy on your own infrastructure. MIT-licensed, Python-based, OpenAI-compatible. No vendor lock-in, no per-request fees, no data leaving your network.
For teams with DevOps capacity and a privacy-first mindset, LiteLLM is the gateway that gets out of your way. You control everything: routing logic, caching strategy, rate limits, model access, and every byte of data flowing through the system.
How LiteLLM Pricing Works
The software is free. MIT license, no usage fees, no token markup, no log limits. You download it, deploy it, and pay nothing to LiteLLM.
What you actually pay for:
| Cost Layer | Monthly Cost | Details |
|---|---|---|
| LiteLLM software | $0 | MIT license, unlimited usage |
| Infrastructure (server + DB) | $200–$500 | Kubernetes, PostgreSQL, Redis, load balancing |
| Monitoring stack | $200–$800 | Prometheus, Grafana, ELK, PagerDuty |
| DevOps time (20 hrs/mo) | ~$1,500–$2,500 | Maintenance, patches, incidents, scaling |
| LLM provider costs | Variable | Direct provider rates, zero markup |
| Total (excl. LLM costs) | $1,900–$3,800 | Depends on scale and DevOps rates |
Enterprise tiers (optional):
| Tier | Price | Key Features |
|---|---|---|
| Enterprise Basic | $250/month | Prometheus metrics, guardrails, JWT auth, SSO, audit logs |
| Enterprise Premium | $30,000/year | Compliance, priority support, advanced features |
Most teams use the free open-source version. The enterprise tiers are for organizations that need vendor-backed support and compliance certifications.
LiteLLM Strengths
Total control. Every configuration choice is yours: which providers to route to, how to handle failovers, what to cache, who gets access to which models. LiteLLM's proxy config file is a YAML document that describes your entire routing strategy. No black-box decisions.
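A rough sketch of what that YAML looks like, assuming provider keys live in environment variables; the field names follow LiteLLM's model_list / router_settings schema, but treat the specific values as illustrative rather than a drop-in config:

```yaml
# litellm_config.yaml -- illustrative routing strategy, not a production-ready file
model_list:
  - model_name: gpt-4o                 # alias your application requests
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: local-llama            # self-hosted model served by Ollama
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434

router_settings:
  routing_strategy: simple-shuffle     # swap for latency- or cost-aware strategies
  fallbacks:
    - gpt-4o: [claude-sonnet, local-llama]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```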
Zero vendor lock-in. Your proxy, your infrastructure, your data. If LiteLLM the project disappears tomorrow, you have the source code. Fork it, maintain it, or replace it. For organizations with strict vendor risk policies, this matters.
No markup, ever. LiteLLM adds zero cost to your LLM provider bills. At $100,000/month in token usage, you save the $5,500 that OpenRouter would charge. At scale, this pays for the entire DevOps overhead and then some.
Privacy by default. No data leaves your network except to the LLM providers you explicitly configure. No third-party logging, no telemetry, no analytics sent elsewhere. For healthcare, legal, and financial applications, this is often a hard requirement.
Per-key budgets and rate limits. Assign API keys to teams, users, or applications with individual budget caps and rate limits. When your intern's experimental agent burns through $500 in an hour, the budget cap stops it at $50. Critical for organizations with multiple teams using shared LLM infrastructure.
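A quick sketch of issuing such a key against a running proxy, assuming it listens on localhost:4000; the /key/generate endpoint and field names follow LiteLLM's key-management API, but verify them against your installed version:

```python
import os
import requests

PROXY_URL = "http://localhost:4000"
MASTER_KEY = os.environ["LITELLM_MASTER_KEY"]  # set in general_settings.master_key

# Issue a scoped virtual key: $50 hard budget, monthly reset, limited model access.
resp = requests.post(
    f"{PROXY_URL}/key/generate",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    json={
        "max_budget": 50.0,            # hard spend cap in USD
        "budget_duration": "30d",      # budget resets every 30 days
        "models": ["gpt-4o", "local-llama"],
        "metadata": {"team": "interns"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])              # requests with this key stop once the cap is hit
```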
OpenAI-compatible. Same as OpenRouter — swap base_url to your LiteLLM proxy endpoint, and existing OpenAI SDK code works unchanged. Compatible with 100+ LLM providers including all major APIs, local Ollama instances, and custom endpoints.
Pairs with self-hosted models. LiteLLM can route to locally-hosted models (Ollama, vLLM, TGI) alongside cloud APIs. Run Llama 70B on your own RTX 4090 for cheap workloads, fall back to Claude for complex reasoning. Unified interface, hybrid routing. If you're running local LLMs on Apple Silicon or cloud GPUs, LiteLLM connects them all under one API.
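Tying those pieces together, here's a sketch of a client call against the proxy configured above, assuming it runs locally on port 4000 and you use a virtual key issued via /key/generate:

```python
from openai import OpenAI

# Same OpenAI SDK, now pointed at your own LiteLLM proxy.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-...",  # placeholder: the virtual key issued by /key/generate
)

# Cheap classification goes to the local Ollama model...
client.chat.completions.create(
    model="local-llama",
    messages=[{"role": "user", "content": "Spam or not spam: 'WIN A FREE CRUISE'?"}],
)

# ...while complex reasoning goes to a cloud model, through the same interface.
client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Draft a migration plan for our billing service."}],
)
```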
LiteLLM Weaknesses
You own the ops. Deployment, scaling, monitoring, patching, incident response — it's all your responsibility. A 2 AM outage means your on-call engineer, not a vendor support team. Initial setup takes 2-4 weeks for a production-grade deployment. Teams without DevOps experience will struggle.
Observability is DIY. The open-source version includes basic logging, but production-grade observability requires you to set up Prometheus, Grafana, ELK, and alerting from scratch. Enterprise Basic ($250/mo) adds Prometheus metrics, but it's still far from Portkey's built-in analytics dashboard.
Community support only (free tier). Bug reports go to GitHub issues. Feature requests compete with hundreds of others. No SLA on response time. For the enterprise tiers, you get priority support, but the free version is you-and-the-community.
Configuration complexity. LiteLLM's flexibility is also its complexity. Configuring fallback chains, load balancing, budget policies, and model routing requires understanding the proxy's YAML schema in detail. Misconfiguration can silently route traffic incorrectly or fail to enforce budget limits.
No semantic caching built-in. You can integrate external caching (Redis, custom layers), but LiteLLM doesn't provide semantic caching — the ability to serve cached responses for semantically similar (not exact) prompts. For high-volume workloads with repetitive queries, this is a meaningful cost savings you miss.
Best For
Teams with DevOps capacity that value control, privacy, and zero vendor lock-in. Organizations running hybrid setups (cloud APIs + self-hosted models). High-volume applications where OpenRouter's 5.5% fee exceeds the cost of self-hosting. Companies with strict data residency or compliance requirements that preclude managed services.
Portkey: The Observability Gateway
Portkey positions itself as the "control panel for production AI." While OpenRouter focuses on model access and LiteLLM focuses on self-hosted routing, Portkey focuses on what happens after you deploy: observability, reliability, and governance.
Every LLM request flowing through Portkey is logged, traced, analyzed, and attributed. You see latency distributions, cost breakdowns by feature/user/model, error rates, guardrail violations, and cache hit rates — all in a real-time dashboard. If you've ever looked at your LLM bill and wondered "where did that $3,000 go?", Portkey answers that question.
How Portkey Pricing Works
Portkey uses a unique pricing model based on "recorded logs" — the number of LLM requests captured in the observability system:
| Tier | Monthly Price | Recorded Logs | Retention | Key Features |
|---|---|---|---|---|
| Dev (Free) | $0 | 10,000 | 30 days | Basic observability, single workspace |
| Pro | Starting ~$49 | 100K–3M | 30 days | Advanced routing, semantic caching, $9/100K overage |
| Enterprise | Custom | 10M+ | 90+ days | SOC2, HIPAA, ISO 27001, SSO, custom retention |
Critical nuance: when you exceed your log limit, the gateway *keeps routing requests*; you only lose observability on the excess. That's both a safety net and a trap: your production app stays up, but you go blind on exactly the visibility Portkey exists to provide.
What you're paying for:
- Managed edge infrastructure (99.99% uptime SLA, 20-40ms latency overhead)
- Request log storage, indexing, and querying
- Advanced routing (fallbacks, load balancing, conditional routing)
- Semantic caching to reduce repeated calls
- Guardrails and prompt management
- Compliance certifications (Enterprise tier)
LLM costs are separate. Portkey doesn't mark up token prices. You still pay OpenAI, Anthropic, etc. directly. Portkey is the middleware layer — you're paying for the gateway's features, not for model access.
Portkey Strengths
Best-in-class observability. Every request gets a detailed log: latency, tokens, cost, model, provider, custom metadata, guardrail results. Trace multi-step agent workflows end-to-end. Attribute costs to specific users, features, or teams. For production applications, this visibility is the difference between optimizing costs and guessing.
This is especially relevant for AI agent architectures where a single user request can trigger 10-50 LLM calls across multiple models. Without request-level tracing, debugging a slow agent response is near impossible.
Semantic caching. Portkey caches responses for semantically similar prompts — not just exact matches. If user A asks "What's the capital of France?" and user B asks "Capital city of France?", the second request hits cache. For applications with repetitive query patterns (customer support, FAQ bots, document retrieval), this reduces both latency and cost significantly.
Advanced reliability features. Automatic fallbacks, circuit breakers, conditional routing, retries with exponential backoff. If Claude is down, route to GPT-4o. If GPT-4o is slow, route to Gemini Flash. If all primary models are degraded, trigger an alert and route to a fallback. This is the kind of production infrastructure that takes months to build from scratch — and Portkey ships it out of the box.
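Portkey expresses these policies declaratively in a gateway config attached to each request. Here's a rough sketch written as a Python dict; the field names are assumptions based on Portkey's config format and should be checked against current docs before use:

```python
# Illustrative Portkey gateway config: retries, fallback chain, semantic cache.
# Field names are assumptions about Portkey's config schema, not verified verbatim.
portkey_config = {
    "retry": {"attempts": 3},                        # retry transient provider errors
    "cache": {"mode": "semantic", "max_age": 3600},  # serve semantically similar prompts from cache
    "strategy": {"mode": "fallback"},                # try targets in order until one succeeds
    "targets": [
        {"virtual_key": "anthropic-prod", "override_params": {"model": "claude-3-5-sonnet"}},
        {"virtual_key": "openai-prod", "override_params": {"model": "gpt-4o"}},
    ],
}
```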
Guardrails and governance. Define input/output validation rules: block PII in prompts, enforce response formats, flag hallucination indicators, limit token usage per request. For regulated industries and enterprise AI applications, built-in guardrails are a compliance requirement, not a nice-to-have.
Enterprise compliance. SOC2 Type 2, ISO 27001, GDPR, HIPAA at the Enterprise tier. For organizations in healthcare, finance, and government, these certifications eliminate months of vendor security reviews.
Low integration overhead. Portkey claims 2-minute integration — swap your base URL and add a Portkey header. Compatible with OpenAI, Anthropic, Google, AWS Bedrock, Azure, and more. The SDK handles the rest.
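As a minimal sketch of that integration, assuming the openai Python SDK; the gateway URL and x-portkey-* header names follow Portkey's documented conventions, but treat them as assumptions to verify:

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.portkey.ai/v1",   # Portkey's gateway endpoint
    api_key=os.environ["OPENAI_API_KEY"],    # still your provider key (or a Portkey virtual key)
    default_headers={
        "x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
        "x-portkey-provider": "openai",
        "x-portkey-config": json.dumps(portkey_config),  # attach the routing/cache config sketched above
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket thread."}],
)
print(response.choices[0].message.content)
```

Every call made this way then shows up in the dashboard with its latency, cost, cache status, and any fallback decisions.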
Portkey Weaknesses
Log-based pricing is confusing. "Recorded logs" is not an intuitive metric. Teams need to estimate their monthly request volume, understand the distinction between routed requests and recorded logs, and predict overage costs. The Pro tier's $9/100K overage logs can add up quickly for high-volume applications.
Expensive at high volume. A production application making 5 million requests/month could easily exceed Enterprise pricing thresholds. At that scale, the total Portkey cost (gateway fees + LLM provider costs) may exceed what self-hosted LiteLLM costs in infrastructure + DevOps time.
Not self-hostable. Portkey runs on their managed infrastructure. Your data flows through Portkey's edge network. For organizations with strict data residency requirements or zero-trust policies, this is a dealbreaker. Portkey offers data processing agreements and compliance certifications, but "our own servers only" is a common enterprise policy that Portkey can't satisfy.
Latency overhead. Portkey adds 20-40ms of latency per request. For most applications, this is negligible. For latency-critical workloads — real-time chat, interactive coding assistants, high-speed agent frameworks — every millisecond matters, and 40ms per call across a 20-call agent chain is nearly a second of added latency.
Vendor lock-in. Your observability data, caching rules, routing configurations, and guardrails all live on Portkey's platform. Migrating away means rebuilding observability, reconfiguring routing, and losing historical data. The switching cost grows with time.
Best For
Production AI teams that need deep observability and cost attribution across multiple models and providers. Enterprises in regulated industries requiring compliance certifications. Teams building multi-agent systems where request-level tracing is essential for debugging. Organizations willing to pay for managed reliability features instead of building them.
Head-to-Head: Key Decision Factors
Cost at Scale
The break-even analysis depends on your monthly LLM spend and request volume:
| Monthly LLM Spend | OpenRouter Fee (5.5%) | LiteLLM TCO (infra + ops) | Portkey Pro (~$49 + overages) |
|---|---|---|---|
| $500 | $28 | $400–$1,500 | ~$49–$150 |
| $2,000 | $110 | $400–$1,500 | ~$49–$500 |
| $10,000 | $550 | $500–$2,000 | ~$200–$1,000 |
| $50,000 | $2,750 | $800–$3,000 | Custom (Enterprise) |
| $100,000 | $5,500 | $1,000–$3,800 | Custom (Enterprise) |
Under $2,000/mo: OpenRouter or Portkey Free are the pragmatic choices. LiteLLM's infrastructure costs exceed the savings.
$2,000–$10,000/mo: All three are viable. OpenRouter is simplest, Portkey adds observability, LiteLLM saves money if you have DevOps capacity.
Over $10,000/mo: LiteLLM becomes the clear cost winner. OpenRouter's fee is a significant line item. Portkey Enterprise may be worth it for the observability and compliance features.
Routing Intelligence
OpenRouter routes based on provider availability, price, and speed. It knows which providers are serving each model and picks the best one automatically. You can override with provider preferences. For multi-provider models like Llama (served by Together AI, Fireworks, Lepton, etc.), this automatic optimization is genuinely valuable.
LiteLLM gives you full control over routing logic via YAML configuration. Define fallback chains, load balancing strategies (round-robin, least-latency, cost-optimized), and model-specific routing rules. More powerful than OpenRouter's routing, but requires manual configuration.
Portkey offers the most advanced routing: conditional routing based on request metadata, load balancing across providers, circuit breakers that remove unhealthy providers, and request-level routing overrides. The closest thing to intelligent traffic management for LLM APIs.
Observability
| Capability | OpenRouter | LiteLLM (OSS) | LiteLLM (Enterprise) | Portkey |
|---|---|---|---|---|
| Usage stats | ✅ Basic | ❌ | ✅ | ✅ Detailed |
| Request logs | ❌ | ✅ (DB) | ✅ | ✅ (searchable) |
| Latency tracing | ❌ | DIY | DIY | ✅ Built-in |
| Cost attribution | ❌ | ✅ Per-key | ✅ Per-key | ✅ Per-user/feature |
| Prompt analytics | ❌ | ❌ | ❌ | ✅ |
| Guardrail metrics | ❌ | ❌ | ✅ | ✅ |
| Dashboard | Basic web | DIY (Grafana) | DIY (Grafana) | ✅ Built-in |
If observability is your primary requirement, Portkey wins by a wide margin. LiteLLM can achieve similar capabilities with significant DIY effort. OpenRouter doesn't compete on this axis.
Integration with Agent Frameworks
All three work as drop-in OpenAI-compatible endpoints, which means they integrate with every major agent framework: LangChain, CrewAI, AutoGen, Dify, Flowise, LangFlow, and custom implementations.
However, the value each provides to agent workflows differs:
- OpenRouter provides model variety — agents can dynamically select different models for different tasks through one API.
- LiteLLM provides cost control — per-key budgets prevent runaway agents from draining your credit.
- Portkey provides visibility — trace an agent's 30-call reasoning chain and see exactly where time and money went.
For teams building multi-agent orchestration systems, the combination of LiteLLM's budget controls and Portkey's tracing is particularly valuable. Some teams run both: LiteLLM as the proxy layer, Portkey for observability.
Other Gateways Worth Knowing
Martian (martian.ai)
AI-powered model routing that automatically selects the best model for each prompt. Instead of rules-based routing, Martian uses a routing model to match prompts to providers. Best for: teams that want hands-off cost optimization without configuring routing rules.
Helicone
Open-source LLM observability platform. Similar to Portkey's monitoring but fully self-hostable. Best for: teams that want Portkey-level observability without managed service dependency.
Kong AI Gateway
Enterprise API gateway with LLM-specific features (rate limiting, auth, traffic management). Best for: organizations already using Kong for their API infrastructure that want to extend it to LLM traffic.
The Self-Hosting Case: Building Your Own Gateway
For teams with high volume and strong DevOps capacity, self-hosting goes beyond LiteLLM:
The stack: LiteLLM proxy + Prometheus + Grafana + Redis (caching) + PostgreSQL (logging) + custom routing logic. Running on a dedicated GPU server with Ollama for local models, routing cloud calls through LiteLLM to OpenAI/Anthropic/Together AI.
The math: An RTX 4090 (~$1,600) runs a quantized Llama 3.1 8B at well over 40 tokens/second; 70B-class models need multiple GPUs or aggressive quantization. For internal tooling, summarization, and classification workloads, self-hosted inference through a LiteLLM gateway can reduce LLM costs by 80%+ versus cloud APIs. Pair with cloud GPU providers for burst capacity, and use LiteLLM to route between local and cloud seamlessly.
When it makes sense:
- $10,000+/month in LLM spend
- Strong DevOps team (or willingness to invest in one)
- Privacy requirements that preclude managed gateways
- Hybrid architecture (local models + cloud APIs)
- Need for custom routing logic beyond what any gateway provides
When it doesn't:
- Under $5,000/month (managed services are cheaper than ops overhead)
- Small team without dedicated infrastructure engineers
- Rapidly iterating on model selection (OpenRouter's flexibility wins here)
For a complete guide to self-hosting open-source models, see Ollama production configuration.
Use Case Recommendations
Indie Developer / Small Startup
Winner: OpenRouter
You need models, not infrastructure. OpenRouter gives you 300+ models with one API key, free models for development, and the 5.5% fee is pocket change at low volume. When you outgrow it — and you'll know when — migrate to LiteLLM or Portkey.
Start with OpenRouter + the best free AI APIs to build your MVP without spending anything on infrastructure.
Production AI Application (Mid-Scale)
Winner: Portkey
You're past prototyping. You need to know where your money goes, why requests fail, and how to optimize costs. Portkey's observability pays for itself the first time you discover a prompt template that's wasting 40% of your token budget. The guardrails and fallback features prevent the kind of production incidents that cost more than Portkey's annual subscription.
Enterprise / Regulated Industry
Winner: LiteLLM (self-hosted) or Portkey (Enterprise)
Depends on your primary constraint. If it's data residency and vendor risk, LiteLLM gives you full control. If it's compliance certifications and managed reliability, Portkey Enterprise delivers SOC2, HIPAA, and ISO 27001 with a 99.99% SLA. Many enterprises use both: LiteLLM as the proxy, Portkey for observability.
Multi-Agent System
Winner: Portkey + LiteLLM
Agent frameworks make dozens of LLM calls per task. You need Portkey's tracing to debug multi-step chains and LiteLLM's budgets to prevent runaway costs. Route through LiteLLM for cost control and budget enforcement, pipe logs to Portkey (or Helicone) for end-to-end visibility. See our multi-agent orchestration guide for architecture patterns.
Cost Optimization at Scale
Winner: LiteLLM + self-hosted models
At $50,000+/month in LLM spend, every percentage point matters. LiteLLM's zero markup saves $2,750/month versus OpenRouter. Self-hosted Llama via Ollama handles the commodity workloads. Cloud APIs handle the complex reasoning. LiteLLM routes between them intelligently. Pair with prompt caching strategies and Together AI's batch processing at 50% off for maximum savings.
Decision Matrix
| If you need... | Choose... | Why |
|---|---|---|
| Fastest setup, most models | OpenRouter | One API key, 300+ models, zero config |
| Full control, self-hosted | LiteLLM | MIT license, zero markup, your infra |
| Production observability | Portkey | Built-in logs, traces, cost attribution |
| Cheapest at high volume | LiteLLM | No fees at any scale |
| Compliance certs (SOC2/HIPAA) | Portkey Enterprise | Built-in certifications |
| Best routing intelligence | Portkey | Circuit breakers, conditional routing |
| Agent cost control | LiteLLM | Per-key budgets, hard limits |
| Zero vendor lock-in | LiteLLM | MIT license, self-hosted |
| Model evaluation/testing | OpenRouter | Free models, instant access to all providers |
| Semantic caching | Portkey | Built-in, no additional infra |
FAQ
What is an LLM gateway?
An LLM gateway sits between your application and AI model providers (OpenAI, Anthropic, Google, etc.). It provides unified API access, automatic failover, cost tracking, load balancing, and caching. Think of it as a reverse proxy specifically designed for AI API calls.
Is OpenRouter free to use?
OpenRouter has no subscription fee; you pay provider token rates plus a 5.5% fee when purchasing credits, with no per-token markup. 27 models are available completely free (community-subsidized). For most developers, it's the lowest-friction way to access 300+ models.
Should I self-host LiteLLM or use the cloud version?
Self-host if you need full control, air-gapped deployment, or want to avoid per-seat licensing. The open-source proxy is free and handles 100+ providers. Use the cloud version (Enterprise from $250/month) if you want managed infrastructure, SSO, and audit logging without ops overhead.
Which LLM gateway is best for startups?
OpenRouter for early-stage (no commitment, pay-per-use). LiteLLM self-hosted for engineering-heavy teams who want control. Portkey for funded startups needing production observability and guardrails; its Pro tier starts around $49/month, with compliance certifications reserved for the Enterprise tier.
Can I use multiple LLM providers with one API key?
Yes, that's the core value of all three gateways. OpenRouter gives you one API key for 300+ models. LiteLLM provides a unified OpenAI-compatible endpoint for any provider. Portkey adds intelligent routing and automatic fallback between providers.
The Right Gateway in 2026
The LLM gateway market is converging: OpenRouter is adding observability features, LiteLLM is adding managed hosting options, and Portkey is expanding model access. In 12 months, the differences may narrow.
But today, the choice is clear:
Start with OpenRouter if you're building. It's the fastest path from idea to working prototype. Switch when you hit scale, compliance, or observability walls.
Move to Portkey when you need visibility. The first production incident you debug in 5 minutes instead of 5 hours pays for the subscription.
Build on LiteLLM when you need control. Self-hosted, zero markup, fully customizable. The DevOps investment is front-loaded, but the long-term economics and flexibility are unbeatable.
> Self-hosting tip: Running LiteLLM with local models (Ollama, vLLM) gives you zero per-token cost. Need GPU power without buying hardware? Vast.ai offers GPU instances from $0.20/hr — run your own inference stack for a fraction of API pricing.
The best gateway is the one that matches your team's maturity. Don't over-engineer day one. Don't under-invest at scale. Start simple, add complexity when the cost of not having it exceeds the cost of building it.
*For inference provider comparisons, see Hugging Face vs Replicate vs Together AI and Groq vs Together AI vs Fireworks AI. For local inference alternatives, check LM Studio vs Jan vs GPT4All and Ollama production config.*
*Disclosure: Links above are affiliate links. ToolHalla may earn a commission at no extra cost to you. We only recommend hardware we'd actually use.*