
Vercel AI Gateway Provider Sorting: Cost, Latency, and Throughput

Vercel AI Gateway now lets developers sort providers behind a model by cost, time to first token, or throughput. Here is what the new sort option changes, and what it still does not prove.

May 17, 2026·7 min read·1,356 words

In short: Vercel AI Gateway now lets you sort the providers behind a model by cost (lowest input price), ttft (lowest median time to first token), or tps (highest median throughput). You set it on providerOptions.gateway per request, and providers are ranked at request time. It's a routing preference, not a guaranteed bill or SLA.

Vercel says its AI Gateway can now sort providers behind a model by cost, time to first token, or throughput. The change, published in the Vercel changelog on May 15, 2026 by Walter Korman and Jerilyn Zheng, gives developers an explicit knob to choose which provider behind a given model should be tried first.

For teams using the AI SDK against multi-provider models, this turns an implicit default into a deliberate routing policy. That is useful, but it does not change what a gateway can and cannot prove.

Source: Vercel changelog: Sort providers by cost, latency, or throughput on AI Gateway.

What Vercel changed on May 15, 2026

Vercel's default provider order blends provider reliability, quality of model output, cost, and response speed. According to the changelog, the new sort option on providerOptions.gateway lets developers override that blended default with one of three explicit criteria.

The values are:

cost: rank providers by listed input price per million tokens, lowest first.
ttft: rank by median time to first token in milliseconds, lowest first.
tps: rank by median throughput in tokens per second, highest first.
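
A minimal sketch of what setting the option might look like in TypeScript. Only the providerOptions.gateway location and the three sort values come from the changelog. The helper name gatewaySort and the model ID in the usage comment are illustrative assumptions, and the sketch builds the options object locally rather than calling the gateway.

```typescript
// The three sort values named in Vercel's changelog.
type GatewaySort = 'cost' | 'ttft' | 'tps';

// Hypothetical helper (not part of the AI SDK): builds the per-request
// options fragment that carries the sort preference.
function gatewaySort(sort: GatewaySort): {
  providerOptions: { gateway: { sort: GatewaySort } };
} {
  return { providerOptions: { gateway: { sort } } };
}

const options = gatewaySort('ttft');
console.log(JSON.stringify(options));

// Assumed usage with the AI SDK (the model ID is illustrative; verify
// the exact call shape against the current AI SDK documentation):
//   import { generateText } from 'ai';
//   const result = await generateText({
//     model: 'openai/gpt-oss-120b',
//     prompt: 'Summarize this ticket.',
//     ...gatewaySort('ttft'),
//   });
```

Keeping the value behind a union type means a typo such as 'latency' fails at compile time instead of reaching the gateway.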

Vercel says ranking is computed at request time, so newly added providers, price changes, and shifts in observed latency or throughput flow through automatically without code changes. Providers are tried in sort order, and fallback to the next provider only happens when the higher-ranked one is unavailable.
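
To make the ordering and fallback rules concrete, here is a local simulation in TypeScript. It is not the gateway's implementation: the Provider shape, the provider names, every number, and the available flag are hypothetical stand-ins. It only illustrates ranking by one criterion at call time, then taking the first provider in that order that is not marked unavailable.

```typescript
type GatewaySort = 'cost' | 'ttft' | 'tps';

// Hypothetical provider record; field names and values are made up.
interface Provider {
  name: string;
  inputPricePerMTok: number; // listed input price, USD per million tokens
  medianTtftMs: number;      // median time to first token, milliseconds
  medianTps: number;         // median throughput, tokens per second
  available: boolean;
}

// Rank a copy of the list: cost and ttft ascending, tps descending.
function rank(providers: Provider[], sort: GatewaySort): Provider[] {
  const sorted = [...providers];
  switch (sort) {
    case 'cost':
      return sorted.sort((a, b) => a.inputPricePerMTok - b.inputPricePerMTok);
    case 'ttft':
      return sorted.sort((a, b) => a.medianTtftMs - b.medianTtftMs);
    case 'tps':
      return sorted.sort((a, b) => b.medianTps - a.medianTps);
  }
}

// Take the first available provider in ranked order. A slower or pricier
// provider is reached only when every higher-ranked one is unavailable.
function pick(providers: Provider[], sort: GatewaySort): Provider | undefined {
  return rank(providers, sort).find((p) => p.available);
}

const providers: Provider[] = [
  { name: 'alpha', inputPricePerMTok: 0.10, medianTtftMs: 400, medianTps: 90, available: false },
  { name: 'beta', inputPricePerMTok: 0.15, medianTtftMs: 250, medianTps: 120, available: true },
  { name: 'gamma', inputPricePerMTok: 0.20, medianTtftMs: 300, medianTps: 200, available: true },
];

console.log(rank(providers, 'cost').map((p) => p.name));
console.log(pick(providers, 'cost')?.name);
console.log(pick(providers, 'tps')?.name);
```

In this made-up data the cheapest provider is marked unavailable, so a cost-sorted request lands on the second-cheapest one, while a tps-sorted request goes to a different provider entirely.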

The three sort modes

Cost

sort: 'cost' orders providers by listed input price per million tokens, cheapest first. Vercel's changelog uses GPT OSS 120B as an example: AI Gateway exposes more than five providers for that model, and they do not all charge the same per-token rate, which is the situation where price-first routing matters most.

This mode is positioned for high-volume, cost-sensitive work where input price is the dominant variable. Vercel's note describes it as ranking by listed input price, so output price, request volume, retries, and caching are not part of the sort itself.
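
A small worked example shows why listed input price alone may not decide the bill. The prices and token counts below are invented for illustration, and the formula is a simplified assumption that ignores caching, retries, and any account-level pricing.

```typescript
// Hypothetical listed prices in USD per million tokens.
interface Pricing {
  name: string;
  inputPerMTok: number;
  outputPerMTok: number;
}

// Simplified job cost: tokens times listed price, summed over input and output.
function jobCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

const cheapInput: Pricing = { name: 'cheap-input', inputPerMTok: 0.10, outputPerMTok: 0.80 };
const cheapOutput: Pricing = { name: 'cheap-output', inputPerMTok: 0.15, outputPerMTok: 0.50 };

// Output-heavy job: 1M input tokens, 2M output tokens.
console.log(cheapInput.name, jobCostUsd(cheapInput, 1_000_000, 2_000_000).toFixed(2));
console.log(cheapOutput.name, jobCostUsd(cheapOutput, 1_000_000, 2_000_000).toFixed(2));
```

With these invented numbers, the provider that would rank first under an input-price sort comes out at roughly 1.70 USD for the job, against roughly 1.15 USD for the provider with the higher input price.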

TTFT

sort: 'ttft' orders providers by median time to first token in milliseconds, lowest latency first. The intent is to send latency-sensitive traffic to the provider that has historically responded quickest.

This mode is positioned for interactive or perceived-latency workloads: chat, autocomplete, and real-time agent steps where the person on the other end is waiting for the first chunk of output. Because ranking is computed at request time, the order tracks observed median latency over time rather than a static benchmark.

TPS

sort: 'tps' orders providers by median throughput in tokens per second, highest first. This mode is positioned for long-output generation, where the time to produce the full response matters more than the time to the first token.

For workloads like long summaries, batch reports, or large structured outputs, the provider with the lowest TTFT may not be the provider that finishes generation first. TPS-sorted routing targets the workloads where finishing first matters more than starting first.
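
The trade-off can be sketched with a rough estimate: total time is approximately time to first token plus output tokens divided by throughput. This is a simplifying assumption, not a formula from Vercel, and the two providers and their figures are hypothetical.

```typescript
// Hypothetical speed profile for one provider.
interface Speed {
  name: string;
  ttftMs: number; // time to first token, milliseconds
  tps: number;    // throughput, tokens per second
}

// Rough estimate: first-token latency plus steady-state generation time.
function estimatedTotalMs(p: Speed, outputTokens: number): number {
  return p.ttftMs + (outputTokens / p.tps) * 1000;
}

const quickStart: Speed = { name: 'quick-start', ttftMs: 200, tps: 50 };
const fastStream: Speed = { name: 'fast-stream', ttftMs: 600, tps: 150 };

for (const tokens of [20, 2000]) {
  console.log(
    tokens,
    Math.round(estimatedTotalMs(quickStart, tokens)),
    Math.round(estimatedTotalMs(fastStream, tokens)),
  );
}
```

Under these assumptions the low-TTFT provider finishes a 20-token reply sooner, but the high-throughput provider finishes a 2,000-token reply in about a third of the time.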

When each routing mode makes sense

The three modes map to three different shapes of workload:

Use cost when input volume is high, latency is acceptable, and a lower per-million-token rate dominates the bill. Common cases include large-batch classification, embedding-adjacent rewrites, and back-office summarization.
Use ttft when a person is waiting for the first chunk. Conversational UIs, agent loops with interactive feedback, and IDE-style assistants tend to live or die by first-token latency.
Use tps when total wall-clock time matters more than the first token. Long-form generation, code rewrites, and report rendering benefit when the chosen provider sustains a higher tokens-per-second rate.

Most production stacks will end up using more than one. A common pattern is ttft for interactive surfaces, cost for background jobs, and tps for long-running generations, all hitting the same model through the same gateway, with the sort chosen per request.
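
One way to keep that per-request choice deliberate is a single lookup from workload type to sort value. The workload names and the requestOptionsFor helper below are hypothetical; only the three sort values and the providerOptions.gateway location come from the changelog.

```typescript
type GatewaySort = 'cost' | 'ttft' | 'tps';

// Hypothetical workload categories for one application.
type Workload = 'interactive' | 'background' | 'long-generation';

// Central policy table: one sort dimension per workload.
const SORT_BY_WORKLOAD: Record<Workload, GatewaySort> = {
  interactive: 'ttft',
  background: 'cost',
  'long-generation': 'tps',
};

// Builds the options fragment to spread into an AI SDK request.
function requestOptionsFor(workload: Workload) {
  return { providerOptions: { gateway: { sort: SORT_BY_WORKLOAD[workload] } } };
}

console.log(JSON.stringify(requestOptionsFor('background')));
```

Because the table is typed as a Record over every workload, adding a new workload without deciding its sort dimension becomes a compile error rather than a silent default.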

For a wider view of where this kind of routing fits alongside other gateway products, see our comparison of OpenRouter, LiteLLM, and Portkey in 2026.

What provider sorting does not prove

Sorting is a routing preference, not a guarantee. A few things are worth being explicit about:

sort: 'cost' does not guarantee the lowest total bill. Vercel's note describes it as ranking by listed input price per million tokens. Output length, prompt caching behavior, retry rates on failures, and any output-token premium can change the actual invoice. The cheapest input price is not always the cheapest job.
sort: 'ttft' and sort: 'tps' describe routing inputs, not delivered SLAs. Median latency or throughput is what the gateway uses to rank, not what any specific request is contractually guaranteed to receive. An individual request can still land in the slow tail of the top-ranked provider.
Sorting does not normalize quality. Vercel describes the default as also weighing model output quality. Once you override that default with a single dimension, you are explicitly accepting whatever quality variance exists between providers for that model. If a provider runs a quantized or otherwise different deployment, the sort does not surface that.
Fallback is conditional. Vercel says the next provider is used only when the higher-ranked one is unavailable. That covers outages and errors. It does not automatically swap providers because the current one is slow on a particular request.
Vercel did not publish hands-on benchmarks in the changelog. We assert nothing beyond the source: no specific price comparisons, latency numbers, or provider rankings, none of which Toolhalla has measured.

Directory implications for Toolhalla and AI infra buyers

For Toolhalla's directory, the change reinforces a category trend rather than creating a new product class. AI Gateway already belonged in the LLM gateway and provider-routing bucket alongside OpenRouter, LiteLLM, and Portkey. The sort option is a feature update, not a new entry.

What it does change is what buyers should ask when evaluating an LLM gateway:

Does it expose ranking criteria explicitly, or only as a blended default?
Are rankings recomputed at request time, or pinned at deploy time?
How is fallback triggered — outage, error rate, latency, or something else?
Are listed prices the actual billed prices for your account, or list-price indicators?

A gateway that lets you choose between cost, ttft, and tps per request is one shape of answer. A gateway that pins routing in policy files or an admin UI is another. Neither is wrong, but they imply different operational habits.

For AI infrastructure buyers, the practical implication is to map each workload to a single dimension before turning on per-request sorting. If you cannot say whether a workload is cost-, latency-, or throughput-bound, picking a sort value is guesswork.

FAQ

Where is `sort` configured?

sort is set on providerOptions.gateway in the AI SDK request, per Vercel's changelog.

What values does `sort` accept?

cost, ttft, and tps. Vercel's changelog defines cost as listed input price per million tokens (lowest first), ttft as median time to first token in milliseconds (lowest first), and tps as median tokens per second (highest first).

Does `sort: 'cost'` guarantee the cheapest total bill?

No. The sort ranks by listed input price per million tokens. Output length, retries, caching, and any output-token premium can still change the final cost.

How does fallback work with sorted routing?

Vercel says providers are tried in sort order, and fallback to the next provider only happens when the higher-ranked one is unavailable.

Are the rankings static?

No. Vercel says ranking is computed at request time, so newly added providers, price changes, and shifts in observed latency or throughput flow through automatically without code changes.

Has Toolhalla tested this hands-on?

No. This article is a sourced summary of Vercel's May 15, 2026 changelog plus an analysis of what sort does and does not prove. We have not run our own measurements.

Sources

Vercel changelog, "Sort providers by cost, latency, or throughput on AI Gateway" (May 15, 2026, by Walter Korman and Jerilyn Zheng): https://vercel.com/changelog/sort-providers-by-cost-latency-or-throughput-on-ai-gateway
Vercel AI Gateway product page: https://vercel.com/ai-gateway
