OpenRouter vs Together vs Groq vs Fireworks vs Cerebras: the per-token model gateways compared (April 2026)
If you don’t want to rent a GPU and run an LLM yourself, you’re picking between per-token gateways — services that host the model and bill you per million input/output tokens. The big five in 2026 are OpenRouter, Together AI, Groq, Fireworks AI, and Cerebras Inference. This article is a side-by-side of what each vendor publishes on its pricing page, with links. There are no latency or throughput numbers I measured. Where I quote a vendor’s claim, I link to where they made it.
Methodology note. Every price below is from the vendor’s own pricing page as of April 2026. I checked each link the day this was written. The per-token economy moves fast — confirm before you commit. Where a vendor uses fuzzy language (“blazing fast”, “industry-leading”), I say so and don’t repeat the claim as fact.
What is a “model gateway”, actually
Three different things have collapsed into the same product category, and the word “gateway” muddles them:
- Aggregators — one API in front of many providers’ models. You pick a model, they route the request. OpenRouter is the canonical example.
- First-party model hosts — they host the open-weights models themselves on their own GPU fleet, sell access per-token. Together, Fireworks, Groq, Cerebras, Anyscale, Deepinfra are all this.
- Specialty silicon hosts — same as #2 but the differentiator is custom hardware. Groq (LPU), Cerebras (Wafer-Scale Engine), SambaNova (RDU). They claim huge throughput on a narrow model list.
OpenRouter sits on top of #2 and #3. The other four sit alongside each other. Treating them as a single “gateway” market hides the most important fact: when you call OpenRouter, your tokens often actually go to one of the others.
Headline pricing — input/output per million tokens
All numbers in USD per million tokens, rounded to two decimals. Where a vendor offers multiple tiers (e.g., “Reference” vs “Pro”), I quote the default cited on the pricing page. Token prices change frequently — these are April 2026 snapshots.
Llama 3.3 70B Instruct
| Provider | Input ($/M tok) | Output ($/M tok) | Source |
|---|---|---|---|
| OpenRouter | varies (routes to underlying provider) | varies | OpenRouter Llama 3.3 70B |
| Together AI | $0.88 | $0.88 | Together pricing |
| Groq | $0.59 | $0.79 | Groq pricing |
| Fireworks AI | $0.90 | $0.90 | Fireworks pricing |
| Cerebras | not currently offered on default plan | — | Cerebras pricing |
Llama 3.1 8B Instruct
| Provider | Input ($/M tok) | Output ($/M tok) | Source |
|---|---|---|---|
| OpenRouter | varies | varies | OpenRouter Llama 3.1 8B |
| Together AI | $0.18 | $0.18 | Together pricing |
| Groq | $0.05 | $0.08 | Groq pricing |
| Fireworks AI | $0.20 | $0.20 | Fireworks pricing |
| Cerebras | $0.10 | $0.10 | Cerebras pricing |
DeepSeek-V3 / -R1 family (where offered)
| Provider | Model | Input ($/M tok) | Output ($/M tok) | Source |
|---|---|---|---|---|
| OpenRouter | DeepSeek-R1 | varies — currently routes to multiple providers | varies | OpenRouter DeepSeek-R1 |
| Together AI | DeepSeek-V3 | $1.25 | $1.25 | Together pricing |
| Fireworks AI | DeepSeek-V3 | $0.90 | $0.90 | Fireworks pricing |
| Groq | not currently in the supported list | — | — | Groq supported models |
The OpenRouter prices are deliberately blank because OpenRouter charges you what the underlying provider charges plus a 5% platform fee on top of credits (OpenRouter fees doc). Always check the model page on OpenRouter for the live number — it updates as routing changes.
What you’re actually paying for
Per-token pricing is comparable in the abstract but the service behind a token differs in three ways that the matrix hides:
1. Throughput and latency. Groq and Cerebras market themselves heavily on tokens-per-second. Groq’s pricing page claims hundreds of tokens/sec for Llama 3.3 70B on their LPU hardware. Cerebras claims similar numbers on their inference page. I have not verified these claims. They may be true under specific batch sizes and contexts, but a single number on a pricing page is not a methodology.
2. Context window in practice. Many providers list “128K context” or “1M context” on the spec sheet but cap actual usable context lower on cheaper tiers, or charge a separate rate above 32K. Read the small print on each pricing page.
3. Rate limits. Free or low tiers have aggressive RPM/TPM (requests-per-minute, tokens-per-minute) caps that don’t show up in the per-token math but absolutely show up in production. Check each vendor’s rate-limit doc before assuming you can sustain your projected throughput.
OpenRouter — the meta-question
OpenRouter is the only true aggregator of the five. When you call meta-llama/llama-3.3-70b-instruct, OpenRouter picks one of the underlying providers (currently includes Together, Fireworks, Lambda, DeepInfra, and others — see the model page for the live list). You pay the underlying rate plus OpenRouter’s credit fee.
When OpenRouter wins:
- You want a single API across 100+ models without juggling five accounts.
- You want OpenRouter to fail over automatically if one provider has an outage. They publish their routing and uptime logic.
- You want to BYOK (bring your own key) for one provider but route everything else through them. Documented here.
When going direct wins:
- You’re sending serious volume to one model — direct contracts at Together, Fireworks, or DeepInfra get cheaper.
- You need a feature OpenRouter doesn’t expose (e.g., Together’s fine-tuning API, Fireworks’ on-demand model deployment).
- You need streaming with absolute lowest latency — one fewer hop matters here.
Specialty silicon — Groq and Cerebras
Both make extraordinary claims about throughput on their respective custom chips. Both are real companies shipping production traffic. Both have narrower model menus than the GPU-based hosts.
Groq — runs Llama, Mixtral, Whisper, Qwen, DeepSeek-R1 distill, and a handful of others on their LPU. Full supported list. Their value prop is consistent low latency for chat-style workloads. Free tier exists with hard rate limits.
Cerebras — runs Llama 3.1 8B, 70B, 405B, Qwen, and a handful of others on the WSE-3. Pricing here. Their value prop is highest tokens/sec for streaming generation. Free trial, then paid tier.
The honest take. If you have a workload where chat latency dominates UX (think: realtime voice agents, autocomplete, IDE), trial both Groq and Cerebras side-by-side with your prompts and your model and see what you actually get. Vendor benchmarks are best-case numbers and the gap between best-case and your-case can be 5×.
When per-token gateways are the wrong choice
Three cases where you should not use any of these:
- You need a private model fine-tuned on your data. Per-token gateways host public model weights. For a custom fine-tune, go to Together’s fine-tuning, Fireworks’ on-demand, or rent GPUs (see our serverless GPU pricing matrix).
- You need on-prem or VPC deployment. None of the per-token gateways run inside your network. AWS Bedrock, Azure AI Foundry, or self-hosted is the path.
- Your volume is so high that the per-token margin matters more than dev velocity. At that point, GPU rental + your own inference server (vLLM, TGI, sgLang) starts to pencil out.
For the small-to-medium case — you’re shipping a product, you want a chat or completion endpoint, you don’t want to think about hardware — the per-token gateway market in 2026 is the cleanest deal in cloud computing. Just check the price the morning you launch, because it might have changed.
Sources
- OpenRouter Llama 3.3 70B model page — https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
- OpenRouter Llama 3.1 8B model page — https://openrouter.ai/meta-llama/llama-3.1-8b-instruct
- OpenRouter DeepSeek-R1 model page — https://openrouter.ai/deepseek/deepseek-r1
- OpenRouter BYOK / fees doc — https://openrouter.ai/docs/use-cases/byok
- OpenRouter provider routing — https://openrouter.ai/docs/features/provider-routing
- Together AI pricing — https://www.together.ai/pricing
- Together AI rate limits — https://docs.together.ai/docs/rate-limits
- Groq pricing — https://groq.com/pricing/
- Groq supported models — https://console.groq.com/docs/models
- Fireworks AI pricing — https://fireworks.ai/pricing
- Cerebras inference pricing — https://www.cerebras.ai/inference