Groq

The fastest AI inference platform — LPU-powered, 1000+ tokens/sec

LLM APIs & Inference
Free tier available, pay-per-token for production

About Groq

Groq builds custom LPU (Language Processing Unit) chips designed specifically for AI inference. Their cloud API delivers the fastest token generation speeds available — up to 1000 tokens/sec. Featured models include GPT OSS 120B (500 tok/s, $0.15/M input), GPT OSS 20B (1000 tok/s, $0.075/M input), Llama 3.3 70B (280 tok/s), Llama 4 Scout 17B (750 tok/s), Qwen3-32B (400 tok/s), and Kimi K2 (200 tok/s, 262K context). OpenAI-compatible API with free tier. Also offers Compound AI systems with built-in web search and code execution.
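As a rough illustration of the pay-per-token pricing above, input cost scales linearly with token count. The rates are the ones listed for the featured models; the token count is an arbitrary example, not a quote:

```python
def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Pay-per-token cost: token count scaled by the per-million-token rate."""
    return input_tokens / 1_000_000 * price_per_million_usd

# GPT OSS 120B at the listed $0.15/M input tokens, for a 200K-token prompt:
print(round(input_cost_usd(200_000, 0.15), 4))  # → 0.03
# GPT OSS 20B at the listed $0.075/M input tokens, same prompt:
print(round(input_cost_usd(200_000, 0.075), 4))  # → 0.015
```

Output costs are billed at a separate per-million rate, so a full estimate sums the two.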

Features

LPU hardware — custom chips for inference, not repurposed GPUs
GPT OSS 120B at 500 tok/s ($0.15/M input)
GPT OSS 20B at 1000 tok/s ($0.075/M input)
Llama 4 Scout 17B at 750 tok/s with 131K context + vision
Qwen3-32B at 400 tok/s with 131K context
Compound AI systems with web search + code execution
Whisper transcription ($0.04-0.11/hour)
OpenAI-compatible API — drop-in replacement
Free developer tier: 250-300K tokens per minute (TPM), 1K requests per minute (RPM)
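Because the endpoint is OpenAI-compatible, an existing client only needs its base URL (and API key) swapped. A minimal stdlib sketch of the chat-completions call, assuming the commonly documented `https://api.groq.com/openai/v1` base URL and a `llama-3.3-70b-versatile` model ID (check Groq's current docs for both):

```python
import json
import os
import urllib.request

# Assumed base URL for Groq's OpenAI-compatible endpoint.
GROQ_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(prompt: str,
                       model: str = "llama-3.3-70b-versatile"):
    """Build (but don't send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{GROQ_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("Hello")
# With GROQ_API_KEY set, urllib.request.urlopen(req) sends it for real.
```

The request body and endpoint path are identical to OpenAI's, which is why switching an OpenAI SDK client over is typically just changing the `base_url` and key.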

Pros & Cons

Pros

  • Fastest inference available (500-1000 tok/s)
  • Free tier with generous limits (250K+ tokens/min)
  • OpenAI-compatible API — swap one line of code
  • Latest open-source models (GPT OSS, Llama 4, Qwen3)
  • Compound AI for agentic workflows (search + code exec)

Cons

  • Cloud-only — cannot self-host LPU hardware
  • Rate limits on free tier (1K RPM)
  • Smaller model catalog than running locally via Ollama

Platforms

Web
