Groq
About Groq
Groq builds custom LPU (Language Processing Unit) chips designed specifically for AI inference. Their cloud API delivers the fastest token generation speeds available — up to 1000 tokens/sec. Featured models include GPT OSS 120B (500 tok/s, $0.15/M input), GPT OSS 20B (1000 tok/s, $0.075/M input), Llama 3.3 70B (280 tok/s), Llama 4 Scout 17B (750 tok/s), Qwen3-32B (400 tok/s), and Kimi K2 (200 tok/s, 262K context). OpenAI-compatible API with free tier. Also offers Compound AI systems with built-in web search and code execution.
Features
The tally
- +Fastest inference available (500-1000 tok/s)
- +Free tier with generous limits (250K+ tokens/min)
- +OpenAI-compatible API — swap one line of code
- +Latest open-source models (GPT OSS, Llama 4, Qwen3)
- +Compound AI for agentic workflows (search + code exec)
- −Cloud-only — cannot self-host LPU hardware
- −Rate limits on free tier (1K RPM)
- −Smaller model catalog than running locally via Ollama
Related concepts
Kept nearby
Unified API gateway for routing app calls across hundreds of AI models
The AI community platform with 500K+ models and datasets
Fast and efficient LLM inference platform
Fast inference and fine-tuning for open-source models
Browse all LLM APIs & Inference tools →