Skip to content

Groq Models

Groq provides ultra-fast inference using custom LPU hardware, delivering the fastest token generation speeds available. They host popular open-weight models like Llama and Mixtral with industry-leading latency.

Visit Groq →

11

Models Available

$0.050

Cheapest Input / 1M

262K

Largest Context

What is Groq?

Groq is an AI model provider offering 11 large language models for developers. Their cheapest model starts at $0.050 per 1M input tokens, and their largest context window reaches 262K. Groq provides ultra-fast inference using custom LPU hardware, delivering the fastest token generation speeds available. They host popular open-weight models like Llama and Mixtral with industry-leading latency.

Groq Strengths

Fastest inference speeds
Custom LPU hardware
Competitive open-model pricing
Low latency

All Groq Models

Model Input $/1M Output $/1M Context Max Output Released
Llama 3.1 8b Instant $0.050 $0.080 128K 8,192
Gemma 7b It $0.050 $0.080 8K 8,192
Openai/Gpt Oss 20b $0.075 $0.30 131K 32,768
Openai/Gpt Oss Safeguard 20b $0.075 $0.30 131K 65,536
Meta Llama/Llama 4 Scout 17b 16e Instruct $0.11 $0.34 131K 8,192
Openai/Gpt Oss 120b $0.15 $0.60 131K 32,766
Meta Llama/Llama Guard 4 12b $0.20 $0.20 8K 8,192
Meta Llama/Llama 4 Maverick 17b 128e Instruct $0.20 $0.60 131K 8,192
Qwen/Qwen3 32b $0.29 $0.59 131K 131,000
Llama 3.3 70b Versatile $0.59 $0.79 128K 32,768
Moonshotai/Kimi K2 Instruct 0905 $1.00 $3.00 262K 16,384

Model Details

Llama 3.1 8b Instant

Llama 3.1 8b Instant is available via Groq with a 128K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.

Input: $0.050/1M Output: $0.080/1M Context: 128K
text function calling

Gemma 7b It

Gemma 7b It is available via Groq with a 8K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.

Input: $0.050/1M Output: $0.080/1M Context: 8K
text function calling

Openai/Gpt Oss 20b

Openai/Gpt Oss 20b is available via Groq with a 131K context window and up to 32,768 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.

Input: $0.075/1M Output: $0.30/1M Context: 131K
text function calling reasoning web search json mode

Openai/Gpt Oss Safeguard 20b

Openai/Gpt Oss Safeguard 20b is available via Groq with a 131K context window and up to 65,536 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.

Input: $0.075/1M Output: $0.30/1M Context: 131K
text function calling reasoning web search json mode

Meta Llama/Llama 4 Scout 17b 16e Instruct

Meta Llama/Llama 4 Scout 17b 16e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.1100/1M input tokens, $0.3400/1M output tokens.

Input: $0.11/1M Output: $0.34/1M Context: 131K
text vision function calling json mode

Openai/Gpt Oss 120b

Openai/Gpt Oss 120b is available via Groq with a 131K context window and up to 32,766 output tokens. Pricing: $0.1500/1M input tokens, $0.6000/1M output tokens.

Input: $0.15/1M Output: $0.60/1M Context: 131K
text function calling reasoning web search json mode

Meta Llama/Llama Guard 4 12b

Meta Llama/Llama Guard 4 12b is available via Groq with a 8K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.2000/1M output tokens.

Input: $0.20/1M Output: $0.20/1M Context: 8K
text

Meta Llama/Llama 4 Maverick 17b 128e Instruct

Meta Llama/Llama 4 Maverick 17b 128e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.6000/1M output tokens.

Input: $0.20/1M Output: $0.60/1M Context: 131K
text vision function calling json mode

Qwen/Qwen3 32b

Qwen/Qwen3 32b is available via Groq with a 131K context window and up to 131,000 output tokens. Pricing: $0.2900/1M input tokens, $0.5900/1M output tokens.

Input: $0.29/1M Output: $0.59/1M Context: 131K
text function calling reasoning

Llama 3.3 70b Versatile

Llama 3.3 70b Versatile is available via Groq with a 128K context window and up to 32,768 output tokens. Pricing: $0.5900/1M input tokens, $0.7900/1M output tokens.

Input: $0.59/1M Output: $0.79/1M Context: 128K
text function calling

Moonshotai/Kimi K2 Instruct 0905

Moonshotai/Kimi K2 Instruct 0905 is available via Groq with a 262K context window and up to 16,384 output tokens. Pricing: $1.00/1M input tokens, $3.00/1M output tokens.

Input: $1.00/1M Output: $3.00/1M Context: 262K
text function calling json mode

Compare Groq model pricing

Use our pricing calculator to find the cheapest Groq model for your workload.

Pricing Calculator Compare Models All Models Directory

Related Reading

OpenAI vs Anthropic vs Google: Which AI API Should You Choose? → Cheapest LLM API in 2026: Complete Pricing Comparison → OpenAI API Pricing Guide 2026 →