Groq Models

Groq runs inference on its custom LPU (Language Processing Unit) hardware, which is built for fast token generation and low latency. It hosts popular open-weight models such as Llama, Qwen, and GPT-OSS.


Models Available: 11

Cheapest Input / 1M: $0.050

Largest Context: 262K

What is Groq?

Groq is an AI model provider that offers 11 large language models to developers. Its cheapest models start at $0.050 per 1M input tokens, and its largest context window is 262K tokens.

Groq Strengths

✓ Fastest inference speeds

✓ Custom LPU hardware

✓ Competitive open-model pricing

✓ Low latency

All Groq Models

Model	Input $/1M	Output $/1M	Context	Max Output	Released
Llama 3.1 8b Instant	$0.050	$0.080	128K	8,192	—
Gemma 7b It	$0.050	$0.080	8K	8,192	—
Openai/Gpt Oss 20b	$0.075	$0.30	131K	32,768	—
Openai/Gpt Oss Safeguard 20b	$0.075	$0.30	131K	65,536	—
Meta Llama/Llama 4 Scout 17b 16e Instruct	$0.11	$0.34	131K	8,192	—
Openai/Gpt Oss 120b	$0.15	$0.60	131K	32,766	—
Meta Llama/Llama Guard 4 12b	$0.20	$0.20	8K	8,192	—
Meta Llama/Llama 4 Maverick 17b 128e Instruct	$0.20	$0.60	131K	8,192	—
Qwen/Qwen3 32b	$0.29	$0.59	131K	131,000	—
Llama 3.3 70b Versatile	$0.59	$0.79	128K	32,768	—
Moonshotai/Kimi K2 Instruct 0905	$1.00	$3.00	262K	16,384	—
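
The per-1M prices in the table convert to a request cost as (input tokens × input price + output tokens × output price) / 1,000,000. The following is a minimal Python sketch of that arithmetic, using three rows copied from the table. The `PRICES` dictionary and the `estimate_cost` helper are illustrative names, not part of any Groq SDK.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
# Only a subset of rows is included to keep the sketch short.
PRICES = {
    "Llama 3.1 8b Instant": (0.050, 0.080),
    "Openai/Gpt Oss 120b": (0.15, 0.60),
    "Moonshotai/Kimi K2 Instruct 0905": (1.00, 3.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2,000,000 input tokens and 500,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

For this example workload the three models come to about $0.14, $0.60, and $3.50. Output tokens dominate the bill on models with a large gap between input and output pricing.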

Model Details

Llama 3.1 8b Instant

Llama 3.1 8b Instant is available via Groq with a 128K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.

Input: $0.050/1M Output: $0.080/1M Context: 128K

Capabilities: text, function calling

Gemma 7b It

Gemma 7b It is available via Groq with an 8K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.

Input: $0.050/1M Output: $0.080/1M Context: 8K

Capabilities: text, function calling

Openai/Gpt Oss 20b

Openai/Gpt Oss 20b is available via Groq with a 131K context window and up to 32,768 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.

Input: $0.075/1M Output: $0.30/1M Context: 131K

Capabilities: text, function calling, reasoning, web search, JSON mode

Openai/Gpt Oss Safeguard 20b

Openai/Gpt Oss Safeguard 20b is available via Groq with a 131K context window and up to 65,536 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.

Input: $0.075/1M Output: $0.30/1M Context: 131K

Capabilities: text, function calling, reasoning, web search, JSON mode

Meta Llama/Llama 4 Scout 17b 16e Instruct

Meta Llama/Llama 4 Scout 17b 16e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.1100/1M input tokens, $0.3400/1M output tokens.

Input: $0.11/1M Output: $0.34/1M Context: 131K

Capabilities: text, vision, function calling, JSON mode

Openai/Gpt Oss 120b

Openai/Gpt Oss 120b is available via Groq with a 131K context window and up to 32,766 output tokens. Pricing: $0.1500/1M input tokens, $0.6000/1M output tokens.

Input: $0.15/1M Output: $0.60/1M Context: 131K

Capabilities: text, function calling, reasoning, web search, JSON mode

Meta Llama/Llama Guard 4 12b

Meta Llama/Llama Guard 4 12b is available via Groq with an 8K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.2000/1M output tokens.

Input: $0.20/1M Output: $0.20/1M Context: 8K

Capabilities: text

Meta Llama/Llama 4 Maverick 17b 128e Instruct

Meta Llama/Llama 4 Maverick 17b 128e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.6000/1M output tokens.

Input: $0.20/1M Output: $0.60/1M Context: 131K

Capabilities: text, vision, function calling, JSON mode

Qwen/Qwen3 32b

Qwen/Qwen3 32b is available via Groq with a 131K context window and up to 131,000 output tokens. Pricing: $0.2900/1M input tokens, $0.5900/1M output tokens.

Input: $0.29/1M Output: $0.59/1M Context: 131K

Capabilities: text, function calling, reasoning

Llama 3.3 70b Versatile

Llama 3.3 70b Versatile is available via Groq with a 128K context window and up to 32,768 output tokens. Pricing: $0.5900/1M input tokens, $0.7900/1M output tokens.

Input: $0.59/1M Output: $0.79/1M Context: 128K

Capabilities: text, function calling

Moonshotai/Kimi K2 Instruct 0905

Moonshotai/Kimi K2 Instruct 0905 is available via Groq with a 262K context window and up to 16,384 output tokens. Pricing: $1.00/1M input tokens, $3.00/1M output tokens.

Input: $1.00/1M Output: $3.00/1M Context: 262K

Capabilities: text, function calling, JSON mode
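
The capability tags listed under each model can narrow the choice before price is compared. The sketch below, in Python, assumes a small hand-built catalogue transcribed from a few of the entries above, with tags lowercased. The `Model` dataclass and the `cheapest_with` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float        # USD per 1M input tokens
    context_k: int            # context window in thousands of tokens, as listed
    capabilities: frozenset   # capability tags, lowercased


# A few entries transcribed from the model details above (not the full list).
CATALOGUE = [
    Model("Llama 3.1 8b Instant", 0.050, 128,
          frozenset({"text", "function calling"})),
    Model("Openai/Gpt Oss 20b", 0.075, 131,
          frozenset({"text", "function calling", "reasoning",
                     "web search", "json mode"})),
    Model("Meta Llama/Llama 4 Scout 17b 16e Instruct", 0.11, 131,
          frozenset({"text", "vision", "function calling", "json mode"})),
    Model("Moonshotai/Kimi K2 Instruct 0905", 1.00, 262,
          frozenset({"text", "function calling", "json mode"})),
]


def cheapest_with(required, min_context_k=0):
    """Return the lowest input-price model that has every required capability
    and at least min_context_k thousand tokens of context, or None."""
    candidates = [
        m for m in CATALOGUE
        if set(required) <= m.capabilities and m.context_k >= min_context_k
    ]
    return min(candidates, key=lambda m: m.input_price, default=None)


if __name__ == "__main__":
    print(cheapest_with({"vision"}))
    print(cheapest_with({"json mode"}, min_context_k=200))
```

Ranking by input price alone is a simplification. A workload that generates long outputs would be better ranked by a blended cost that also weighs the output price.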

Compare Groq model pricing

Use our pricing calculator to find the cheapest Groq model for your workload.
