Models Available: 67
Cheapest Input / 1M: $0.020
Largest Context: 1.0M
What is DeepInfra?
DeepInfra is an AI model provider offering 67 large language models to developers through an API. Its cheapest model starts at $0.020 per 1M input tokens, and its largest context window reaches 1.0M tokens.
All DeepInfra Models
Model Details
Meta Llama/Llama 3.2 3B Instruct
Meta Llama/Llama 3.2 3B Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0200/1M input tokens, $0.0200/1M output tokens.
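Because every price on this page is quoted per 1M tokens, the cost of a workload can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` and the example token counts are assumptions, not part of any DeepInfra tooling, and the prices are the ones listed above for Meta Llama/Llama 3.2 3B Instruct.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Estimate cost in USD from token counts and per-1M-token prices."""
    tokens_per_unit = 1_000_000  # prices on this page are quoted per 1M tokens
    input_cost = (input_tokens / tokens_per_unit) * input_price_per_m
    output_cost = (output_tokens / tokens_per_unit) * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: 10M input tokens and 2M output tokens
# at $0.02 / 1M input and $0.02 / 1M output.
cost = estimate_cost_usd(10_000_000, 2_000_000, 0.02, 0.02)
print(f"${cost:.4f}")  # prints $0.2400
```

The same function applies to any model listed here by substituting its input and output prices.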
Meta Llama/Meta Llama 3.1 8B Instruct Turbo
Meta Llama/Meta Llama 3.1 8B Instruct Turbo is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0200/1M input tokens, $0.0300/1M output tokens.
Mistralai/Mistral Nemo Instruct 2407
Mistralai/Mistral Nemo Instruct 2407 is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0200/1M input tokens, $0.0400/1M output tokens.
Meta Llama/Meta Llama 3 8B Instruct
Meta Llama/Meta Llama 3 8B Instruct is available via DeepInfra with an 8K context window and up to 8,192 output tokens. Pricing: $0.0300/1M input tokens, $0.0600/1M output tokens.
Meta Llama/Meta Llama 3.1 8B Instruct
Meta Llama/Meta Llama 3.1 8B Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0300/1M input tokens, $0.0500/1M output tokens.
Qwen/Qwen2.5 7B Instruct
Qwen/Qwen2.5 7B Instruct is available via DeepInfra with a 33K context window and up to 32,768 output tokens. Pricing: $0.0400/1M input tokens, $0.1000/1M output tokens.
Sao10K/L3 8B Lunaris V1 Turbo
Sao10K/L3 8B Lunaris V1 Turbo is available via DeepInfra with an 8K context window and up to 8,192 output tokens. Pricing: $0.0400/1M input tokens, $0.0500/1M output tokens.
Google/Gemma 3 4b It
Google/Gemma 3 4b It is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0400/1M input tokens, $0.0800/1M output tokens.
Nvidia/NVIDIA Nemotron Nano 9B
Nvidia/NVIDIA Nemotron Nano 9B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0400/1M input tokens, $0.1600/1M output tokens.
Openai/Gpt Oss 20b
Openai/Gpt Oss 20b is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0400/1M input tokens, $0.1500/1M output tokens.
Meta Llama/Llama 3.2 11B Vision Instruct
Meta Llama/Llama 3.2 11B Vision Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0490/1M input tokens, $0.0490/1M output tokens.
Google/Gemma 3 12b It
Google/Gemma 3 12b It is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0500/1M input tokens, $0.1000/1M output tokens.
Mistralai/Mistral Small 24B Instruct 2501
Mistralai/Mistral Small 24B Instruct 2501 is available via DeepInfra with a 33K context window and up to 32,768 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.
Openai/Gpt Oss 120b
Openai/Gpt Oss 120b is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0500/1M input tokens, $0.4500/1M output tokens.
Meta Llama/Llama Guard 3 8B
Meta Llama/Llama Guard 3 8B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0550/1M input tokens, $0.0550/1M output tokens.
Qwen/Qwen3 14B
Qwen/Qwen3 14B is available via DeepInfra with a 41K context window and up to 40,960 output tokens. Pricing: $0.0600/1M input tokens, $0.2400/1M output tokens.
Microsoft/Phi 4
Microsoft/Phi 4 is available via DeepInfra with a 16K context window and up to 16,384 output tokens. Pricing: $0.0700/1M input tokens, $0.1400/1M output tokens.
Mistralai/Mistral Small 3.2 24B Instruct 2506
Mistralai/Mistral Small 3.2 24B Instruct 2506 is available via DeepInfra with a 128K context window and up to 128,000 output tokens. Pricing: $0.0750/1M input tokens, $0.2000/1M output tokens.
Gryphe/MythoMax L2 13b
Gryphe/MythoMax L2 13b is available via DeepInfra with a 4K context window and up to 4,096 output tokens. Pricing: $0.0800/1M input tokens, $0.0900/1M output tokens.
Qwen/Qwen3 30B A3B
Qwen/Qwen3 30B A3B is available via DeepInfra with a 41K context window and up to 40,960 output tokens. Pricing: $0.0800/1M input tokens, $0.2900/1M output tokens.
Meta Llama/Llama 4 Scout 17B 16E Instruct
Meta Llama/Llama 4 Scout 17B 16E Instruct is available via DeepInfra with a 328K context window and up to 327,680 output tokens. Pricing: $0.0800/1M input tokens, $0.3000/1M output tokens.
Qwen/Qwen3 235B A22B Instruct 2507
Qwen/Qwen3 235B A22B Instruct 2507 is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.0900/1M input tokens, $0.6000/1M output tokens.
Google/Gemma 3 27b It
Google/Gemma 3 27b It is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.0900/1M input tokens, $0.1600/1M output tokens.
Qwen/Qwen3 32B
Qwen/Qwen3 32B is available via DeepInfra with a 41K context window and up to 40,960 output tokens. Pricing: $0.1000/1M input tokens, $0.2800/1M output tokens.
Google/Gemini 2.0 Flash 001
Google/Gemini 2.0 Flash 001 is available via DeepInfra with a 1M context window and up to 1,000,000 output tokens. Pricing: $0.1000/1M input tokens, $0.4000/1M output tokens.
Meta Llama/Meta Llama 3.1 70B Instruct Turbo
Meta Llama/Meta Llama 3.1 70B Instruct Turbo is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.1000/1M input tokens, $0.2800/1M output tokens.
Nvidia/Llama 3.3 Nemotron Super 49B V1.5
Nvidia/Llama 3.3 Nemotron Super 49B V1.5 is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.1000/1M input tokens, $0.4000/1M output tokens.
Qwen/Qwen2.5 72B Instruct
Qwen/Qwen2.5 72B Instruct is available via DeepInfra with a 33K context window and up to 32,768 output tokens. Pricing: $0.1200/1M input tokens, $0.3900/1M output tokens.
Meta Llama/Llama 3.3 70B Instruct Turbo
Meta Llama/Llama 3.3 70B Instruct Turbo is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.1300/1M input tokens, $0.3900/1M output tokens.
Qwen/Qwen3 Next 80B A3B Instruct
Qwen/Qwen3 Next 80B A3B Instruct is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.1400/1M input tokens, $1.40/1M output tokens.
Qwen/Qwen3 Next 80B A3B Thinking
Qwen/Qwen3 Next 80B A3B Thinking is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.1400/1M input tokens, $1.40/1M output tokens.
Qwen/QwQ 32B
Qwen/QwQ 32B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.1500/1M input tokens, $0.4000/1M output tokens.
Meta Llama/Llama 4 Maverick 17B 128E Instruct FP8
Meta Llama/Llama 4 Maverick 17B 128E Instruct FP8 is available via DeepInfra with a 1.0M context window and up to 1,048,576 output tokens. Pricing: $0.1500/1M input tokens, $0.6000/1M output tokens.
Qwen/Qwen3 235B A22B
Qwen/Qwen3 235B A22B is available via DeepInfra with a 41K context window and up to 40,960 output tokens. Pricing: $0.1800/1M input tokens, $0.5400/1M output tokens.
Meta Llama/Llama Guard 4 12B
Meta Llama/Llama Guard 4 12B is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.1800/1M input tokens, $0.1800/1M output tokens.
Qwen/Qwen2.5 VL 32B Instruct
Qwen/Qwen2.5 VL 32B Instruct is available via DeepInfra with a 128K context window and up to 128,000 output tokens. Pricing: $0.2000/1M input tokens, $0.6000/1M output tokens.
Deepseek Ai/DeepSeek R1 Distill Llama 70B
Deepseek Ai/DeepSeek R1 Distill Llama 70B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.2000/1M input tokens, $0.6000/1M output tokens.
Meta Llama/Llama 3.3 70B Instruct
Meta Llama/Llama 3.3 70B Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.2300/1M input tokens, $0.4000/1M output tokens.
Deepseek Ai/DeepSeek V3 0324
Deepseek Ai/DeepSeek V3 0324 is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.2500/1M input tokens, $0.8800/1M output tokens.
Allenai/OlmOCR 7B 0725 FP8
Allenai/OlmOCR 7B 0725 FP8 is available via DeepInfra with a 16K context window and up to 16,384 output tokens. Pricing: $0.2700/1M input tokens, $1.50/1M output tokens.
Deepseek Ai/DeepSeek R1 Distill Qwen 32B
Deepseek Ai/DeepSeek R1 Distill Qwen 32B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.2700/1M input tokens, $0.2700/1M output tokens.
Deepseek Ai/DeepSeek V3.1
Deepseek Ai/DeepSeek V3.1 is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.2700/1M input tokens, $1.00/1M output tokens.
Deepseek Ai/DeepSeek V3.1 Terminus
Deepseek Ai/DeepSeek V3.1 Terminus is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.2700/1M input tokens, $1.00/1M output tokens.
Qwen/Qwen3 Coder 480B A35B Instruct Turbo
Qwen/Qwen3 Coder 480B A35B Instruct Turbo is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.2900/1M input tokens, $1.20/1M output tokens.
NousResearch/Hermes 3 Llama 3.1 70B
NousResearch/Hermes 3 Llama 3.1 70B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.3000/1M input tokens, $0.3000/1M output tokens.
Qwen/Qwen3 235B A22B Thinking 2507
Qwen/Qwen3 235B A22B Thinking 2507 is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.3000/1M input tokens, $2.90/1M output tokens.
Google/Gemini 2.5 Flash
Google/Gemini 2.5 Flash is available via DeepInfra with a 1M context window and up to 1,000,000 output tokens. Pricing: $0.3000/1M input tokens, $2.50/1M output tokens.
Deepseek Ai/DeepSeek V3
Deepseek Ai/DeepSeek V3 is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.3800/1M input tokens, $0.8900/1M output tokens.
Qwen/Qwen3 Coder 480B A35B Instruct
Qwen/Qwen3 Coder 480B A35B Instruct is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.4000/1M input tokens, $1.60/1M output tokens.
Meta Llama/Meta Llama 3.1 70B Instruct
Meta Llama/Meta Llama 3.1 70B Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.4000/1M input tokens, $0.4000/1M output tokens.
Mistralai/Mixtral 8x7B Instruct V0.1
Mistralai/Mixtral 8x7B Instruct V0.1 is available via DeepInfra with a 33K context window and up to 32,768 output tokens. Pricing: $0.4000/1M input tokens, $0.4000/1M output tokens.
Zai Org/GLM 4.5
Zai Org/GLM 4.5 is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.4000/1M input tokens, $1.60/1M output tokens.
Microsoft/WizardLM 2 8x22B
Microsoft/WizardLM 2 8x22B is available via DeepInfra with a 66K context window and up to 65,536 output tokens. Pricing: $0.4800/1M input tokens, $0.4800/1M output tokens.
Deepseek Ai/DeepSeek R1 0528
Deepseek Ai/DeepSeek R1 0528 is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.5000/1M input tokens, $2.15/1M output tokens.
Moonshotai/Kimi K2 Instruct
Moonshotai/Kimi K2 Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.5000/1M input tokens, $2.00/1M output tokens.
Moonshotai/Kimi K2 Instruct 0905
Moonshotai/Kimi K2 Instruct 0905 is available via DeepInfra with a 262K context window and up to 262,144 output tokens. Pricing: $0.5000/1M input tokens, $2.00/1M output tokens.
Nvidia/Llama 3.1 Nemotron 70B Instruct
Nvidia/Llama 3.1 Nemotron 70B Instruct is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.6000/1M input tokens, $0.6000/1M output tokens.
Sao10K/L3.1 70B Euryale V2.2
Sao10K/L3.1 70B Euryale V2.2 is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.6500/1M input tokens, $0.7500/1M output tokens.
Sao10K/L3.3 70B Euryale V2.3
Sao10K/L3.3 70B Euryale V2.3 is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $0.6500/1M input tokens, $0.7500/1M output tokens.
Deepseek Ai/DeepSeek R1
Deepseek Ai/DeepSeek R1 is available via DeepInfra with a 164K context window and up to 163,840 output tokens. Pricing: $0.7000/1M input tokens, $2.40/1M output tokens.
NousResearch/Hermes 3 Llama 3.1 405B
NousResearch/Hermes 3 Llama 3.1 405B is available via DeepInfra with a 131K context window and up to 131,072 output tokens. Pricing: $1.00/1M input tokens, $1.00/1M output tokens.
Deepseek Ai/DeepSeek R1 0528 Turbo
Deepseek Ai/DeepSeek R1 0528 Turbo is available via DeepInfra with a 33K context window and up to 32,768 output tokens. Pricing: $1.00/1M input tokens, $3.00/1M output tokens.
Deepseek Ai/DeepSeek R1 Turbo
Deepseek Ai/DeepSeek R1 Turbo is available via DeepInfra with a 41K context window and up to 40,960 output tokens. Pricing: $1.00/1M input tokens, $3.00/1M output tokens.
Google/Gemini 2.5 Pro
Google/Gemini 2.5 Pro is available via DeepInfra with a 1M context window and up to 1,000,000 output tokens. Pricing: $1.25/1M input tokens, $10.00/1M output tokens.
Anthropic/Claude 3 7 Sonnet Latest
Anthropic/Claude 3 7 Sonnet Latest is available via DeepInfra with a 200K context window and up to 200,000 output tokens. Pricing: $3.30/1M input tokens, $16.50/1M output tokens.
Anthropic/Claude 4 Sonnet
Anthropic/Claude 4 Sonnet is available via DeepInfra with a 200K context window and up to 200,000 output tokens. Pricing: $3.30/1M input tokens, $16.50/1M output tokens.
Anthropic/Claude 4 Opus
Anthropic/Claude 4 Opus is available via DeepInfra with a 200K context window and up to 200,000 output tokens. Pricing: $16.50/1M input tokens, $82.50/1M output tokens.
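The model with the lowest input price is not always the cheapest for a given workload, since output pricing and context limits also matter. The following is a minimal sketch of one way to pick a model from the list above. The `Model` class and `cheapest_model` helper are hypothetical names, only a handful of the 67 models are included, and the exact context sizes are assumed to equal the listed output-token limits because the page gives context windows only in rounded form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int   # assumed equal to the listed output-token limit
    input_price: float    # USD per 1M input tokens
    output_price: float   # USD per 1M output tokens


# A small subset of the models listed on this page.
MODELS = [
    Model("Meta Llama/Llama 3.2 3B Instruct", 131_072, 0.02, 0.02),
    Model("Meta Llama/Meta Llama 3 8B Instruct", 8_192, 0.03, 0.06),
    Model("Qwen/Qwen3 235B A22B Instruct 2507", 262_144, 0.09, 0.60),
    Model("Google/Gemini 2.0 Flash 001", 1_000_000, 0.10, 0.40),
    Model("Meta Llama/Llama 4 Maverick 17B 128E Instruct FP8", 1_048_576, 0.15, 0.60),
]


def cheapest_model(models, min_context, input_tokens, output_tokens):
    """Return the lowest-cost model whose context fits, or None if none fit."""
    candidates = [m for m in models if m.context_tokens >= min_context]
    if not candidates:
        return None

    def cost(m):
        # Total USD cost for this workload at the model's per-1M prices.
        return (input_tokens * m.input_price
                + output_tokens * m.output_price) / 1_000_000

    return min(candidates, key=cost)


# Hypothetical workload: needs at least 200K context,
# 5M input tokens and 1M output tokens in total.
choice = cheapest_model(MODELS, min_context=200_000,
                        input_tokens=5_000_000, output_tokens=1_000_000)
print(choice.name)  # prints Google/Gemini 2.0 Flash 001
```

In this example Qwen/Qwen3 235B A22B Instruct 2507 has the lower input price, but its higher output price makes the total $1.05 versus $0.90 for Google/Gemini 2.0 Flash 001.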