
AI API Cost Benchmarks: Price vs. Intelligence Audit

Which model provides the highest intelligence per dollar spent? This benchmark report audits the cost profiles of major LLMs against standardized reasoning and coding evaluations (MMLU, HumanEval) to identify the APIs that deliver the most capability per dollar.

LLM Pricing Specs

Model Name                  Provider   Context Size  Input / MTok  Output / MTok  Cached / MTok
DeepSeek V3                 DeepSeek   128k          $0.140        $0.28          $0.014
GPT-4o-mini                 OpenAI     128k          $0.150        $0.60          $0.075
Gemini 1.5 Flash            Google     2M            $0.075        $0.30          $0.037
GPT-4o                      OpenAI     128k          $2.500        $10.00         $1.250
Claude 3.5 Sonnet           Anthropic  200k          $3.000        $15.00         $0.300
Gemini 1.5 Pro              Google     2M            $1.250        $5.00          $0.625
Claude 3.5 Haiku            Anthropic  200k          $0.800        $4.00          $0.080
Mistral Large 2             Mistral    128k          $2.000        $6.00          $2.000
Llama 3.3 70B (Serverless)  Meta       128k          $0.350        $0.40          $0.350

Example Billing Estimate: GPT-4o

Cost per query: $0.0075
Daily running bill: $0.75
Monthly API expense: $22.50

These figures imply 100 queries per day ($0.75 / $0.0075) and a 30-day month ($22.50 / $0.75).
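
The GPT-4o billing figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the source does not state the workload behind the estimate, so the token counts (1,000 input and 500 output tokens per query), the 100 queries per day, and the 30-day month are assumptions chosen because they are consistent with the listed prices. The function name `cost_per_query` is hypothetical.

```python
# Listed GPT-4o prices from the table, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 2.50
OUTPUT_PRICE_PER_MTOK = 10.00


def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Assumed workload (not stated in the source): 1,000 input tokens and
# 500 output tokens per query, 100 queries per day, 30 days per month.
per_query = cost_per_query(1_000, 500)
daily = per_query * 100
monthly = daily * 30

print(f"Cost per query:      ${per_query:.4f}")
print(f"Daily running bill:  ${daily:.2f}")
print(f"Monthly API expense: ${monthly:.2f}")
```

Other token mixes can produce the same per-query cost, so treat the assumed workload as one plausible reading rather than the calculator's actual inputs.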

1. Price-to-Performance Ratio Explained

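We define the Price-to-Performance Ratio as a model's benchmark score (MMLU) divided by its blended cost per million tokens, where the blended cost is a weighted average of the input and output prices. A higher ratio means more benchmark points per dollar, which highlights models that punch above their weight class financially.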
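
As a rough illustration of this definition, the sketch below computes a blended cost and the resulting ratio for GPT-4o-mini. The report does not say how input and output prices are weighted, so the 3:1 input-to-output mix (`input_share=0.75`) is an assumption, and the function names are hypothetical. The MMLU score of 82 is the approximate budget-model figure cited in this report.

```python
def blended_cost(input_price: float, output_price: float,
                 input_share: float = 0.75) -> float:
    """Weighted average price in USD per million tokens.

    input_share is the assumed fraction of tokens that are input tokens;
    0.75 corresponds to a 3:1 input-to-output mix (an assumption).
    """
    return input_share * input_price + (1.0 - input_share) * output_price


def price_performance(mmlu_score: float, blended: float) -> float:
    """Benchmark points per dollar of blended cost per million tokens."""
    return mmlu_score / blended


# GPT-4o-mini: $0.15 input and $0.60 output per million tokens (from the table).
mini_blended = blended_cost(0.15, 0.60)
mini_ratio = price_performance(82.0, mini_blended)

print(f"Blended cost: ${mini_blended:.4f} per MTok")
print(f"Price-to-performance ratio: {mini_ratio:.1f}")
```

Changing `input_share` shifts the ranking toward models with cheap input or cheap output, so the ratio is only comparable across models when the same mix is used for all of them.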

2. Flagship Models vs. Budget Models

Flagship models (Claude 3.5 Sonnet, GPT-4o) score high (88%+ MMLU) but carry a blended cost of $3.00 to $5.00 per million tokens. Budget models (GPT-4o-mini, Gemini 1.5 Flash) score ~82% MMLU but cost under $0.30 per million tokens. For roughly 80% of routine workflows, budget models offer about 10x better cost-efficiency.
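
One way to see what this means for a bill is to compare sending every query to a flagship model against sending the routine share to a budget model. The sketch below is a hypothetical illustration: the 10,000-query volume, the token counts per query, and the function names are assumptions, while the prices come from the table.

```python
# Listed prices in USD per million tokens as (input, output), from the table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one query for the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000


def routed_cost(n_queries: int, budget_share: float,
                input_tokens: int, output_tokens: int) -> float:
    """Total cost when budget_share of queries go to the budget model."""
    budget = (n_queries * budget_share
              * query_cost("GPT-4o-mini", input_tokens, output_tokens))
    flagship = (n_queries * (1.0 - budget_share)
                * query_cost("GPT-4o", input_tokens, output_tokens))
    return budget + flagship


# Assumed workload: 10,000 queries of 1,000 input and 500 output tokens each.
all_flagship = routed_cost(10_000, 0.0, 1_000, 500)
mixed = routed_cost(10_000, 0.8, 1_000, 500)

print(f"All GPT-4o:          ${all_flagship:.2f}")
print(f"80% GPT-4o-mini mix: ${mixed:.2f}")
```

Under these assumptions most of the remaining spend comes from the 20% of queries still sent to the flagship model.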

Frequently Asked Questions

What is the most cost-effective model for coding?

Claude 3.5 Sonnet is the gold standard for complex coding, but for simple script edits, GPT-4o-mini offers excellent accuracy at about 5% of Sonnet's per-token price ($0.15 vs. $3.00 per million input tokens).

How does DeepSeek V3 fit in price-to-performance?

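DeepSeek V3 achieves scores comparable to GPT-4o while its listed rates are roughly 3% to 6% of GPT-4o's ($0.14 vs. $2.50 per million input tokens, $0.28 vs. $10.00 per million output tokens), giving it one of the highest price-to-performance ratios among the models listed here.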
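
The relative cost can be checked directly against the listed rates. The short sketch below expresses DeepSeek V3's prices as a percentage of GPT-4o's, using only the table values; the helper name `relative_price` is hypothetical.

```python
def relative_price(price: float, baseline: float) -> float:
    """Return price as a percentage of baseline."""
    return 100.0 * price / baseline


# Listed prices in USD per million tokens, from the table.
deepseek_v3 = {"input": 0.14, "output": 0.28}
gpt_4o = {"input": 2.50, "output": 10.00}

# Percentage of GPT-4o's rate, keyed by token type.
percentages = {
    kind: relative_price(deepseek_v3[kind], gpt_4o[kind])
    for kind in ("input", "output")
}

for kind, pct in percentages.items():
    print(f"DeepSeek V3 {kind} price is {pct:.1f}% of GPT-4o's")
```

Because the input and output percentages differ, the overall saving depends on how output-heavy a workload is.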