1. Base Pricing Tiers: Pro vs. Flash vs. GPT-4o
GPT-4o costs $2.50 / MTok input and $10.00 / MTok output. Gemini 1.5 Pro costs $1.25 / MTok input and $5.00 / MTok output for prompts up to 128k tokens. For prompts over 128k tokens, Gemini 1.5 Pro's pricing doubles to $2.50 / MTok input and $10.00 / MTok output, which matches GPT-4o's rates. Gemini 1.5 Flash is the budget option at $0.075 / MTok input and $0.30 / MTok output for prompts up to 128k tokens.
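To make the tier boundary concrete, here is a minimal Python sketch of the per-request arithmetic using the prices quoted above. The function names are illustrative, not part of any SDK. Two assumptions are baked in: "128k" is treated as 128,000 tokens, and a prompt over that threshold is billed entirely at the higher rate.

```python
# Prices in USD per million tokens (MTok), taken from the figures quoted above.
GPT4O = {"input": 2.50, "output": 10.00}
GEMINI_PRO_SHORT = {"input": 1.25, "output": 5.00}   # prompts up to 128k tokens
GEMINI_PRO_LONG = {"input": 2.50, "output": 10.00}   # prompts over 128k tokens

# Assumption: "128k" means 128,000 tokens; check the provider's exact cutoff.
LONG_CONTEXT_THRESHOLD = 128_000


def request_cost(prices, input_tokens, output_tokens):
    """Return the USD cost of one request at the given per-MTok prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


def gpt4o_cost(input_tokens, output_tokens):
    return request_cost(GPT4O, input_tokens, output_tokens)


def gemini_pro_cost(input_tokens, output_tokens):
    # Assumption: the whole request moves to the higher tier once the
    # prompt exceeds the threshold.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        prices = GEMINI_PRO_LONG
    else:
        prices = GEMINI_PRO_SHORT
    return request_cost(prices, input_tokens, output_tokens)


if __name__ == "__main__":
    print(f"{gpt4o_cost(100_000, 2_000):.3f}")       # 0.270
    print(f"{gemini_pro_cost(100_000, 2_000):.3f}")  # 0.135
    print(f"{gemini_pro_cost(200_000, 2_000):.3f}")  # 0.520
```

A 100,000-token prompt with a 2,000-token reply costs about half as much on Gemini 1.5 Pro as on GPT-4o. Doubling the prompt to 200,000 tokens nearly quadruples the Gemini cost, because both the token count and the rate double.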
2. Caching Implementation and Cost Savings
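Gemini 1.5 Pro supports context caching for prompts of at least 32k tokens (32,768). Cached input tokens are billed at a reduced rate, $0.3125 / MTok for prompts up to 128k tokens (75% below the standard input price), but Google also charges a storage fee for as long as the cache is kept alive, so the discount is not flat. GPT-4o applies a 50% discount to cached input tokens automatically, with no storage fee, for prompts of 1,024 tokens or more. That lower minimum makes GPT-4o's caching more accessible for smaller prompts.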
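The effect of a cache discount on input spend can be sketched as follows. This is a simplified model, not either provider's billing logic: the function and parameter names are hypothetical, the discount and minimum prompt size are passed in as parameters, and any per-hour cache storage charge is ignored.

```python
def input_cost_with_cache(price_per_mtok, prompt_tokens, cached_tokens,
                          cache_discount, min_cacheable_tokens=0):
    """Return the USD input cost of one request with part of the prompt cached.

    cache_discount is a fraction (0.5 means cached tokens cost half price).
    Prompts shorter than min_cacheable_tokens get no discount at all.
    Assumption: cache storage fees, if any, are not modelled here.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if prompt_tokens < min_cacheable_tokens:
        cached_tokens = 0  # prompt too small to be eligible for caching
    fresh_tokens = prompt_tokens - cached_tokens
    billable = fresh_tokens + cached_tokens * (1 - cache_discount)
    return billable * price_per_mtok / 1_000_000


if __name__ == "__main__":
    # GPT-4o input price with a 50% cache discount, as quoted above.
    with_cache = input_cost_with_cache(2.50, 50_000, 40_000, 0.5)
    without_cache = input_cost_with_cache(2.50, 50_000, 0, 0.5)
    print(f"${with_cache:.3f}")     # $0.075
    print(f"${without_cache:.3f}")  # $0.125
```

In this example 80% of the prompt hits the cache, which cuts the input bill by 40%. Passing `min_cacheable_tokens=32_000` models the 32k-token floor described above: a shorter prompt is billed at the full rate no matter how much of it repeats.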
3. Volume and Context Size Decisions
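If your prompts stay below 128k tokens, Gemini 1.5 Pro costs half as much as GPT-4o for both input and output. Above 128k tokens the comparison is moot: Gemini 1.5 Pro's rates rise to GPT-4o's level, and GPT-4o's context window ends at 128k tokens anyway. If you are building RAG applications over massive documents, Gemini 1.5 Pro's 2 million token window may be essential. Where the documents fit within Gemini 1.5 Flash's smaller 1 million token window, prefer Flash to keep API costs from escalating.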
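To see how quickly long-context input costs add up at volume, the sketch below projects monthly input spend for Pro and Flash. The workload (500,000-token prompts, 10,000 requests per month) is invented for illustration. Flash's rate for prompts over 128k tokens is not given above; the sketch assumes it doubles to $0.15 / MTok in the same way Pro's does, so verify that against the current price list.

```python
# Assumption: "128k" means 128,000 tokens.
LONG_CONTEXT_THRESHOLD = 128_000

# (short-context, long-context) input prices in USD per MTok.
INPUT_PRICES = {
    "gemini-1.5-pro": (1.25, 2.50),     # both rates quoted in the text
    "gemini-1.5-flash": (0.075, 0.15),  # long-context rate is an assumption
}


def monthly_input_cost(model, prompt_tokens, requests_per_month):
    """Return projected monthly input spend in USD for a fixed prompt size."""
    short_rate, long_rate = INPUT_PRICES[model]
    rate = long_rate if prompt_tokens > LONG_CONTEXT_THRESHOLD else short_rate
    per_request = prompt_tokens * rate / 1_000_000
    return per_request * requests_per_month


if __name__ == "__main__":
    # Hypothetical workload: 500k-token prompts, 10,000 requests per month.
    for model in INPUT_PRICES:
        cost = monthly_input_cost(model, 500_000, 10_000)
        print(f"{model}: ${cost:,.2f}")
    # gemini-1.5-pro: $12,500.00
    # gemini-1.5-flash: $750.00
```

Output tokens are left out because large-document workloads are usually dominated by input cost. Under these assumptions the same workload costs more than sixteen times as much on Pro as on Flash.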