1. Model Context Specifications Table
Below is a consolidated list of model specifications and input/output pricing.
2. Understanding Output Token Ceilings
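A model's context window is its total token capacity, shared between the input prompt and the generated output. Separately, each model has an output token ceiling that limits how many tokens a single response can contain, no matter how much of the context window is still free. For example, GPT-4o has a 128k-token context window, but its initial release was capped at 4,096 output tokens per request (later snapshots raised the cap to 16,384).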
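The interaction between the two limits can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the function name `max_output_budget` and its parameters are hypothetical, and the numbers in the example calls are illustrative values based on the GPT-4o figures above. It assumes the prompt and the response share one context window, so the usable output is the smaller of the output ceiling and whatever room the prompt leaves.

```python
def max_output_budget(context_window: int, output_cap: int, prompt_tokens: int) -> int:
    """Return the largest output length (in tokens) a single request can use.

    Hypothetical helper for illustration only. Assumes the prompt and the
    response share the same context window.
    """
    if context_window <= 0 or output_cap <= 0 or prompt_tokens < 0:
        raise ValueError("limits must be positive and prompt_tokens non-negative")

    # Room left in the context window once the prompt is counted.
    remaining = context_window - prompt_tokens

    # The output ceiling applies even when plenty of context remains;
    # a prompt that overflows the window leaves no room at all.
    return max(0, min(output_cap, remaining))


# Illustrative values: 128,000-token window, 4,096-token output ceiling.
short_prompt = max_output_budget(128_000, 4_096, 10_000)   # ceiling is the limit
long_prompt = max_output_budget(128_000, 4_096, 126_000)   # window is the limit
oversized = max_output_budget(128_000, 4_096, 130_000)     # prompt does not fit
```

With a short prompt the output ceiling is the binding limit; only when the prompt fills nearly the whole window does the remaining context become the tighter constraint.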
Frequently Asked Questions
Why are output limits different from context windows?
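Autoregressive generation produces one token at a time, and each new token requires another forward pass through the model, so long outputs are slow and computationally demanding. Providers cap output length to keep individual requests from monopolizing GPU resources.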
Which model has the largest output limit?
Anthropic's Claude 3.5 Sonnet supports up to 8,192 output tokens per request, while Claude 3 Opus is capped at 4,096 tokens.