Interactive Context Window Calculator
Want to calculate your exact parameters and operational expenses? Run the calculations locally inside your browser.
Launch Context Window Calculator1. Short-Term Context Window Stuffing
The simplest way to maintain state is sending the entire conversation history in every API request. This provides the model with perfect context, but it increases token consumption and costs with each subsequent message exchange.
2. Long-Term Semantic Memory (Vector Stores)
Instead of sending the entire conversation history, store past logs in a vector database. Use semantic search to retrieve only the relevant past messages and inject them into the prompt. This keeps token counts low but can occasionally result in the model missing context.
3. Summary and State-Distillation Systems
A middle-ground approach is summarization: run a background process that condenses past messages into a brief summary paragraph. Send this summary alongside the most recent messages, keeping token consumption stable over long chat sessions.
Frequently Asked Questions
Are LLMs stateless?
Yes. APIs do not retain any data from previous requests. You must send the history of the conversation in every new request to maintain context.
What is semantic memory?
A system where past conversations are saved as embedding vectors in a database. When a user asks a question, the system retrieves only the semantically similar past logs to include in the prompt.