Context Window vs. Memory: How LLMs Keep State | ToolStrategyHub

Interactive Context Window Calculator

Want to calculate your exact parameters and operational expenses? Run the calculations locally inside your browser.

1. Short-Term Context Window Stuffing

The simplest way to maintain state is sending the entire conversation history in every API request. This provides the model with perfect context, but it increases token consumption and costs with each subsequent message exchange.

2. Long-Term Semantic Memory (Vector Stores)

Instead of sending the entire conversation history, store past logs in a vector database. Use semantic search to retrieve only the relevant past messages and inject them into the prompt. This keeps token counts low but can occasionally result in the model missing context.

3. Summary and State-Distillation Systems

A middle-ground approach is summarization: run a background process that condenses past messages into a brief summary paragraph. Send this summary alongside the most recent messages, keeping token consumption stable over long chat sessions.

Frequently Asked Questions

Are LLMs stateless?

Yes. APIs do not retain any data from previous requests. You must send the history of the conversation in every new request to maintain context.

What is semantic memory?

A system where past conversations are saved as embedding vectors in a database. When a user asks a question, the system retrieves only the semantically similar past logs to include in the prompt.