
How It Works
Cline monitors token usage during your conversation. When you're getting close to the limit, he:

- Creates a comprehensive summary of everything that's happened
- Preserves all technical details, code changes, and decisions
- Replaces the conversation history with the summary
- Continues exactly where he left off
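The steps above can be sketched as a simple check-and-replace loop. This is an illustrative sketch, not Cline's actual implementation; the threshold, token estimate, and `summarize` call are all assumptions:

```python
# Illustrative auto-compact loop. Names, the 90% threshold, and the
# token heuristic are assumptions for explanation, not Cline's real code.

COMPACT_THRESHOLD = 0.9  # summarize when ~90% of the context window is used

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Placeholder for the LLM call that writes the comprehensive summary.
    return "Summary of %d prior messages." % len(messages)

def maybe_compact(messages, context_window):
    """Replace the history with a summary when close to the limit."""
    if estimate_tokens(messages) < COMPACT_THRESHOLD * context_window:
        return messages  # plenty of room left; keep full history
    summary = summarize(messages)
    # Keep the summary plus the most recent message so work continues
    # exactly where it left off.
    return [{"role": "system", "content": summary}, messages[-1]]
```

The key design point is that the full history is swapped out in one step, so the model never sees a truncated middle, only a summary followed by the latest turn.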
Why This Matters
Previously, Cline would truncate older messages when hitting context limits, losing important context. Now with summarization:

- All technical decisions and code patterns are preserved
- File changes and project context remain intact
- Cline remembers everything he’s done
- You can work on much larger projects without interruption
Cost Considerations
Summarization reuses the conversation's existing prompt cache, so it costs about the same as any other tool call. Since most input tokens are already cached, you're primarily paying for summary generation (output tokens), which keeps it cost-effective.

Supported Models

Auto Compact uses advanced LLM-based summarization for these models:

- Claude 4 series
- Gemini 2.5 series
- GPT-5
- Grok 4
With other models, Cline falls back to standard rule-based context truncation, even if Auto Compact is enabled.
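The selection between summarization and truncation can be pictured as a small dispatch on the model family. The family prefixes and function names below are illustrative assumptions, not Cline's real API:

```python
# Hedged sketch of the fallback behavior described above; the prefix
# list and names are illustrative, not Cline's actual identifiers.

LLM_SUMMARY_FAMILIES = ("claude-4", "gemini-2.5", "gpt-5", "grok-4")

def pick_context_strategy(model_id: str, auto_compact_enabled: bool) -> str:
    """Choose LLM summarization or rule-based truncation for a model."""
    supported = model_id.lower().startswith(LLM_SUMMARY_FAMILIES)
    if auto_compact_enabled and supported:
        return "llm-summarization"
    # Unsupported models fall back to rule-based truncation even when
    # Auto Compact is enabled.
    return "rule-based-truncation"
```

For example, under these assumptions a Claude 4 model with Auto Compact on would get summarization, while an older model would silently fall back to truncation.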

