Context Compaction¶
As conversations grow, they eventually exceed the model's context window. Compaction reduces the session's event history to fit within that window while preserving conversational coherence. It is driven by two composable abstractions: triggers (when to compact) and strategies (how to compact).
Entry point¶
SessionService.compact() is the single entry point. It evaluates the trigger first and
only runs the strategy — and writes back to the repository — when the trigger fires:
// Compact when turn count exceeds 20, keeping the last 10 events
CompactionResult result = service.compact(
sessionId,
new TurnCountTrigger(20),
SlidingWindowCompactionStrategy.builder().maxEvents(10).build()
);
System.out.println(result.eventsRemoved()); // derived: archivedEvents().size()
System.out.println(result.compactedEvents()); // the kept event list
System.out.println(result.archivedEvents()); // the removed event list
System.out.println(result.tokensEstimatedSaved()); // rough token saving estimate
// Compact unconditionally — pass an always-fire trigger
service.compact(sessionId, req -> true, SlidingWindowCompactionStrategy.builder().maxEvents(10).build());
CAS write safety
DefaultSessionService.compact() reads the event-log version before fetching
events. If another writer mutated the log between that read and the write,
replaceEvents() returns false and compaction is silently skipped — the concurrent
writer already handled the session. Strategies that return a no-op result skip the
write entirely, which matters for production persistence backends.
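The optimistic-concurrency pattern described above can be sketched with stand-in classes (these are not the actual repository API of DefaultSessionService, just an illustration of the version-check-then-write rule): read the version, compute, and write back only if nobody else wrote in between.

```java
import java.util.List;

// Illustrative sketch of the CAS write pattern: replaceEvents() succeeds only
// when the caller's version matches the current one; a false return means a
// concurrent writer got there first, so compaction should be skipped.
class VersionedEventLog {
    private long version = 0;
    private List<String> events = List.of();

    synchronized long version() { return version; }
    synchronized List<String> events() { return events; }

    // compare-and-set semantics: write only if the log is unchanged
    synchronized boolean replaceEvents(long expectedVersion, List<String> newEvents) {
        if (version != expectedVersion) return false;
        events = newEvents;
        version++;
        return true;
    }
}
```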
Compaction Triggers¶
Triggers implement CompactionTrigger (a @FunctionalInterface) and decide whether
compaction should run based on the current CompactionRequest.
TurnCountTrigger¶
Fires when the session has more than n complete turns. Only non-synthetic, root-level
(branch == null) USER events count toward the turn total.
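The counting rule can be modeled in a few lines of self-contained Java (stand-in `Event` record, not the library's event type): only non-synthetic, root-level USER events contribute to the turn count.

```java
import java.util.List;

// Stand-in event type: role, synthetic flag, and branch (null = root level).
record Event(String type, boolean synthetic, String branch) {}

class TurnCounter {
    // Mirrors TurnCountTrigger's counting rule as described above.
    static long countTurns(List<Event> events) {
        return events.stream()
                .filter(e -> "USER".equals(e.type()))   // only USER events start turns
                .filter(e -> !e.synthetic())            // ignore synthetic summaries
                .filter(e -> e.branch() == null)        // root-level only
                .count();
    }
}
```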
TokenCountTrigger¶
Fires when the estimated total token count is at or above a threshold.
// Uses JTokkitTokenCountEstimator by default
TokenCountTrigger.builder().threshold(4000).build();
// Custom estimator (e.g. for a different model's tokenizer)
TokenCountTrigger.builder().threshold(4000).tokenCountEstimator(myEstimator).build();
CompositeCompactionTrigger¶
Combines multiple triggers with OR semantics — compaction fires if any trigger fires.
CompactionTrigger trigger = CompositeCompactionTrigger.anyOf(
new TurnCountTrigger(20),
TokenCountTrigger.builder().threshold(4000).build()
);
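Only OR composition is provided out of the box, but because the trigger type is a `@FunctionalInterface`, AND semantics can be composed by hand. A minimal sketch with a stand-in interface (the real `CompactionTrigger` takes a `CompactionRequest`):

```java
// Stand-in for the real trigger interface, which receives a CompactionRequest.
@FunctionalInterface
interface Trigger { boolean shouldCompact(int turnCount, int tokenCount); }

class Triggers {
    // Fire only when BOTH triggers fire (AND semantics).
    static Trigger allOf(Trigger a, Trigger b) {
        return (turns, tokens) ->
                a.shouldCompact(turns, tokens) && b.shouldCompact(turns, tokens);
    }
}
```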
Compaction Strategies¶
Strategies implement CompactionStrategy (a @FunctionalInterface) and define what to
do with the event history. Each strategy receives a CompactionRequest containing the
session metadata and the full event list.
SlidingWindowCompactionStrategy¶
Keeps the last N real events. Simple, predictable, no LLM call required. Synthetic
summary events are always preserved and placed first; they do not count against the
maxEvents budget.
// keep the last 20 real events
SlidingWindowCompactionStrategy.builder().maxEvents(20).build();
// custom token estimator
SlidingWindowCompactionStrategy.builder().maxEvents(20).tokenCountEstimator(myEstimator).build();
Algorithm
- Separate synthetic events (always preserved, placed first in output).
- Keep the last maxEvents real events.
- Snap the cut point forward to the nearest USER message (turn-boundary safety).
- Return: [synthetics] + [kept real events].
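The sliding-window algorithm can be sketched as self-contained Java (stand-in `Msg` record, not the library's API), including the snap-forward-to-USER step:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in event type: role plus synthetic flag.
record Msg(String role, boolean synthetic) {}

class SlidingWindowSketch {
    static List<Msg> compact(List<Msg> events, int maxEvents) {
        List<Msg> synthetics = events.stream().filter(Msg::synthetic).toList();
        List<Msg> real = events.stream().filter(e -> !e.synthetic()).toList();
        int cut = Math.max(0, real.size() - maxEvents);   // raw cut point
        while (cut < real.size() && !"USER".equals(real.get(cut).role())) {
            cut++;                                        // snap forward to a turn start
        }
        List<Msg> kept = new ArrayList<>(synthetics);     // synthetics placed first
        kept.addAll(real.subList(cut, real.size()));
        return kept;
    }
}
```

Note that snapping can keep fewer than `maxEvents` events: if the raw cut lands mid-turn, the partial turn at the front is sacrificed rather than kept headless.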
TurnWindowCompactionStrategy¶
Keeps the last N complete turns. Unlike the sliding window, this never cuts inside a
turn — it always archives whole user↔agent exchanges.
// keep the last 10 turns
TurnWindowCompactionStrategy.builder().maxTurns(10).build();
// custom token estimator
TurnWindowCompactionStrategy.builder().maxTurns(10).tokenCountEstimator(myEstimator).build();
Algorithm
- Separate synthetic events (always preserved, placed first in output).
- Collect preamble events that appear before the first USER message.
- Group remaining events into turns (each turn starts at a USER message).
- Archive the oldest turns until only maxTurns remain.
- Return: [synthetics] + [preamble] + [kept turns].
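The turn-grouping step can be sketched in self-contained Java (stand-in types, not the library's API): every USER event opens a new turn, and everything up to the next USER event belongs to it.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in event type carrying only the role.
record E(String role) {}

class TurnWindowSketch {
    // Group events into turns; each turn starts at a USER message.
    // Events before the first USER message (the preamble) are handled
    // separately by the strategy and are skipped here.
    static List<List<E>> groupTurns(List<E> events) {
        List<List<E>> turns = new ArrayList<>();
        List<E> current = null;
        for (E e : events) {
            if ("USER".equals(e.role())) {
                current = new ArrayList<>();
                turns.add(current);
            }
            if (current != null) current.add(e);
        }
        return turns;
    }
}
```

Archiving then amounts to dropping whole entries from the front of this list until `maxTurns` remain, which is why this strategy can never cut inside a turn.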
TokenCountCompactionStrategy¶
Keeps a contiguous suffix of events that fits within a token budget, walking from newest to oldest.
// stay within 4000 tokens
TokenCountCompactionStrategy.builder().maxTokens(4000).build();
// custom estimator
TokenCountCompactionStrategy.builder().maxTokens(4000).tokenCountEstimator(myEstimator).build();
Algorithm
- Separate synthetic events (their token cost is deducted from the budget first).
- Walk real events from newest to oldest, accumulating token cost. Stop at the first event that would exceed the remaining budget. This produces a contiguous suffix — skipping individual oversize events would create non-contiguous gaps that break conversation coherence.
- Drop any leading kept events that are not USER messages (turn-boundary safety).
- Return: [synthetics] + [kept events].
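The newest-to-oldest budget walk can be sketched as self-contained Java (stand-in `Ev` record with a precomputed token cost; the real strategy uses a TokenCountEstimator):

```java
import java.util.List;

// Stand-in event type: role plus estimated token cost.
record Ev(String role, int tokens) {}

class TokenBudgetSketch {
    static List<Ev> keepSuffix(List<Ev> events, int budget) {
        int start = events.size(), used = 0;
        // Walk newest -> oldest; stop at the first event that would overflow.
        for (int i = events.size() - 1; i >= 0; i--) {
            if (used + events.get(i).tokens() > budget) break;
            used += events.get(i).tokens();
            start = i;
        }
        // Turn-boundary safety: the kept window must start at a USER message.
        while (start < events.size() && !"USER".equals(events.get(start).role())) {
            start++;
        }
        return events.subList(start, events.size());
    }
}
```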
RecursiveSummarizationCompactionStrategy¶
LLM-powered strategy that uses a ChatClient to summarize the events being archived.
The summary is stored as a synthetic user+assistant turn so subsequent compaction passes
can build on it — creating a rolling, recursive compressed history.
RecursiveSummarizationCompactionStrategy strategy =
RecursiveSummarizationCompactionStrategy.builder(chatClient)
.maxEventsToKeep(10) // active window size (real events kept intact)
.overlapSize(2) // events from active window fed to summary prompt
// must be < maxEventsToKeep; defaults to 2
.systemPrompt("...") // optional custom system prompt
.shadowPrompt("...") // optional custom USER shadow prompt
.tokenCountEstimator(myEst) // custom estimator (default: JTokkitTokenCountEstimator)
.build();
Builder validation
overlapSize must be >= 0 and strictly less than maxEventsToKeep. Violating this
throws IllegalArgumentException.
LLM failure handling
If the LLM returns a null or blank summary, the strategy logs a WARN-level message and
skips compaction — the event history is left unchanged. Register an optional failure
callback to react programmatically:
RecursiveSummarizationCompactionStrategy strategy =
RecursiveSummarizationCompactionStrategy.builder(chatClient)
.maxEventsToKeep(10)
.onSummarizationFailure(req -> {
log.error("Compaction failed for session {}", req.session().id());
// retry, alert, increment a metric, etc.
})
.build();
Algorithm
- Separate synthetic and real events.
- Compute the raw cut point: the newest maxEventsToKeep real events form the active window.
- Snap the cut point forward to the nearest turn boundary.
- Feed [prior synthetic summaries] + [events to archive] + [overlap events] to the LLM.
- Replace the archived events with a new synthetic summary turn: [USER shadow, ASSISTANT summary].
- Return: [summary turn] + [active window].
The recursive property: the ASSISTANT text from any prior synthetic summary is fed
back to the LLM as === PRIOR SUMMARY === context, so each summary builds on its
predecessors without starting from scratch.
Turn-boundary Safety¶
All four strategies share a common safety rule enforced by
CompactionUtils.snapToTurnStart: the kept window always starts at a USER message.
If a raw cut point lands on an ASSISTANT or TOOL event in the middle of a turn, it
is advanced forward to the next USER message. This prevents keeping a tool result or
assistant reply without the user message that originated its turn.
Before snap: [u1, a1, u2, a2, | a3, u3, a4] ← cut lands on a3 (middle of turn 2)
After snap:  [u1, a1, u2, a2, a3, | u3, a4] ← cut moved to u3 (turn start)
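The snap rule itself is tiny; a minimal sketch of the assumed behavior of `CompactionUtils.snapToTurnStart` (stand-in role list, not the real signature):

```java
import java.util.List;

class SnapSketch {
    // Advance the cut index forward until it points at a USER message
    // (or past the end, meaning nothing is kept).
    static int snapToTurnStart(List<String> roles, int cut) {
        while (cut < roles.size() && !"USER".equals(roles.get(cut))) cut++;
        return cut;
    }
}
```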
Choosing a strategy¶
| Strategy | LLM call? | Context preserved | Best for |
|---|---|---|---|
| SlidingWindowCompactionStrategy | No | Last N messages verbatim | Cost-sensitive, short-term context |
| TurnWindowCompactionStrategy | No | Last N complete turns verbatim | Turn-structured dialogues |
| TokenCountCompactionStrategy | No | Token-budget suffix verbatim | Hard context window limits |
| RecursiveSummarizationCompactionStrategy | Yes | Rolling LLM summary + active window | Long-running, context-rich sessions |