Compaction
As a conversation grows, its history eventually approaches the model's context window. Compaction keeps a session going by replacing older turns with a single checkpoint summary produced by the agent's own model, while keeping the most recent turns verbatim. The work done so far is preserved as a concise handoff; the tokens it used are freed.
Compaction is built in. It runs automatically when context fills up, and you can also trigger it on demand from the TUI or the Harness API.
How it works
A compaction does three things:
- Plans retention. Leading template messages (system-prompt-like content) are always kept. The most recent keep_turns turns are kept verbatim. The turns in between are the compact region. The boundary always falls on a user turn, so an assistant tool call is never separated from its tool result.
- Summarizes. The compact region is sent to the agent's model with a summarization prompt asking for a handoff summary for "another LLM that will resume the work." This is a side-channel call: it does not add the request to your history, and it uses no tools.
- Replaces. History becomes templates + summary + recent turns. The summary is a clearly marked message (it shows as compacted in /history), not an ordinary user message. The original pre-compaction history is archived to a compacted_*.json file in the session directory, so nothing is lost.
If the summarization call fails or returns nothing, history is left untouched.
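The three steps above can be sketched in a few lines of Python. This is an illustration only, not fast-agent's implementation: the Message class, the plan_retention and compact helpers, the "compacted" role name, and the rule that templates are everything before the first user message are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str  # "system", "user", "assistant", "tool" (assumed role names)
    text: str


def plan_retention(history: list[Message], keep_turns: int = 2):
    """Split history into (templates, compact_region, recent_turns)."""
    # Assumption: template messages are everything before the first user message.
    first_user = next(
        (i for i, m in enumerate(history) if m.role == "user"), len(history)
    )
    templates, body = history[:first_user], history[first_user:]

    # A turn starts at each user message, so cutting at a turn start can
    # never separate an assistant tool call from its tool result.
    turn_starts = [i for i, m in enumerate(body) if m.role == "user"]
    if keep_turns <= 0:
        return templates, body, []
    if len(turn_starts) <= keep_turns:
        return templates, [], body  # nothing old enough to summarize
    cut = turn_starts[-keep_turns]
    return templates, body[:cut], body[cut:]


def compact(
    history: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_turns: int = 2,
) -> list[Message]:
    """Return templates + summary + recent turns, or the untouched history."""
    templates, region, recent = plan_retention(history, keep_turns)
    if not region:
        return history
    try:
        summary = summarize(region)  # stands in for the side-channel model call
    except Exception:
        return history  # summarization failed: leave history untouched
    if not summary:
        return history  # empty summary: leave history untouched
    return templates + [Message("compacted", summary)] + recent


history = [
    Message("system", "You are a helpful agent."),
    Message("user", "List the files."),
    Message("assistant", "(tool call: ls)"),
    Message("tool", "a.txt b.txt"),
    Message("assistant", "There are two files."),
    Message("user", "Open a.txt."),
    Message("assistant", "It is empty."),
    Message("user", "Delete it."),
    Message("assistant", "Done."),
]
compacted = compact(history, lambda region: "Listed two files.", keep_turns=2)
print([m.role for m in compacted])
```

With keep_turns set to 2, the first turn (including its tool call and tool result) is folded into the summary, and the last two turns survive verbatim.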
Automatic compaction
By default (compaction.auto: true), fast-agent checks context usage after each
completed turn and compacts when usage crosses compaction.threshold (see the
generated default below). The trigger uses server-observed token usage from
the last response, not an estimate, so it reflects what the provider actually
charged.
This applies everywhere agents run: the TUI, fast.run(), and the Harness API.
There is nothing to enable.
compaction:
  auto: true          # automatically compact when the threshold is crossed
  threshold: 0.85     # fraction of the context window that triggers compaction
  keep_turns: 2       # recent turns kept verbatim after compaction
  # prompt: null      # built-in prompt; set inline text or a relative file path
To turn auto-compaction off and rely solely on manual /compact, set
compaction.auto: false.
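As a rough illustration of the trigger, the check amounts to comparing a usage fraction against the threshold. The function below is a hypothetical stand-in, not the signature of fast-agent's own should_auto_compact, and treating "crosses" as greater-than-or-equal is an assumption.

```python
def crosses_threshold(
    used_tokens: int, context_window: int, threshold: float = 0.85
) -> bool:
    """Return True when observed usage reaches the compaction threshold."""
    if context_window <= 0:
        return False  # unknown window size: never trigger
    return used_tokens / context_window >= threshold


# Example with an assumed 200,000-token context window.
print(crosses_threshold(160_000, 200_000))        # 80% used, default threshold
print(crosses_threshold(172_000, 200_000))        # 86% used, default threshold
print(crosses_threshold(160_000, 200_000, 0.75))  # 80% used, lowered threshold
```

The first call does not trigger compaction, the second does, and the third shows how lowering the threshold makes the same usage trigger earlier.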
Tuning the trigger
The point at which compaction kicks in is compaction.threshold. Lower it to
compact earlier (more headroom, more frequent summarization); raise it to let
context fill closer to the limit before compacting.
| Setting | Default | Description |
|---|---|---|
| compaction.auto | true | Automatically compact history when context usage crosses the threshold |
| compaction.threshold | 0.85 | Fraction of the model context window that triggers auto-compaction |
| compaction.keep_turns | 2 | Number of recent complete turns kept verbatim after compaction |
| compaction.prompt | null | Custom summarization prompt for compaction. Inline text, or a path to a text/markdown file. null (the default) uses the built-in prompt (see /compact prompt). |
The summarization prompt
The built-in prompt asks the model for a structured handoff summary covering goals, decisions, progress, what remains, and any critical data needed to continue. You can see the exact prompt in use at any time with /compact prompt.
To customize it, set compaction.prompt to either inline text or a path to a
text/markdown file. Relative file paths resolve from the directory of the loaded
config file, not the process working directory.
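The path rule can be pictured with a small helper. This is a sketch of the resolution behavior described here, not fast-agent code; resolve_prompt_path and the example file names are hypothetical.

```python
from pathlib import Path


def resolve_prompt_path(prompt_path: str, config_file: str) -> Path:
    """Resolve a prompt file path the way the text describes."""
    path = Path(prompt_path)
    if path.is_absolute():
        return path  # absolute paths are used as given
    # Relative paths are anchored at the config file's directory,
    # not at the process working directory.
    return Path(config_file).parent / path


print(resolve_prompt_path("prompts/compact.md", "project/config.yaml"))
```

Running the process from a different directory would not change the result, because the working directory never enters the calculation.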
Manual compaction in the TUI
| Command | Effect |
|---|---|
| /compact | Compact now, showing the before → after context usage. |
| /compact <instructions> | Steer the summary, e.g. /compact focus on the database migration. |
| /compact preview | Show what would be kept and dropped, with no model call. |
| /compact prompt | Print the active summarization prompt. |
/compact preview is free: it reports which turns would be summarized and an
estimated before → after token count without calling the model. Use it to decide
whether compaction is worthwhile before paying for it.
While /compact runs, the streaming token progress display is shown — the same
live indicator as a normal turn — so you can watch the summary being generated.
After compaction, the before and after context usage is visualized. The summary remains in the live history as the checkpoint, so you can correct or extend it with your next message if the model missed something.
How /history behaves after compaction
The checkpoint summary appears in /history as a compacted row (marked with
≡), distinct from user and assistant turns, with a short preview of the summary
content. Turns that were folded into the summary no longer appear individually —
they live in the compacted_*.json archive instead. The recent turns kept
verbatim continue to display normally.
To recover the full pre-compaction transcript, load the compacted_*.json archive from the session directory.
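For inspecting an archive outside the TUI, a minimal sketch might look like the following. It assumes only what is stated above: archives are JSON files named compacted_*.json in the session directory. The load_latest_archive helper is hypothetical, picking the newest file by modification time is an assumption, and the layout of the JSON inside is not documented here, so the sketch returns the parsed data without interpreting it.

```python
import json
from pathlib import Path


def load_latest_archive(session_dir: str):
    """Return (path, parsed_json) for the newest archive, or None if absent."""
    archives = sorted(
        Path(session_dir).glob("compacted_*.json"),
        key=lambda p: p.stat().st_mtime,  # assumption: newest = last modified
    )
    if not archives:
        return None
    latest = archives[-1]
    with latest.open(encoding="utf-8") as fh:
        return latest, json.load(fh)
```

Calling load_latest_archive with the session directory path returns None when no compaction has happened yet in that session.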
Manual compaction from the Harness API
Under the Harness API, auto-compaction is on by default just as in the TUI. To compact a session explicitly:
result = await session.compact()
print(f"{result.messages_before} → {result.messages_after} messages")
compact() accepts optional instructions (one-off focus for the summary) and
agent_name (target a specific agent in the session). It returns a
CompactionResult and raises CompactionSkipped when there is nothing worth
compacting. See the Harness API
guide for details.
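The call pattern with instructions and a skipped compaction might look like the sketch below. To keep it runnable on its own, CompactionSkipped, CompactionResult, and FakeSession are local stand-ins rather than the real fast-agent classes, and the four-message cutoff is an arbitrary placeholder; only the compact() parameter names and the result fields are taken from the text above.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class CompactionSkipped(Exception):
    """Stand-in for the exception the Harness API raises."""


@dataclass
class CompactionResult:
    """Stand-in carrying the two fields used in this guide."""
    messages_before: int
    messages_after: int


class FakeSession:
    """Stand-in session so the example runs without fast-agent installed."""

    def __init__(self, message_count: int) -> None:
        self.message_count = message_count

    async def compact(
        self, instructions: Optional[str] = None, agent_name: Optional[str] = None
    ) -> CompactionResult:
        if self.message_count < 4:  # arbitrary cutoff for the demo
            raise CompactionSkipped("nothing worth compacting")
        before, self.message_count = self.message_count, 3
        return CompactionResult(before, self.message_count)


async def compact_if_useful(session, focus: Optional[str] = None) -> str:
    """Compact with an optional one-off focus; report a skip instead of failing."""
    try:
        result = await session.compact(instructions=focus)
    except CompactionSkipped:
        return "skipped"
    return f"{result.messages_before} -> {result.messages_after} messages"


print(asyncio.run(compact_if_useful(FakeSession(12), "focus on the database migration")))
print(asyncio.run(compact_if_useful(FakeSession(2))))
```

Catching CompactionSkipped separately keeps "nothing to do" distinct from genuine errors, which should still propagate.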
For building custom triggers or tooling, the primitives in
fast_agent.history.compaction are importable directly — including
plan_compaction for a model-call-free retention preview and should_auto_compact
for the threshold check.
What is kept, summarized, and archived
| | Kept verbatim | Summarized | Archived |
|---|---|---|---|
| Leading template / system messages | ✅ | | |
| Older turns (the compact region) | | ✅ | ✅ |
| Most recent keep_turns turns | ✅ | | |
| Full pre-compaction history | | | ✅ |
The summary itself is retained in the live history as the checkpoint. Tool calls
and their results are summarized together (never split). A single very large turn
is always eligible for compaction, even when keep_turns would otherwise retain
it, so a runaway tool loop can still be brought back under the limit.