ipc — Infinite Persistent Context
Category: context · Cloud + Local · Status: v1 — production
Long sessions normally hit the model’s context limit and start dropping turns. ipc compresses old turns into shorter representations as the conversation grows, so the session can keep running for hours, days, or weeks without losing structure.
What it does
Watches the running token count. When it crosses targetUtilization of the model’s context window, the oldest uncompressed turns get compressed. Originals are never deleted: they stay in the archive, and only their copies in the active prompt are replaced with shorter forms.
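The trigger condition can be sketched as a simple ratio check (a minimal sketch; the function name and the 200,000-token window are illustrative, not part of ipc's API):

```python
def should_compress(prompt_tokens: int, context_window: int,
                    target_utilization: float = 0.75) -> bool:
    """True once the prompt crosses targetUtilization of the window."""
    return prompt_tokens / context_window >= target_utilization

# With a hypothetical 200,000-token window and the default 0.75 target,
# compression starts once the prompt reaches 150,000 tokens.
```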
Active context (sent to model):
[L0] recent turns — verbatim
[L1] older turns — tool results truncated
[L2] older turns — full summary
[L3] oldest turns — single sentence per block
Archive (kept in storage):
every original message, byte-for-byte, available for rehydration
When to use it
✅ Long coding sessions (hours of back-and-forth)
✅ Research agents (read 200 sources, compose 1 report)
✅ Customer-support agents handling complex multi-turn tickets
✅ Any agent that hits “context window full” today
❌ One-shot Q&A
❌ Apps that already keep their own message history outside the LLM context
Configuration
ipc:
targetUtilization: 0.75 # compress when prompt > 75% of context window
geminiCompression: false # use small LLM for L2 summaries (v1.1)
archiveToBlob: true # back up evicted messages for rehydration
preserveLastTurns: 5 # never compress the N most recent turns
preserveSystem: true # never compress the system message
Metrics emitted
ipc.tokens.before (number)
ipc.tokens.after (number)
ipc.tokens.saved (number)
ipc.compressed_turns (number)
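The four gauges are related by simple arithmetic: saved is before minus after. A sketch (the helper function is hypothetical; only the metric names come from the list above):

```python
def compression_metrics(tokens_before: int, tokens_after: int,
                        compressed_turns: int) -> dict:
    """Assemble the four ipc gauges; tokens.saved is derived."""
    return {
        "ipc.tokens.before": tokens_before,
        "ipc.tokens.after": tokens_after,
        "ipc.tokens.saved": tokens_before - tokens_after,
        "ipc.compressed_turns": compressed_turns,
    }
```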
Examples
Default — kicks in when you’re getting close to the wall:
ipc:
targetUtilization: 0.75
Aggressive — compress earlier, keep more headroom:
ipc:
targetUtilization: 0.5
preserveLastTurns: 3
Quality-first — wait until you’re really pushing it:
ipc:
targetUtilization: 0.9
preserveLastTurns: 10
How it works
- Pre hook:
  - Estimate total tokens in the request.
  - Get the model’s context window from the pricing table.
  - If tokens / contextSize < targetUtilization: do nothing.
  - Otherwise, walk turns oldest-to-newest, compressing each one a level until the prompt is under target.
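The pre-hook walk can be sketched as follows (a sketch under stated assumptions: the turn dicts, `estimate_tokens`, and `compress_one_level` are illustrative stand-ins, not ipc's actual internals):

```python
MAX_LEVEL = 3  # L3 is the most aggressive form

def pre_hook(turns, estimate_tokens, compress_one_level,
             context_size, target_utilization=0.75, preserve_last=5):
    """Compress the oldest turns, one level at a time, until under target."""
    budget = context_size * target_utilization
    while estimate_tokens(turns) > budget:
        progressed = False
        # Walk oldest-to-newest, skipping the protected recent turns.
        for turn in (turns[:-preserve_last] if preserve_last else turns):
            if turn["level"] < MAX_LEVEL:
                turn["text"] = compress_one_level(turn["text"], turn["level"])
                turn["level"] += 1
                progressed = True
                if estimate_tokens(turns) <= budget:
                    return turns
        if not progressed:  # everything already at L3; nothing left to shrink
            return turns
    return turns
```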
- Compression levels:
  - L0 → no change (recent turns).
  - L1 → tool results truncated to 200 chars + length suffix.
  - L2 → first sentence of each turn (extractive summary).
  - L3 → one-line block summary across multiple adjacent turns.
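Levels L1 and L2 might look like this (the exact suffix format and sentence splitter are assumptions; only “200 chars + length suffix” and “first sentence” come from the list above):

```python
import re

def compress_l1(tool_result: str, limit: int = 200) -> str:
    """L1: truncate a tool result to 200 chars and note the original length."""
    if len(tool_result) <= limit:
        return tool_result
    return tool_result[:limit] + f" ...[{len(tool_result)} chars total]"

def compress_l2(turn_text: str) -> str:
    """L2: extractive summary, keeping the first sentence of the turn."""
    match = re.search(r"[.!?](\s|$)", turn_text)
    return turn_text[: match.end()].rstrip() if match else turn_text
```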
- Archive (if archiveToBlob: true):
  - Original messages are written to blob storage keyed by session + index.
  - The rehydrator module (v1.1) can pull them back when the user references something we compressed.
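For local mode, the archive layout can be sketched like this (the write helper is illustrative; only the {session}/{turn}.json path scheme comes from the Cloud vs Local table below):

```python
import json
from pathlib import Path

def archive_message(root: Path, session_id: str,
                    turn_index: int, message: dict) -> Path:
    """Write the original message under {root}/{session}/{turn}.json
    so rehydrator (v1.1) can read it back by the same key."""
    path = root / session_id / f"{turn_index}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(message))
    return path

# In local mode, root would be ~/.prxy/blob.
```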
v1 uses extractive compression (first-sentence-per-turn). It’s deterministic, fast, and free. v1.1 swaps in abstractive compression with a small LLM call for L2+ levels — meaningfully better summaries at the cost of one extra round trip per compression event.
Compatibility
ipc plays nicely with:
- mcp-optimizer — run mcp-optimizer first (it drops tools), then ipc measures the actual prompt size.
- prompt-optimizer (v1.1) — run after ipc so cache markers land on the compressed-stable prefix.
- rehydrator (v1.1) — depends on ipc for the archive.
Cloud vs Local
| Mode | Archive backend |
|---|---|
| Cloud | R2 |
| Local | ~/.prxy/blob/{session}/{turn}.json |
The archive is keyed by session ID, so even if a user starts a new session the old archive stays available for rehydrator lookups.