
ipc — Infinite Persistent Context

Category: context · Cloud + Local · Status: v1 — production

Long sessions normally hit the model’s context limit and start dropping turns. ipc compresses old turns into shorter representations as the conversation grows, so the session can keep running for hours, days, or weeks without losing structure.

What it does

ipc watches the running token count. When it crosses the targetUtilization fraction of the model's context window, the oldest uncompressed turns get compressed. They're never deleted from the archive — just replaced in the active prompt with shorter forms.

Active context (sent to model):

  • L0 — recent turns, verbatim
  • L1 — older turns, tool results truncated
  • L2 — older turns, full summary
  • L3 — oldest turns, single sentence per block

Archive (kept in storage): every original message, byte-for-byte, available for rehydration.
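As a mental model, each turn in the active context carries its current compression level. A minimal TypeScript sketch of that shape (type and field names are illustrative, not the actual ipc.ts internals):

```ts
// Illustrative shapes only; the real ipc.ts internals may differ.
// Compression levels: 0 = verbatim ... 3 = one-line block summary.
type CompressionLevel = 0 | 1 | 2 | 3;

interface Turn {
  index: number;                                  // position in the session
  role: "system" | "user" | "assistant" | "tool";
  content: string;                                // what is sent to the model right now
  level: CompressionLevel;                        // how far this turn has been compressed
}

interface SessionState {
  active: Turn[];     // possibly-compressed turns included in the prompt
  archiveKey: string; // where the byte-for-byte originals live
}
```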

When to use it

✅ Long coding sessions (hours of back-and-forth)
✅ Research agents (read 200 sources, compose 1 report)
✅ Customer-support agents handling complex multi-turn tickets
✅ Any agent that hits “context window full” today

❌ One-shot Q&A
❌ Apps that already keep their own message history outside the LLM context

Configuration

```yaml
ipc:
  targetUtilization: 0.75   # compress when prompt > 75% of context window
  geminiCompression: false  # use small LLM for L2 summaries (v1.1)
  archiveToBlob: true       # back up evicted messages for rehydration
  preserveLastTurns: 5      # never compress the N most recent turns
  preserveSystem: true      # never compress the system message
```
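For reference, the same options expressed as a TypeScript interface — a sketch mirroring the YAML keys above; the actual type in ipc.ts may be named and shaped differently:

```ts
// Sketch: config shape mirroring the YAML keys above.
interface IpcConfig {
  targetUtilization: number;  // fraction (0..1) of the model's context window
  geminiCompression: boolean; // v1.1: abstractive L2 summaries via a small LLM
  archiveToBlob: boolean;     // persist evicted originals for rehydration
  preserveLastTurns: number;  // the N most recent turns are never compressed
  preserveSystem: boolean;    // the system message is never compressed
}

const defaults: IpcConfig = {
  targetUtilization: 0.75,
  geminiCompression: false,
  archiveToBlob: true,
  preserveLastTurns: 5,
  preserveSystem: true,
};
```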

Metrics emitted

  • ipc.tokens.before (number)
  • ipc.tokens.after (number)
  • ipc.tokens.saved (number)
  • ipc.compressed_turns (number)
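The counters relate in the obvious way: saved is before minus after. A sketch of how they might be emitted together (the emit callback is hypothetical, not part of the documented ipc API):

```ts
// Hypothetical emit callback; shown only to pin down how the counters relate.
function emitCompressionMetrics(
  emit: (name: string, value: number) => void,
  tokensBefore: number,
  tokensAfter: number,
  compressedTurns: number,
): void {
  emit("ipc.tokens.before", tokensBefore);
  emit("ipc.tokens.after", tokensAfter);
  emit("ipc.tokens.saved", tokensBefore - tokensAfter); // saved = before - after
  emit("ipc.compressed_turns", compressedTurns);
}
```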

Examples

Default — kicks in when you’re getting close to the wall:

```yaml
ipc:
  targetUtilization: 0.75
```

Aggressive — compress earlier, keep more headroom:

```yaml
ipc:
  targetUtilization: 0.5
  preserveLastTurns: 3
```

Quality-first — wait until you’re really pushing it:

```yaml
ipc:
  targetUtilization: 0.9
  preserveLastTurns: 10
```

How it works

  1. Pre hook:

    • Estimate total tokens in the request.
    • Get model’s context window from the pricing table.
    • If tokens / contextSize < targetUtilization: do nothing.
    • Otherwise, walk turns oldest-to-newest. Compress each one a level until under target (see the sketch after this list).
  2. Compression levels:

    • L0 → no change (recent turns).
    • L1 → tool results truncated to 200 chars + length suffix.
    • L2 → first sentence of each turn (extractive summary).
    • L3 → one-line block summary across multiple adjacent turns.
  3. Archive (if archiveToBlob: true):

    • Original messages are written to blob storage keyed by session + index.
    • The rehydrator module (v1.1) can pull them back when the user references something we compressed.
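Putting steps 1 and 2 together, here is a minimal sketch of one plausible reading of the pre-hook escalation loop, reusing the Turn and IpcConfig shapes sketched earlier. The helpers (contextWindowFor, estimateTokens, compressOneLevel) are assumptions, not the real ipc.ts API:

```ts
// Sketch of the pre hook. Helpers are assumed, not the real ipc.ts API:
// - contextWindowFor(model): context size from the pricing table
// - estimateTokens(text): rough token count
// - compressOneLevel(turn): bump the turn one level and rewrite its content
declare function contextWindowFor(model: string): number;
declare function estimateTokens(text: string): number;
declare function compressOneLevel(turn: Turn): Turn;

function preHook(turns: Turn[], model: string, cfg: IpcConfig): Turn[] {
  const limit = contextWindowFor(model) * cfg.targetUtilization;
  let total = turns.reduce((sum, t) => sum + estimateTokens(t.content), 0);
  if (total < limit) return turns; // under target: do nothing

  // The most recent N turns are off-limits; everything older is fair game.
  const lastProtected = turns.length - cfg.preserveLastTurns;

  // Walk oldest-to-newest, one level per pass, escalating L1 -> L2 -> L3
  // until the prompt fits or everything compressible is at L3.
  for (let level = 1; level <= 3 && total >= limit; level++) {
    for (let i = 0; i < lastProtected && total >= limit; i++) {
      const t = turns[i];
      if (cfg.preserveSystem && t.role === "system") continue;
      if (t.level >= level) continue;
      const before = estimateTokens(t.content);
      turns[i] = compressOneLevel(t);
      total += estimateTokens(turns[i].content) - before;
    }
  }
  return turns;
}
```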

v1 uses extractive compression (first-sentence-per-turn). It’s deterministic, fast, and free. v1.1 swaps in abstractive compression with a small LLM call for L2+ levels — meaningfully better summaries at the cost of one extra round trip per compression event.
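The extractive step can be as small as a sentence-boundary match. A sketch of what the v1 first-sentence extraction could look like (the actual implementation may differ):

```ts
// Extractive L2 summary: keep only the first sentence of a turn.
// Deterministic, fast, and free; no model call involved.
function firstSentence(text: string): string {
  const match = text.match(/^.*?[.!?](?=\s|$)/s);
  return (match ? match[0] : text).trim();
}

firstSentence("Fixed the race in the queue worker. Also bumped two deps.");
// => "Fixed the race in the queue worker."
```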

Compatibility

ipc plays nicely with:

  • mcp-optimizer — run mcp-optimizer first (drops tools), then ipc measures the actual prompt size.
  • prompt-optimizer (v1.1) — run after ipc so cache markers land on the compressed-stable prefix.
  • rehydrator (v1.1) — depends on ipc for the archive.

Cloud vs Local

| Mode  | Archive backend                    |
| ----- | ---------------------------------- |
| Cloud | R2                                 |
| Local | ~/.prxy/blob/{session}/{turn}.json |

The archive is keyed by session ID, so even if a user starts a new session the old archive stays available for rehydrator lookups.
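For the local backend, that key layout maps onto the filesystem roughly like this (a sketch; the function name is an assumption):

```ts
import { homedir } from "node:os";
import { join } from "node:path";

// Local archive path, matching the ~/.prxy/blob/{session}/{turn}.json
// layout from the table above. Function name is illustrative.
function localArchivePath(sessionId: string, turnIndex: number): string {
  return join(homedir(), ".prxy", "blob", sessionId, `${turnIndex}.json`);
}

localArchivePath("sess_42", 7);
// => "/home/user/.prxy/blob/sess_42/7.json" (on Linux)
```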

Source

packages/modules-core/src/ipc.ts
