# Pipeline execution
A pipeline is an ordered list of modules. Each request walks the list, then hits the provider, then the response walks back out through the post hooks.
## Lifecycle
1. HTTP request arrives at `/v1/messages` or `/v1/chat/completions`
2. Auth middleware validates the `prxy_xxx` key
3. Pipeline loader reads the config and instantiates modules
4. For each module in order:
   a. Run `module.pre(ctx)` if it exists
   b. If pre returns `{ continue: false, response }`: short-circuit — skip the remaining pre hooks and the provider call, and jump straight to step 6
   c. Otherwise continue to the next module
5. Provider router picks the upstream and forwards the (possibly mutated) request
6. For each module in order:
   a. Run `module.post(ctx)` (fire-and-forget — does not block the client)
7. Response returned to the client (or streamed in real time for SSE)

## Pre hooks
A pre hook can:
- Mutate the request. Strip tools, inject memory, restructure messages.
- Add metadata. Attach hints (cost estimate, similarity score) for downstream modules.
- Short-circuit. Return a complete response (cache hit) to skip the provider call entirely.
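For instance, a hypothetical cache-style pre hook exercising all three (a sketch: the key derivation stands in for a real embedding lookup, and the `ctx` fields are simplified):

```ts
// Illustrative pre hook for a cache-style module. Names and shapes are assumptions.
type PreResult =
  | { continue: true }
  | { continue: false; response: unknown };

interface Ctx {
  request: { messages: { role: string; content: string }[] };
  metadata: Map<string, unknown>;
  storage: { get(key: string): Promise<unknown | null> };
}

async function pre(ctx: Ctx): Promise<PreResult> {
  const msgs = ctx.request.messages;
  const last = msgs[msgs.length - 1]?.content ?? "";
  const hit = await ctx.storage.get(`cache:${last}`);
  if (hit !== null) {
    // Short-circuit: remaining pre hooks and the provider call are skipped.
    return { continue: false, response: hit };
  }
  // Leave a hint for downstream modules and this module's post hook.
  ctx.metadata.set("cache.miss", true);
  return { continue: true };
}
```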
The shape:
```ts
async pre(ctx: RequestContext): Promise<PreResult> {
  // ctx.request  — canonical request (mutable)
  // ctx.metadata — Map<string, unknown> shared across modules
  // ctx.storage  — KV / DB / blob adapter
  // ctx.apiKey   — { id, userId, tier, ... }
  // ctx.logger   — pino-style structured logger
  return { continue: true };
  // OR
  return { continue: false, response: cachedResponse };
}
```

## Post hooks
A post hook runs after the response is sent to the client. Side effects only — there’s no way to mutate the response from a post hook.
```ts
async post(ctx: ResponseContext): Promise<void> {
  // ctx adds: ctx.response, ctx.durationMs
}
```

Use cases: cache writes, pattern forging, usage tracking, audit logs.
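A usage-tracking post hook might look like this (a sketch; the `usage` field names are assumptions about the canonical response shape, not the real types):

```ts
// Sketch of a usage-tracking post hook. Field names are assumptions.
interface ResponseCtx {
  apiKey: { id: string };
  response: { usage?: { input_tokens: number; output_tokens: number } };
  durationMs: number;
  logger: { info(fields: object, msg?: string): void };
}

async function post(ctx: ResponseCtx): Promise<void> {
  const usage = ctx.response.usage;
  if (!usage) return; // nothing to record for this response
  // Side effects only: the client already has its response.
  ctx.logger.info(
    {
      keyId: ctx.apiKey.id,
      totalTokens: usage.input_tokens + usage.output_tokens,
      durationMs: ctx.durationMs,
    },
    "usage recorded",
  );
}
```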
## Stream hooks
For SSE streaming responses, modules can intercept each chunk:
```ts
async stream(chunk: CanonicalChunk, ctx: ResponseContext): Promise<CanonicalChunk> {
  // return possibly-modified chunk
}
```

In v1, post hooks are skipped on streaming responses. Pre hooks (including cache short-circuits, which replay as synthetic streams) still apply. Post-hook-on-stream lands in v1.1.
## Failure handling
Modules are non-critical by default:
- Module throws in `pre` → log, mark `preFailed`, continue the pipeline.
- Module returns `{ continue: false, response: errorResponse }` → halt and return that response. (This is the path `cost-guard` uses to enforce a 429.)
- Provider 5xx → retry per router config, then surface to the client as a normal upstream error.
- Storage adapter unavailable → modules fall back to no-op (`semantic-cache` becomes a miss, pattern injection injects nothing).
The principle: a broken module should never deny a user a response. Modules that intentionally halt the pipeline must do so explicitly via the short-circuit return.
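The throw-in-`pre` case can be sketched as a wrapper (`runPre` is an illustrative name; the `preFailed` mark follows the rule above):

```ts
// Sketch of the non-critical-by-default policy around a module's pre hook.
type PreResult =
  | { continue: true }
  | { continue: false; response: unknown };

interface Ctx {
  metadata: Map<string, unknown>;
  logger: { warn(fields: object, msg?: string): void };
}

async function runPre(
  moduleName: string,
  hook: (ctx: Ctx) => Promise<PreResult>,
  ctx: Ctx,
): Promise<PreResult> {
  try {
    return await hook(ctx);
  } catch (err) {
    // A broken module must never deny a user a response: log, mark, move on.
    ctx.logger.warn({ module: moduleName, err }, "pre hook failed");
    ctx.metadata.set(`${moduleName}.preFailed`, true);
    return { continue: true };
  }
}
```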
## Streaming
For SSE streaming responses:
- Pre hooks see the full request before any streaming starts.
- Stream hooks intercept individual chunks.
- A cache short-circuit on a streaming request replays as a synthetic stream — a streaming client expects a stream, so we synthesize `message_start → content_block_* → message_stop` events from the cached response.
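A sketch of that synthetic replay (event names follow the streaming format above; the cached-response shape and chunk size are assumptions):

```ts
// Sketch: replay a cached, non-streaming response as a synthetic event stream.
interface CachedResponse {
  text: string;
}

interface SyntheticEvent {
  event: string;
  data?: { text: string };
}

function* synthesizeStream(cached: CachedResponse, chunkSize = 64): Generator<SyntheticEvent> {
  yield { event: "message_start" };
  yield { event: "content_block_start" };
  // Slice the cached text so the client still sees incremental deltas.
  for (let i = 0; i < cached.text.length; i += chunkSize) {
    yield { event: "content_block_delta", data: { text: cached.text.slice(i, i + chunkSize) } };
  }
  yield { event: "content_block_stop" };
  yield { event: "message_stop" };
}
```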
## Example trace
```
[trace 8a3f] POST /v1/messages
├─ auth: OK (key prefix prxy_live_a1b2c3d4)
├─ pipeline loaded: mcp-optimizer, semantic-cache, patterns
├─ mcp-optimizer.pre: pruned 18 tools → 4 tools (reason: query irrelevance)
├─ semantic-cache.pre: miss (best score 0.71, threshold 0.85)
├─ patterns.pre: injected 2 patterns into system
├─ provider.call: anthropic claude-sonnet-4-6 (1.4s, 8.2k in / 412 out)
├─ patterns.post: no fix detected
├─ semantic-cache.post: cached request embedding + response (TTL 3600)
└─ usage-tracker.post: logged 8612 tokens, $0.0312 estimated
```

## See also
- Modules — what a module is.
- Storage — how `ctx.storage` works.
- SDK → Lifecycle — building your own module.