feat: universal Headroom prompt-compression middleware (#151) #152

Open
rawdaGastan wants to merge 1 commit from feat/headroom-compression into main

Closes #151.

What

Universal Headroom prompt-compression middleware wired into Router::chat_completions. Runs on every backend (openai, openrouter, groq, sambanova, kimi, alibaba, mother brokers). Default-off via Config::compression_enabled; force-enabled via new --compression CLI flag.

How it hooks in

In Router::chat_completions, right after attach_attribution_headers(...) and before the streaming-vs-blocking dispatch:

crate::service::compression::maybe_compress_chat_request(
    &mut request,
    self.compression_enabled,
    &compression_request_id,
);

Single chokepoint covers both response paths. Streaming responses are untouched (compression only mutates the request body).
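
To illustrate the default-off gate and the single chokepoint, here is a minimal sketch. `ChatRequest`, `prepare`, and the whitespace-collapsing `maybe_compress` are hypothetical stand-ins; only `Router`, `with_compression(bool)` and the `compression_enabled` field are named in this PR, and the real helper delegates to Headroom rather than doing anything shown here.

```rust
/// Hypothetical stand-in for the broker's chat request type.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatRequest {
    pub messages: Vec<String>,
}

/// Simplified router that holds only the flag discussed in this PR.
#[derive(Debug, Default)]
pub struct Router {
    compression_enabled: bool,
}

impl Router {
    /// Builder-style toggle; the default (false) keeps compression off.
    pub fn with_compression(mut self, enabled: bool) -> Self {
        self.compression_enabled = enabled;
        self
    }

    /// Hypothetical chokepoint: mutate the request once, before the
    /// streaming-vs-blocking dispatch, so both paths send the same body.
    pub fn prepare(&self, request: &mut ChatRequest) {
        maybe_compress(request, self.compression_enabled);
    }
}

/// Placeholder "compression" (an assumption for illustration only):
/// collapses runs of whitespace in every message when enabled.
fn maybe_compress(request: &mut ChatRequest, enabled: bool) {
    if !enabled {
        return; // default-off: the request is left untouched
    }
    for message in &mut request.messages {
        *message = message.split_whitespace().collect::<Vec<_>>().join(" ");
    }
}
```

Because the mutation happens on the request before dispatch, nothing on the response side (including SSE streaming) needs to know whether compression ran.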

Fail-open

Every error path — serialize, deserialize, panic in the upstream call (catch_unwind), JSON shape mismatch — emits tracing::warn! at target = "aibroker.compression" and leaves the request untouched.
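
The fail-open contract can be sketched with the standard library alone. This is an illustration under assumptions, not the code in service/compression.rs: `fail_open` is a hypothetical name, the body is modelled as a plain `String`, and `eprintln!` stands in for `tracing::warn!`.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs `transform` on `body` and replaces `body` only on success.
/// An `Err` result or a panic leaves `body` exactly as it was.
/// Returns true when the body was replaced.
pub fn fail_open<F>(body: &mut String, transform: F) -> bool
where
    F: FnOnce(&str) -> Result<String, String>,
{
    // AssertUnwindSafe is reasonable here: the closure only reads `body`,
    // and the write below happens after the closure has finished.
    let outcome = catch_unwind(AssertUnwindSafe(|| transform(body.as_str())));
    match outcome {
        Ok(Ok(compressed)) => {
            *body = compressed;
            true
        }
        Ok(Err(err)) => {
            // Stand-in for tracing::warn!(target: "aibroker.compression", ...)
            eprintln!("warn aibroker.compression: {}", err);
            false
        }
        Err(_) => {
            eprintln!("warn aibroker.compression: panic in compressor");
            false
        }
    }
}
```

Note that `catch_unwind` only catches unwinding panics; a binary built with `panic = "abort"` would still terminate.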

rusqlite downgrade

Headroom's headroom-core pins rusqlite 0.32; broker was on 0.39. Each rusqlite version pulls in its own libsqlite3-sys, which declares links = "sqlite3", and Cargo allows only one package per links value in a dependency graph, so the two versions cannot coexist in one binary. Downgraded broker to 0.32; usage is limited to middleware/apikey.rs and middleware/request_log.rs and touches only stable API (Connection, params!, ToSql, Error::SqliteFailure). All 13 middleware::request_log::tests pass post-downgrade.

Tests

  • Unit: 113 / 113 pass on cargo test -p hero_aibroker_server --bin hero_aibroker_server. Includes 3 new compression-related tests + all SQLite-using tests.
  • Integration: openrpc 3/3, domains 15/15 pass. fake_server + e2e failures pre-date this branch (PATH_SOCKET handling, hard-coded 127.0.0.1:0 targets).
  • Live: against hero_aibroker_server --fake --compression on a 29 KB log-heavy prompt:
    • Live-zone tokens: 8,381 → 67 (99.2% reduction)
    • Body bytes: 29,718 → 9,863
    • Off-path verified clean (no compression log when flag is off)
    • Streaming SSE intact under stream: true with compression on
  • See the issue comment (https://forge.ourworld.tf/lhumina_code/hero_aibroker/issues/151#issuecomment-41348) for full live-test transcripts.
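
For reference, the reduction figures follow from simple arithmetic on the before/after counts. The helper below is a hypothetical sketch (`reduction_percent` is not part of the PR). Applied to the live-test numbers above, it gives 99.2% for live-zone tokens and about 66.8% for body bytes.

```rust
/// Percentage reduction from `before` to `after`, rounded to one decimal.
/// Returns 0.0 when `before` is zero to avoid dividing by zero.
pub fn reduction_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    // saturating_sub keeps the result at 0 if the body grew instead of shrank.
    let saved = before.saturating_sub(after) as f64;
    (saved / before as f64 * 1000.0).round() / 10.0
}
```

The byte reduction is smaller than the token reduction, which suggests that part of the request body lies outside the live zone the compressor rewrites.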

Files changed

| File | Change |
|---|---|
| Cargo.toml (workspace) | Added headroom-proxy and headroom-core git deps pinned at 01fdedc6; downgraded rusqlite from 0.39 → 0.32 |
| crates/hero_aibroker_server/Cargo.toml | Pulled both Headroom deps into the server crate |
| crates/hero_aibroker_server/src/config/mod.rs | New compression_enabled: bool field + default test |
| crates/hero_aibroker_server/src/service/compression.rs | New: fail-open maybe_compress_chat_request helper with catch_unwind, structured tracing, 2 unit tests |
| crates/hero_aibroker_server/src/service/mod.rs | pub mod compression; |
| crates/hero_aibroker_server/src/service/router.rs | compression_enabled field on Router, with_compression(bool) builder, middleware call in chat_completions |
| crates/hero_aibroker_server/src/api_openrpc/mod.rs, crates/hero_aibroker_server/src/api_openrpc/admin/common.rs | Wired config.compression_enabled → Router::with_compression at both construction sites |
| crates/hero_aibroker_server/src/main.rs | New --compression CLI flag forcing config.compression_enabled = true at startup |
| README.md | New "Prompt compression (universal, opt-in)" section |
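
The interaction between the config default and the CLI flag can be sketched as follows. This is an assumption-laden illustration: `apply_cli_overrides` is a hypothetical helper, `Config` is reduced to the one field, and arguments are matched by hand here, while the actual main.rs may use a different argument parser.

```rust
/// Reduced config carrying only the field added in this PR.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub compression_enabled: bool, // Default::default() gives false
}

/// Applies CLI overrides on top of a loaded config. The flag can only
/// force compression on; its absence never turns an enabled config off.
pub fn apply_cli_overrides<I, S>(mut config: Config, args: I) -> Config
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    if args.into_iter().any(|arg| arg.as_ref() == "--compression") {
        config.compression_enabled = true;
    }
    config
}
```

The resulting value would then be handed to Router::with_compression at each construction site, as listed in the table above.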

Deviations from the spec (recorded in the issue comment)

  1. No new tests/compression.rs: existing tests cover the default-off path; the on-path is covered by the live test only, since it would be network-dependent in CI.
  2. --compression CLI flag added (not in original spec) — necessary for live-test toggle.
  3. rusqlite downgrade — forced by libsqlite3-sys links constraint.

Out of scope (separate tickets if/when needed)

  • Native Anthropic /v1/messages compression (compress_anthropic_request). Claude/Anthropic models routed via OpenRouter — the broker's default — are already covered by this PR because the broker is OpenAI-shape end to end.
  • Admin RPC to flip compression_enabled at runtime without restart.
  • Per-model compression budget tuning.
  • Admin UI surfacing of tokens_saved.
Reference
lhumina_code/hero_aibroker!152