lhumina_code/hero_aibroker

smarter multiple keys #1

Open

opened 2026-01-21 09:50:47 +00:00 by thabeta · 3 comments

thabeta commented

2026-01-21 09:50:47 +00:00

Owner

1. Smart Load Balancing: Track usage and direct traffic to less-used instances
2. Health Checking: Detect and skip failed or rate-limited instances
3. Priority Weighting: Allow different priorities for different keys
4. Key Rotation: Automatically rotate keys based on usage patterns
5. Per-Key Rate Limiting: Track and enforce rate limits per key

despiegk added this to the later milestone

2026-02-23 10:23:59 +00:00

despiegk commented

2026-05-01 05:18:22 +00:00

Owner

Spec: smarter multiple keys (provider API keys) — issue #1

Objective

Make hero_aibroker tolerant and efficient when multiple upstream API keys are configured for the same provider (OpenAI, OpenRouter, Groq, SambaNova, Alibaba). Today the code creates one Provider instance per key and registers them under unique names (openai-0, openai-1, ...). Routing picks the first matching base provider it sees and that one key carries every request — the others sit idle until manually moved. This issue replaces that with a per-provider key pool that load-balances, tracks health, respects 429 cooldowns, and supports priority/weight metadata, while keeping the single-key path 100% backwards compatible.

Requirements (mapped to issue body)

1. Smart Load Balancing: instead of one Provider per key, group keys under one logical Provider (e.g. just "openai") backed by a KeyPool. Selection picks the least-loaded healthy key (lowest in-flight + recent-use weighted score) per outbound request.
2. Health Checking: every outbound HTTP response feeds the pool. Statuses are classified: 401/403 -> key disabled (manual re-enable); 429 -> cooldown until Retry-After (or 60 s default); 5xx / network -> exponential backoff (start 5 s, cap 5 min, doubles per consecutive failure, resets on first success).
3. Priority Weighting: each key carries a priority (lower = preferred) and weight (relative share among same-priority keys). Selection considers the lowest non-empty priority bucket only, then weighted-load-balances within it.
4. Key Rotation: implemented as part of (1). Every chat / tts / stt / embedding request asks the pool for a key, so traffic naturally rotates. No timer-driven forced rotation in v1 (out of scope, see notes).
5. Per-Key Rate Limiting: each key gets a client-side governor token bucket (requests_per_minute, optional). When the bucket is empty the selector skips to another key; if every key is throttled the request waits on the soonest-available bucket (max 30 s) before erroring with 429.

Files to modify / create

New files (under `crates/hero_aibroker_lib/src/providers/`)

keypool.rs: ApiKey, KeyState, KeyHealth, KeyPool types. In-memory, lock-protected. Public methods: acquire(max_wait) -> Result<KeyLease, ProviderError>, record_success(&KeyLease), record_failure(&KeyLease, FailureKind), snapshot() -> Vec<KeyStatus>, add_key(ApiKey), remove_key(&str), enable_key(&str), disable_key(&str). The retry delay travels inside FailureKind::RateLimit, matching the signatures in Step 1. KeyLease is RAII: a drop without record_* is treated as "in-flight finished, no signal".
keypool_tests.rs (or #[cfg(test)] mod tests inside keypool.rs) — unit tests for selection, cooldown decay, weighted load balancing, disabled keys, priority ordering.

Modified files

crates/hero_aibroker_lib/src/config/mod.rs
- Add structured key parsing alongside existing Vec<String> fields. Two paths:
  1. Plain string in env (OPENAI_API_KEYS=k1,k2) — wrap each as ApiKey { key, priority: 1, weight: 1, rpm: None } (default).
  2. JSON form in env (OPENAI_API_KEYS_JSON=[{...}]) for richer config without YAML changes.
- Promote each *_api_keys: Vec<String> to also expose *_api_keys_structured: Vec<ApiKey> (the config-side type from Step 1). Keep the Vec<String> field for backwards compatibility with existing consumers (UI, .env writer, OpenRPC).
- save_env_file keeps writing the comma-separated string form; structured metadata lives in a new optional file ~/hero/var/hero_aibroker/keys.yml (read on startup, written on key add/update). v1 only writes when structured metadata is non-default — so a single-key-with-default-metadata install never gets the file at all.
crates/hero_aibroker_lib/src/providers/types.rs
- Replace the existing RateLimited(String) variant of ProviderError with a typed RateLimited { retry_after: Option<Duration>, message: String } variant. Add an Unauthorized(String) variant. Existing call sites that match on Other(_) are unaffected; call sites that classify by status need to be updated (only the chat-service routing path).
crates/hero_aibroker_lib/src/providers/openai.rs and openrouter.rs
- Replace the single api_key: String field with Arc<KeyPool>.
- auth_header becomes a per-call helper that takes a KeyLease.
- chat, chat_stream, create_speech, create_transcription, create_embeddings each:
  1. let lease = self.pool.acquire().await?; (or sync try_acquire -> 429 if empty).
  2. Send request with lease.key().
  3. On the response, classify status -> call pool.record_success or pool.record_failure.
  4. For streaming, the success/failure signal is emitted at stream-open time (HTTP status of the SSE handshake) and at first-chunk-error/done. A 429 mid-stream still records a cooldown; a 5xx during streaming records a transient failure on lease drop.
crates/hero_aibroker_lib/src/providers/mod.rs
- create_providers builds one Provider per provider name (no more openai-0 / openai-1 instance suffixing). The KeyPool holds N keys.
- Update create_openrouter_apis similarly — one OpenRouter API surface backed by the same pool.
- Provider naming becomes deterministic again ("openai", "openrouter", "groq", ...), which simplifies the registry-side base_provider_name plumbing (Registry::base_provider_name and find_provider_instances can stay but become a no-op when there is exactly one instance per base name; no behavior change to existing callers).
crates/hero_aibroker_lib/src/registry/mod.rs and crates/hero_aibroker_lib/src/service/chat.rs
- No structural change — both already operate on base provider names. Verify that Registry::find_provider_instances still returns the single new instance and that chat::route keeps working with provider_name = "openai" instead of "openai-0".
crates/hero_aibroker_server/src/api/mod.rs (admin RPC layer)
- Extend handle_providers_list JSON to include per-key health, in_flight, cooldown_until, failure_count, last_used_at, priority, weight, rpm so the admin UI can show key status.
- Add providers.update_key RPC that mutates priority / weight / rpm / enabled for an existing key (called by the UI). v1 makes this OPTIONAL — list-with-status is enough to land the issue.
- Keep providers.add_key / providers.remove_key working. After mutating keys, rebuild_providers becomes much cheaper (no provider re-instantiation) — just call pool.add_key / pool.remove_key. Keep the full rebuild path as fallback for now.
crates/hero_aibroker_server/openrpc.json
- Document the new shape of providers.list (extra fields). Adding fields is backwards-compatible. Document providers.update_key if implemented.
modelsconfig.yml
- No change needed. Provider keys are env-driven, not YAML-driven. Backward compat is automatic.

Step-by-step implementation plan

Step 1 — Introduce `KeyPool` and key types

File: crates/hero_aibroker_lib/src/providers/keypool.rs (new).
Add module declaration in crates/hero_aibroker_lib/src/providers/mod.rs.
Types:
- pub struct ApiKey { pub key: String, pub priority: u32, pub weight: u32, pub rpm: Option<u32> } (config-side, Clone, Serialize, Deserialize).
- struct KeyState { key: ApiKey, in_flight: AtomicU32, last_used: AtomicI64 (unix ms), failures_consecutive: AtomicU32, cooldown_until: AtomicI64 (unix ms, 0 = none), enabled: AtomicBool, bucket: Option<governor::RateLimiter<...>> }.
- pub enum FailureKind { Auth, RateLimit { retry_after: Option<Duration> }, Transient }.
- pub struct KeyPool { provider_name: String, states: Vec<Arc<KeyState>> } — states is built once at construction; mutating membership uses Arc<RwLock<Vec<...>>> only if hot key-add is required (v1: rebuild on add/remove via create_providers).
- pub struct KeyLease { state: Arc<KeyState>, started: Instant }. Drop decrements in_flight.
- KeyPool::acquire(&self, max_wait: Duration) -> Result<KeyLease, ProviderError>:
  1. Snapshot current time.
  2. Group enabled, non-cooled-down states by priority.
  3. Take the non-empty group with the lowest priority value (lower = preferred).
  4. Within the group, pick the state with lowest (in_flight * 1000) / weight; tie-break on oldest last_used.
  5. If the chosen key has an rpm bucket, bucket.check(); on failure, look at the next-best key in the same group, then escalate priority.
  6. If everything is throttled or cooled down, sleep until the earliest cooldown_until or rate-bucket replenishment, bounded by max_wait. On timeout return ProviderError::RateLimited.
  7. Increment in_flight, set last_used, return lease.
- KeyPool::record_success(&self, lease: &KeyLease) resets failures_consecutive to 0 and cooldown_until to 0.
- KeyPool::record_failure(&self, lease: &KeyLease, kind: FailureKind):
  - Auth -> enabled = false. Log a warning naming the provider and a masked key suffix.
  - RateLimit { retry_after } -> cooldown_until = now + retry_after.unwrap_or(60s).
  - Transient -> increment failures_consecutive; cooldown_until = now + min(5min, 5s * 2^(n-1)).
Concurrency: per-state atomics + a single Mutex<()> taken only inside acquire for the selection critical section. The limiter returned by governor::RateLimiter::direct is already Send + Sync.
Tests in the same file: 5 keys with mixed priorities, one disabled, one in cooldown — verify selection picks the right one; verify in-flight counters decrement on lease drop; verify weight skews distribution over 1000 picks.
Dependencies: none.
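
The selection rules in acquire (steps 2 to 4) can be sketched as a pure function over a snapshot of key states. This is only an illustration under assumptions: KeyView and select_key are hypothetical names that do not appear in the spec, and the rpm bucket check (step 5) and the bounded wait (step 6) are left out.

```rust
/// Read-only view of one key at selection time.
/// Hypothetical type for this sketch; the spec stores these values in KeyState atomics.
#[derive(Debug, Clone)]
pub struct KeyView {
    pub id: usize,
    pub priority: u32,          // lower value = preferred
    pub weight: u32,            // relative share within a priority group
    pub in_flight: u32,
    pub last_used_ms: i64,      // unix ms
    pub cooldown_until_ms: i64, // unix ms, 0 = none
    pub enabled: bool,
}

/// Returns the id of the key to use, or None when no key is eligible right now.
pub fn select_key(keys: &[KeyView], now_ms: i64) -> Option<usize> {
    // Step 2: keep enabled keys whose cooldown has expired.
    let eligible: Vec<&KeyView> = keys
        .iter()
        .filter(|k| k.enabled && k.cooldown_until_ms <= now_ms)
        .collect();

    // Step 3: the smallest priority value among eligible keys wins.
    let best_priority = eligible.iter().map(|k| k.priority).min()?;

    // Step 4: lowest (in_flight * 1000) / weight, tie-break on oldest last_used.
    eligible
        .into_iter()
        .filter(|k| k.priority == best_priority)
        .min_by_key(|k| {
            // weight.max(1) guards against a zero weight in this sketch.
            let score = (k.in_flight as u64 * 1000) / k.weight.max(1) as u64;
            (score, k.last_used_ms)
        })
        .map(|k| k.id)
}
```

In this sketch, when every candidate in the chosen group is idle, the score is 0 for all of them and the last_used tie-break alone decides, so the weight only influences the choice while requests are in flight.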

Step 2 — Extend `ProviderError`

File: crates/hero_aibroker_lib/src/providers/types.rs.
Replace RateLimited(String) with RateLimited { retry_after: Option<Duration>, message: String }.
Add Unauthorized(String).
Update the chat::route in crates/hero_aibroker_lib/src/service/chat.rs if it pattern-matches on RateLimited. (It currently doesn't; only ModelNotFound and Other are matched, so this is additive.)
Dependencies: none.
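
A minimal sketch of the extended error type, assuming the enum only has the variants named in this spec (the real ProviderError may carry more). The retry_hint helper is hypothetical and only shows how a caller could read the typed retry_after field.

```rust
use std::fmt;
use std::time::Duration;

/// Sketch of the error type after Step 2. ModelNotFound and Other are the
/// variants the spec says are matched today; anything else is omitted here.
#[derive(Debug)]
pub enum ProviderError {
    ModelNotFound(String),
    RateLimited { retry_after: Option<Duration>, message: String },
    Unauthorized(String),
    Other(String),
}

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProviderError::ModelNotFound(m) => write!(f, "model not found: {m}"),
            ProviderError::RateLimited { message, .. } => write!(f, "rate limited: {message}"),
            ProviderError::Unauthorized(m) => write!(f, "unauthorized: {m}"),
            ProviderError::Other(m) => write!(f, "{m}"),
        }
    }
}

impl std::error::Error for ProviderError {}

/// Hypothetical helper: how long a caller might wait before retrying.
/// Falls back to the 60 s default the spec uses for a missing Retry-After.
pub fn retry_hint(err: &ProviderError) -> Option<Duration> {
    match err {
        ProviderError::RateLimited { retry_after, .. } => {
            Some(retry_after.unwrap_or(Duration::from_secs(60)))
        }
        _ => None,
    }
}
```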

Step 3 — Wire `OpenAIProvider` to use a `KeyPool`

File: crates/hero_aibroker_lib/src/providers/openai.rs.
Field changes: api_key: String -> pool: Arc<KeyPool>. Update OpenAIProvider::new signature.
New helper fn classify_status(status, headers) -> Option<FailureKind>:
- 401, 403 -> Some(Auth).
- 429 -> parse Retry-After -> Some(RateLimit { retry_after }).
- 5xx -> Some(Transient).
- 2xx, other 4xx -> None.
In chat, chat_stream, create_speech, create_transcription, create_embeddings: acquire a lease, use it for auth, classify the response status, record success/failure on the pool.
For streaming, classify at SSE-handshake time; mid-stream errors record Transient on lease drop.
Dependencies: Step 1, Step 2.
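
The status mapping above could look roughly like the following. It is a sketch with one deliberate simplification: instead of an HTTP header map it takes the raw Retry-After value as an optional string (an assumption, not the signature from the spec), and it only understands the delta-seconds form; the HTTP-date form is left unparsed and yields None so the pool default applies.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum FailureKind {
    Auth,
    RateLimit { retry_after: Option<Duration> },
    Transient,
}

/// Maps an HTTP status (plus the raw Retry-After value, if any) to a failure
/// kind. None means "not a key-health signal" (2xx and other 4xx).
pub fn classify_status(status: u16, retry_after_header: Option<&str>) -> Option<FailureKind> {
    match status {
        401 | 403 => Some(FailureKind::Auth),
        429 => {
            // Delta-seconds only in this sketch; anything else becomes None.
            let retry_after = retry_after_header
                .and_then(|v| v.trim().parse::<u64>().ok())
                .map(Duration::from_secs);
            Some(FailureKind::RateLimit { retry_after })
        }
        500..=599 => Some(FailureKind::Transient),
        _ => None,
    }
}
```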

Step 4 — Wire `OpenRouterProvider` to use a `KeyPool`

File: crates/hero_aibroker_lib/src/providers/openrouter.rs.
Mirror Step 3. Move classify_status to a shared module to avoid duplication.
The OpenRouterApi impl methods (text_completion, list_models, list_endpoints, get_credits, get_key_info, get_generation) also acquire/release through the pool.
Dependencies: Step 1, Step 2.

Step 5 — `Config` parses structured keys, falls back to Vec<String>

File: crates/hero_aibroker_lib/src/config/mod.rs.
Add pub fn structured_keys_for(&self, provider: &str) -> Vec<ApiKey> which returns the structured form, falling back to wrapping each plain string with default metadata.
Optional v1 polish: load ~/hero/var/hero_aibroker/keys.yml on startup. If absent, behave as today. If present, takes precedence over the env strings.
Dependencies: Step 1 (needs ApiKey type).
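
The fallback path can be illustrated with a small sketch. It assumes the defaults given earlier (priority 1, weight 1, no rpm) and uses two hypothetical free functions, keys_from_plain and structured_keys, in place of the Config method; JSON and keys.yml parsing are not shown.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ApiKey {
    pub key: String,
    pub priority: u32,
    pub weight: u32,
    pub rpm: Option<u32>,
}

/// Wraps each entry of a comma-separated value such as `k1,k2` in default metadata.
/// Empty entries and surrounding whitespace are dropped (a choice made for this sketch).
pub fn keys_from_plain(raw: &str) -> Vec<ApiKey> {
    raw.split(',')
        .map(str::trim)
        .filter(|k| !k.is_empty())
        .map(|k| ApiKey { key: k.to_string(), priority: 1, weight: 1, rpm: None })
        .collect()
}

/// Returns the structured form when present, else falls back to the plain string.
pub fn structured_keys(structured: &[ApiKey], plain: &str) -> Vec<ApiKey> {
    if structured.is_empty() {
        keys_from_plain(plain)
    } else {
        structured.to_vec()
    }
}
```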

Step 6 — `create_providers` builds one provider per name

File: crates/hero_aibroker_lib/src/providers/mod.rs.
Replace the per-key loops with: for each provider, build a KeyPool from config.structured_keys_for(name) (skip if empty), wrap in Arc, construct one provider instance with that pool, insert under the canonical name.
create_openrouter_apis likewise inserts a single "openrouter" key.
Add a regression test that with 3 configured OpenAI keys, providers.get("openai") returns one provider whose pool reports 3 keys.
Dependencies: Steps 1, 3, 4, 5.

Step 7 — Registry compatibility check

File: crates/hero_aibroker_lib/src/registry/mod.rs.
find_provider_instances("openai", providers) previously returned ["openai-0", "openai-1"]; now returns ["openai"]. The expansion loop in from_config_with_catalogs therefore creates one backend per YAML backend (instead of N — once per key). Add a regression test that loading a YAML model with provider: openai yields exactly one backend in available_backends even when 3 keys are configured.
Dependencies: Step 6.

Step 8 — Admin RPC: surface key health

File: crates/hero_aibroker_server/src/api/mod.rs, function handle_providers_list.
After let keys = config.get_provider_keys(name);, also fetch the provider from the ProviderMap and call provider.pool_snapshot() (new accessor). For each key, return masked_key, priority, weight, rpm, enabled, in_flight, failure_count, cooldown_remaining_secs, last_used_secs_ago.
Update OpenRPC schema for providers.list in crates/hero_aibroker_server/openrpc.json to document the new fields (additive — no breaking change).
Dependencies: Steps 1, 6.
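
Two of the derived fields, masked_key and cooldown_remaining_secs, can be sketched as below. The masking format (four asterisks plus the last four characters, with short keys fully hidden) and the round-up behavior are assumptions made for illustration; the spec does not fix either.

```rust
/// Hides all but the last four characters of a key.
/// Keys of eight characters or fewer are fully masked (assumption for this sketch).
pub fn mask_key(key: &str) -> String {
    let chars: Vec<char> = key.chars().collect();
    if chars.len() <= 8 {
        return "****".to_string();
    }
    let suffix: String = chars[chars.len() - 4..].iter().collect();
    format!("****{suffix}")
}

/// Seconds left until the cooldown ends, rounded up; 0 when no cooldown is active.
/// Both arguments are unix milliseconds, with 0 meaning "no cooldown" as in KeyState.
pub fn cooldown_remaining_secs(cooldown_until_ms: i64, now_ms: i64) -> u64 {
    let remaining_ms = (cooldown_until_ms - now_ms).max(0) as u64;
    (remaining_ms + 999) / 1000
}
```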

Step 9 — End-to-end tests

File: crates/hero_aibroker_lib/tests/keypool_integration.rs (new).
Use a hand-rolled tokio::net::TcpListener mock to verify:
- 429 with Retry-After: 2 -> the next request picks a different key, and the throttled key is used again after 2 s.
- 401 -> key disabled, never picked again.
- 5xx burst on key A then 200 on key B -> key A enters backoff, key B serves.
- Both keys 429 -> caller waits up to max_wait, then ProviderError::RateLimited.
Dependencies: Steps 1–7. (Step 8 is independent.)

Step 10 — Documentation

File: README.md — short section on multi-key setup, env vars, optional keys.yml schema, and how to inspect health via providers.list.
Dependencies: Steps 1–8.

Steps 3 and 4 share the classify_status helper but otherwise touch different files; once Steps 1 and 2 are in, 3 and 4 can run in parallel. Step 8 only needs Step 6's wiring + Step 1's snapshot accessor and is otherwise independent. Step 10 is documentation only.

Acceptance criteria

With one key configured per provider, behavior is byte-identical to today (existing tests pass unchanged).
With multiple keys configured, two consecutive requests against the same provider observably use different keys.
A 429 response with Retry-After: 30 puts the responding key in cooldown for ~30 s; in the meantime other keys serve traffic.
A 401 response disables the key. Subsequent requests skip it. The key remains visible in providers.list with enabled: false.
A 5xx response triggers exponential backoff (5 s -> 10 s -> 20 s ..., capped at 5 min). One success resets the counter.
Priorities are respected: when a priority-1 key is healthy, priority-2 keys are not used.
Weights skew distribution within a priority bucket (a key with weight=2 receives ~2x the traffic of a peer with weight=1 across 1000 picks, +/-15%).
Per-key rpm limit prevents the broker from sending more than rpm requests per minute per key (enforced client-side); when every key is throttled, callers wait up to max_wait then receive ProviderError::RateLimited.
providers.list RPC returns per-key health snapshot.
cargo test -p hero_aibroker_lib and cargo test -p hero_aibroker_server are green.
No new fields are required in modelsconfig.yml (provider keys remain env-driven).
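
The backoff criterion (5 s -> 10 s -> 20 s, capped at 5 min) follows from cooldown = min(5min, 5s * 2^(n-1)), where n is the consecutive failure count. A small sketch of that arithmetic, with transient_backoff as a hypothetical function name:

```rust
use std::time::Duration;

/// Cooldown after the n-th consecutive transient failure:
/// min(5 min, 5 s * 2^(n-1)). Returns zero for n = 0 (no failure recorded).
pub fn transient_backoff(consecutive_failures: u32) -> Duration {
    const BASE_SECS: u64 = 5;
    const CAP_SECS: u64 = 300;
    if consecutive_failures == 0 {
        return Duration::ZERO;
    }
    // Clamp the exponent so the shift cannot overflow; the cap is hit long before.
    let exp = (consecutive_failures - 1).min(16);
    let secs = BASE_SECS.saturating_mul(1u64 << exp).min(CAP_SECS);
    Duration::from_secs(secs)
}
```

With these constants the schedule is 5, 10, 20, 40, 80, 160 seconds for the first six failures, and the 5 min cap applies from the seventh failure onward (5 s * 2^6 = 320 s).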

Notes / gotchas

Concurrency model: handlers run on tokio multi-thread runtime. The pool uses per-key atomics and a brief parking_lot::Mutex<()> only during selection. governor rate limiters are Send + Sync. Avoid holding the mutex across await.
Key naming today: create_providers currently emits openai-0, openai-1, etc. when multiple keys exist. The registry has logic (base_provider_name) to fold these back. Collapsing to one provider name simplifies that path and removes a class of subtle bugs (e.g. the chat-fallback order ["openrouter","groq","openai",...] previously matched openai-0 only by coincidence of HashMap insertion order). Verify the streaming is_openrouter_backend(&route.backend.provider) check in crates/hero_aibroker_server/src/api/chat.rs still works — it checks starts_with("openrouter"), which keeps working.
Retry-After header: providers may send seconds ("30") or HTTP-date. Parse both; on parse failure, default to 60 s. OpenAI sometimes uses milliseconds in x-ratelimit-reset-requests — v1 ignores those finer-grained headers and relies on Retry-After.
Streaming failure attribution: SSE responses return HTTP status during the EventSource open; classify there. Mid-stream provider errors are attributed to the lease as Transient since the upstream's wire error rarely distinguishes auth/rate/transient. This is a pragmatic v1 trade-off.
Lease drop without signal: If a request handler panics mid-call, the lease drops with no recorded outcome. Treat that as "no information" and do not increment failures (this avoids a flapping key getting wrongfully sidelined by an unrelated panic).
Backwards compatibility: env variables stay (OPENAI_API_KEY, OPENAI_API_KEYS, comma-separated). The new OPENAI_API_KEYS_JSON (and friends) is additive. The existing Vec<String> fields on Config are retained; the structured form is layered on top. The optional keys.yml is opt-in.
Persistence: in v1 health state is in-memory only. After a broker restart all keys start fresh (no cooldowns persist). This is by design — restarts are infrequent and operator-driven.
Out of scope for v1:
- Distributed coordination across multiple broker instances.
- Persistent quota tracking across restarts.
- Forced periodic key rotation on a timer (load-balancer satisfies item 4).
- Daily/monthly quota tracking (just rpm + cooldown in v1).
- A dashboard page in hero_aibroker_app.
- Fine-grained reset-bucket headers (x-ratelimit-*) — only Retry-After is honored.
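
The "lease drop without signal" rule from the notes above can be sketched with a plain RAII guard. This is a reduced illustration: KeyState here carries only two of the fields from Step 1, and KeyLease::new stands in for the final step of acquire.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Reduced KeyState for this sketch: only the counters the lease touches.
#[derive(Default)]
pub struct KeyState {
    pub in_flight: AtomicU32,
    pub failures_consecutive: AtomicU32,
}

/// RAII guard handed out per request.
pub struct KeyLease {
    state: Arc<KeyState>,
}

impl KeyLease {
    /// Stand-in for the last step of acquire: mark one more request in flight.
    pub fn new(state: Arc<KeyState>) -> Self {
        state.in_flight.fetch_add(1, Ordering::SeqCst);
        KeyLease { state }
    }
}

impl Drop for KeyLease {
    fn drop(&mut self) {
        // Runs on normal scope exit and during panic unwinding alike.
        // Only in_flight changes; the failure counter is left untouched,
        // so a drop without record_* carries no health signal.
        self.state.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Keeping the decrement in Drop means the in-flight count stays accurate even on early returns, while health changes stay explicit through record_success and record_failure.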

- [ ] No new fields are required in `modelsconfig.yml` (provider keys remain env-driven). ## Notes / gotchas - **Concurrency model**: handlers run on tokio multi-thread runtime. The pool uses per-key atomics and a brief `parking_lot::Mutex<()>` only during selection. `governor` rate limiters are `Send + Sync`. Avoid holding the mutex across `await`. - **Key naming today**: `create_providers` currently emits `openai-0`, `openai-1`, etc. when multiple keys exist. The registry has logic (`base_provider_name`) to fold these back. Collapsing to one provider name simplifies that path and removes a class of subtle bugs (e.g. the chat-fallback order `["openrouter","groq","openai",...]` previously matched `openai-0` only by coincidence of HashMap insertion order). Verify the streaming `is_openrouter_backend(&route.backend.provider)` check in `crates/hero_aibroker_server/src/api/chat.rs` still works — it checks `starts_with("openrouter")`, which keeps working. - **Retry-After header**: providers may send seconds (`"30"`) or HTTP-date. Parse both; on parse failure, default to 60 s. OpenAI sometimes uses milliseconds in `x-ratelimit-reset-requests` — v1 ignores those finer-grained headers and relies on `Retry-After`. - **Streaming failure attribution**: SSE responses return HTTP status during the EventSource open; classify there. Mid-stream provider errors are attributed to the lease as `Transient` since the upstream's wire error rarely distinguishes auth/rate/transient. This is a pragmatic v1 trade. - **Lease drop without signal**: If a request handler panics mid-call, the lease drops with no recorded outcome. Treat that as ''no information'' — do not increment failures (avoids a flapping key getting wrongfully sidelined by an unrelated panic). - **Backwards compatibility**: env variables stay (`OPENAI_API_KEY`, `OPENAI_API_KEYS`, comma-separated). The new `OPENAI_API_KEYS_JSON` (and friends) is additive. 
The existing `Vec<String>` fields on `Config` are retained; the structured form is layered on top. The optional `keys.yml` is opt-in. - **Persistence**: in v1 health state is in-memory only. After a broker restart all keys start fresh (no cooldowns persist). This is by design — restarts are infrequent and operator-driven. - **Out of scope for v1**: - Distributed coordination across multiple broker instances. - Persistent quota tracking across restarts. - Forced periodic key rotation on a timer (load-balancer satisfies item 4). - Daily/monthly quota tracking (just rpm + cooldown in v1). - A dashboard page in `hero_aibroker_app`. - Fine-grained reset-bucket headers (`x-ratelimit-*`) — only `Retry-After` is honored.
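
The 5xx acceptance criterion above (5 s -> 10 s -> 20 s ..., capped at 5 min, reset on success) can be sketched as a pure function of the consecutive-failure count. This is only an illustration under stated assumptions: the name `transient_backoff` and the constants `BASE` and `CAP` are hypothetical and are not taken from the actual `keypool.rs`.

```rust
use std::time::Duration;

/// Delay after the first consecutive transient (5xx) failure (assumed constant name).
const BASE: Duration = Duration::from_secs(5);
/// Upper bound on any single backoff window (assumed constant name).
const CAP: Duration = Duration::from_secs(300);

/// Backoff window for the n-th consecutive transient failure (n starts at 1).
/// A count of 0 means the streak was reset by a success, so there is no delay.
fn transient_backoff(consecutive_failures: u32) -> Duration {
    if consecutive_failures == 0 {
        return Duration::ZERO;
    }
    // Clamp the exponent so the shift below can never overflow a u64.
    let exp = (consecutive_failures - 1).min(16);
    let secs = BASE.as_secs().saturating_mul(1u64 << exp);
    // Duration is ordered, so `min` applies the 5-minute cap.
    Duration::from_secs(secs).min(CAP)
}
```

Calling `transient_backoff(1)`, `(2)` and `(3)` gives 5 s, 10 s and 20 s; from the seventh consecutive failure onward the value stays at the 300 s cap. Keeping the schedule a pure function of the failure count means "one success resets the counter" reduces to storing 0.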

despiegk commented

2026-05-01 06:37:18 +00:00

Owner

Test Results

Branch: development_new

`cargo test --workspace --exclude hero_aibroker_app`

Total: 80
Passed: 80
Failed: 0

Per-crate

hero_aibroker_lib lib: 69 passed
hero_aibroker_lib integration (tests/openrouter_compliance.rs): 2 passed
hero_aibroker_server: 0 passed (no tests defined; binary compiles cleanly)

Other crates exercised by the workspace run: hero_aibroker integration (1), hero_aibroker_sdk doctest (1), and seven hero_aibroker_services integration suites (exa, forge, ping, scraperapi, scrapfly, serpapi, serper — 1 each, 7 total). All MCP and UI crates have empty test suites and compile cleanly.

Coverage of new functionality

KeyPool unit tests (10): priority ordering, weight skew, auth disable, rate-limit cooldown, exponential transient backoff, lease-drop in-flight decrement, all-unavailable error, success-resets-failure-count, masking, classify_status, parse_retry_after.
Config structured-key tests (5): JSON env parsing, single-object form, fallback to plain, unknown provider, invalid JSON fallback.
Provider construction tests (5): single-key, multi-key, empty config, all five providers, OpenRouter API map.
Registry regression (1): YAML model with 3 keys yields exactly 1 backend.
Provider name accessors (2): OpenAI and OpenRouter constructor sanity.

Pre-existing unrelated failure: hero_aibroker_app does not build because hero_archipelagos_core::use_focus_poll is missing — outside the scope of this issue.


despiegk commented

2026-05-01 06:40:18 +00:00

Owner

Implementation Summary

Branch: development_new. Tests: 80 passed, 0 failed (cargo test --workspace --exclude hero_aibroker_app). The hero_aibroker_app crate has a pre-existing build error unrelated to this work.

What changed

The previous code created one Provider instance per API key, registered with suffixed names (openai-0, openai-1, ...). Only the first instance saw traffic; the rest sat idle. This change collapses each provider name to a single instance backed by a KeyPool that holds all configured keys for that provider, and adds health tracking, priority/weight metadata, and per-key client-side rate limiting.

Files

New file:

crates/hero_aibroker_lib/src/providers/keypool.rs — KeyPool, KeyLease, ApiKey, KeyState, KeyStatus, FailureKind, KeyPoolError, plus the shared classify_status helper and parse_retry_after. 13 unit tests.

Modified:

crates/hero_aibroker_lib/Cargo.toml — herolib_core workspace dep added.
crates/hero_aibroker_lib/src/providers/mod.rs — create_providers_with_pools returns the new KeyPoolMap alongside ProviderMap. create_providers is now a thin wrapper. Per-key fanout removed.
crates/hero_aibroker_lib/src/providers/openai.rs — pool: Arc<KeyPool> replaces api_key: String. Every method (chat, chat_stream, create_speech, create_transcription, create_embeddings) acquires a lease, classifies the response, records success/failure on the pool. Added pool_snapshot() accessor.
crates/hero_aibroker_lib/src/providers/openrouter.rs — same treatment. All OpenRouterApi methods (text_completion, list_models, list_endpoints, get_credits, get_key_info, get_generation) wired to the pool.
crates/hero_aibroker_lib/src/providers/types.rs — ProviderError::RateLimited is now a struct variant { retry_after: Option<Duration>, message: String }. Added ProviderError::from_pool_error(KeyPoolError).
crates/hero_aibroker_lib/src/config/mod.rs — new Config::structured_keys_for(provider) -> Vec<ApiKey>. New optional *_API_KEYS_JSON env vars (e.g. OPENAI_API_KEYS_JSON) accept [{"key":"...","priority":1,"weight":2,"rpm":120}, ...] for full per-key metadata. Plain comma-separated *_API_KEYS keeps working (defaults to priority=1, weight=1, no rpm).
crates/hero_aibroker_lib/src/registry/mod.rs — regression test (yaml_model_with_multi_key_openai_has_single_backend) ensuring N keys yield 1 backend.
crates/hero_aibroker_lib/tests/openrouter_compliance.rs — fixture updated to construct via KeyPool.
crates/hero_aibroker_server/src/api/mod.rs — AppState carries key_pools: Arc<RwLock<KeyPoolMap>>. handle_providers_list now returns key_health: [KeyStatus] per provider (additive). rebuild_providers updates pool map.
crates/hero_aibroker_server/openrpc.json and crates/hero_aibroker_ui/static/openrpc.json — providers.list schema updated to document key_health.
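
The plain-string fallback described for `structured_keys_for` can be sketched roughly as follows. This is an assumption-laden illustration, not the real code: the `ApiKey` field layout is inferred from the JSON example above, the helpers `keys_from_plain` and `effective_keys` are hypothetical names, and JSON parsing itself is left out (the already-parsed structured keys are passed in as an `Option`).

```rust
/// Hypothetical mirror of the per-key metadata; fields inferred from the JSON form.
#[derive(Debug, Clone, PartialEq)]
struct ApiKey {
    key: String,
    priority: u32,
    weight: u32,
    rpm: Option<u32>,
}

/// Wrap a plain comma-separated `*_API_KEYS` value with the default metadata
/// (priority=1, weight=1, no rpm). Empty entries are skipped.
fn keys_from_plain(raw: &str) -> Vec<ApiKey> {
    raw.split(',')
        .map(str::trim)
        .filter(|k| !k.is_empty())
        .map(|k| ApiKey { key: k.to_string(), priority: 1, weight: 1, rpm: None })
        .collect()
}

/// Assumed precedence: structured keys win when present and non-empty;
/// otherwise (variable unset, invalid JSON, empty list) use the plain form.
fn effective_keys(structured: Option<Vec<ApiKey>>, plain: &str) -> Vec<ApiKey> {
    match structured {
        Some(keys) if !keys.is_empty() => keys,
        _ => keys_from_plain(plain),
    }
}
```

With this shape a single-key setup yields a one-element pool with default metadata, which is what keeps the single-key path behaving as before.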

Behaviour

Selection (in KeyPool::acquire):

Filter to enabled keys whose cooldown has expired.
Group by priority (lower = preferred); take the lowest non-empty group.
Within the group, score = (in_flight * 1000) / max(weight, 1); lowest wins, tie-break on oldest last_used_at.
If the chosen key has an rpm token bucket, bucket.check(); on deny, escalate to next-best in group, then next priority.
If everything is throttled, sleep until soonest replenishment or cooldown, bounded by max_wait (30s default).
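
The first three selection steps above can be sketched as a pure function over a snapshot of key state. This is a minimal sketch, not the actual `KeyPool::acquire`: the `KeyView` struct and the `pick` function are hypothetical, and the rpm token-bucket check and the bounded wait (the last two steps) are deliberately left out.

```rust
use std::time::Instant;

/// Hypothetical read-only view of one key, used only for selection.
#[derive(Debug, Clone)]
struct KeyView {
    id: usize,
    enabled: bool,
    priority: u32, // lower = preferred
    weight: u32,
    in_flight: u64,
    cooldown_until: Option<Instant>,
    last_used_at: Option<Instant>, // None = never used
}

impl KeyView {
    fn new(id: usize, priority: u32, weight: u32) -> Self {
        KeyView {
            id,
            enabled: true,
            priority,
            weight,
            in_flight: 0,
            cooldown_until: None,
            last_used_at: None,
        }
    }
}

/// score = (in_flight * 1000) / max(weight, 1); lower is better.
fn score(k: &KeyView) -> u64 {
    (k.in_flight * 1000) / u64::from(k.weight.max(1))
}

/// Returns the id of the key to use, or None when no key is currently usable.
fn pick(keys: &[KeyView], now: Instant) -> Option<usize> {
    // Keep only enabled keys whose cooldown has expired.
    let healthy: Vec<&KeyView> = keys
        .iter()
        .filter(|k| k.enabled && k.cooldown_until.map_or(true, |t| t <= now))
        .collect();
    // Take the lowest (most preferred) priority among the healthy keys.
    let best_priority = healthy.iter().map(|k| k.priority).min()?;
    // Lowest score wins; ties go to the oldest last_used_at.
    // `None` sorts before `Some`, so never-used keys are preferred on a tie.
    healthy
        .into_iter()
        .filter(|k| k.priority == best_priority)
        .min_by_key(|k| (score(k), k.last_used_at))
        .map(|k| k.id)
}
```

Because the weight divides the in-flight count, a key with weight 2 tolerates twice the concurrent load of a weight-1 peer before its score matches, which is where the weight skew comes from.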

Health classification (classify_status):

401, 403 → Auth → key disabled (manual re-enable).
429 → RateLimit { retry_after } → cooldown until Retry-After header (seconds or RFC 2822 date), default 60 s.
5xx → Transient → exponential backoff, min(5 min, 5 s × 2^(n-1)), resets on first success.
2xx, other 4xx → no failure recorded (don't blame the key for client errors).
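
The status mapping above could look roughly like the sketch below. The signature is simplified on purpose and is an assumption: it takes a bare `u16` status and the raw `Retry-After` value instead of the HTTP client's status and header types, and `retry_after` is modelled as a plain `Duration`. Only the delta-seconds form of `Retry-After` is parsed here; the date form is not handled in this sketch and falls through to the 60 s default.

```rust
use std::time::Duration;

/// Failure classes named in the summary; the payload type is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum FailureKind {
    Auth,
    RateLimit { retry_after: Duration },
    Transient,
}

/// Cooldown applied when `Retry-After` is absent or cannot be parsed.
const DEFAULT_COOLDOWN: Duration = Duration::from_secs(60);

/// Parses the delta-seconds form only (e.g. "30").
/// Anything else, including a date value, yields the default in this sketch.
fn parse_retry_after(value: Option<&str>) -> Duration {
    value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_COOLDOWN)
}

/// Maps an HTTP status to a key-health failure, or None when the key is not at fault.
fn classify_status(status: u16, retry_after: Option<&str>) -> Option<FailureKind> {
    match status {
        401 | 403 => Some(FailureKind::Auth),
        429 => Some(FailureKind::RateLimit {
            retry_after: parse_retry_after(retry_after),
        }),
        500..=599 => Some(FailureKind::Transient),
        // 2xx and the remaining 4xx are not blamed on the key.
        _ => None,
    }
}
```

Returning `Option<FailureKind>` lets the caller treat `None` uniformly as "record success or nothing", so a 404 or 400 caused by the request body never pushes a healthy key into cooldown.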

Streaming: lease moves into the spawned task; success recorded once on Event::Open or [DONE]; failures classified from EventSource::Error.
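
The lease behaviour relied on here (in-flight released on drop, no failure recorded when the lease is dropped without an outcome) can be sketched with an RAII guard. This is an illustrative sketch only: `KeyCounters`, the method names, and the use of bare atomics are assumptions rather than the real `KeyLease` API.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical per-key counters shared between the pool and its leases.
#[derive(Default)]
struct KeyCounters {
    in_flight: AtomicUsize,
    failure_count: AtomicU32,
}

/// RAII guard held for the duration of one upstream request.
struct KeyLease {
    counters: Arc<KeyCounters>,
}

impl KeyLease {
    /// Marks one more request in flight on this key.
    fn acquire(counters: Arc<KeyCounters>) -> Self {
        counters.in_flight.fetch_add(1, Ordering::SeqCst);
        KeyLease { counters }
    }

    /// A success resets the consecutive-failure streak.
    fn record_success(&self) {
        self.counters.failure_count.store(0, Ordering::SeqCst);
    }

    /// An explicitly classified failure extends the streak.
    fn record_failure(&self) {
        self.counters.failure_count.fetch_add(1, Ordering::SeqCst);
    }
}

impl Drop for KeyLease {
    fn drop(&mut self) {
        // Only the in-flight slot is released. A drop with no recorded outcome
        // (for example after a panic) is treated as "no information", so the
        // failure count is left untouched.
        self.counters.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Since the guard owns only an `Arc`, it is `Send` and can be moved into a spawned streaming task; the in-flight count then stays accurate however the task ends.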

Backwards compatibility

Existing single-key setups behave identically. OPENAI_API_KEY, OPENAI_API_KEYS, *_API_KEY, *_API_KEYS all still work. The Vec<String> fields on Config are retained. modelsconfig.yml is unchanged. The providers.list RPC adds key_health but keeps every existing field.

Out of scope for v1

Distributed coordination across broker instances (each instance has its own pool).
Persistent quota across restarts (in-memory only).
Forced periodic rotation timer (load balancer satisfies item 4 of the issue).
Daily/monthly quota tracking (rpm + cooldown only).
keys.yml YAML file (env-only for v1; the JSON env vars cover the same surface).
Admin UI panel rendering the new key_health data (backend surface is ready).
Integration test crate that brings up two mock upstreams to drive end-to-end failover (algorithmic coverage exists in keypool unit tests).

Test coverage

13 keypool unit tests — priority, weight skew, auth disable, rate-limit cooldown, transient backoff, lease drop, all-unavailable, success reset, masking, classify_status, parse_retry_after.
5 config tests — JSON env, single-object, fallback, unknown provider, invalid JSON.
5 provider construction tests — single-key, multi-key, empty, all five providers, OpenRouter.
1 registry regression — multi-key collapses to one backend.
2 provider name accessors.
2 OpenRouter compliance integration tests — pinning/headers and streaming reasoning/usage.


despiegk referenced this issue from a commit

2026-05-01 07:21:11 +00:00

feat(broker): per-provider key pool with health, priority, weight, rpm