smarter multiple keys #1
lhumina_code/hero_aibroker#1
No description provided.
Spec: smarter multiple keys (provider API keys) — issue #1
Objective
Make hero_aibroker tolerant and efficient when multiple upstream API keys are configured for the same provider (OpenAI, OpenRouter, Groq, SambaNova, Alibaba). Today the code creates one `Provider` instance per key and registers them under unique names (`openai-0`, `openai-1`, ...). Routing picks the first matching base provider it sees, and that one key carries every request; the others sit idle until manually moved. This issue replaces that with a per-provider key pool that load-balances, tracks health, respects 429 cooldowns, and supports priority/weight metadata, while keeping the single-key path 100% backwards compatible.
Requirements (mapped to issue body)
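To make the cooldown rules in this section concrete, here is a minimal sketch of how a failure could be turned into a cooldown duration. The `FailureKind` name comes from the spec; the function names `transient_backoff` and `cooldown_for` and the constant names are assumptions made for illustration, not the actual `keypool.rs` API.

```rust
use std::time::Duration;

/// Failure categories named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureKind {
    Auth,
    RateLimit { retry_after: Option<Duration> },
    Transient,
}

// Values taken from the requirements: 60 s default, start 5 s, cap 5 min.
const DEFAULT_RATE_LIMIT_COOLDOWN: Duration = Duration::from_secs(60);
const BACKOFF_START_SECS: u64 = 5;
const BACKOFF_CAP_SECS: u64 = 300;

/// Backoff for the n-th consecutive transient failure (n starts at 1):
/// min(5 min, 5 s * 2^(n-1)). Hypothetical helper name.
pub fn transient_backoff(consecutive_failures: u32) -> Duration {
    // Clamp the exponent so the shift can never overflow.
    let exp = consecutive_failures.saturating_sub(1).min(16);
    let secs = BACKOFF_START_SECS.saturating_mul(1u64 << exp);
    Duration::from_secs(secs.min(BACKOFF_CAP_SECS))
}

/// Timed cooldown to apply after a failure. `None` means "no timer":
/// an auth failure disables the key instead of cooling it down.
pub fn cooldown_for(kind: FailureKind, consecutive_failures: u32) -> Option<Duration> {
    match kind {
        FailureKind::Auth => None,
        FailureKind::RateLimit { retry_after } => {
            Some(retry_after.unwrap_or(DEFAULT_RATE_LIMIT_COOLDOWN))
        }
        FailureKind::Transient => Some(transient_backoff(consecutive_failures)),
    }
}
```

With these numbers the transient backoff runs 5 s, 10 s, 20 s, 40 s, 80 s, 160 s and then stays at the 5 min cap; resetting the failure counter on the first success restarts the sequence.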
- `Provider` per key, group keys under one logical `Provider` (e.g. just `"openai"`) backed by a `KeyPool`. Selection picks the least-loaded healthy key (lowest in-flight + recent-use weighted score) per outbound request.
- `Retry-After` (or 60 s default); 5xx / network -> exponential backoff (start 5 s, cap 5 min, doubles per consecutive failure, resets on first success).
- `priority` (lower = preferred) and `weight` (relative share among same-priority keys). Selection considers the lowest non-empty priority bucket only, then weighted-load-balances within it.
- `requests_per_minute`, optional). When the bucket is empty the selector skips to another key; if every key is throttled the request waits on the soonest-available bucket (max 30 s) before erroring with 429.
Files to modify / create
New files (under `crates/hero_aibroker_lib/src/providers/`)
- `keypool.rs`: `ApiKey`, `KeyState`, `KeyHealth`, `KeyPool` types. In-memory, lock-protected. Public methods: `acquire() -> KeyLease`, `record_success(&KeyLease)`, `record_failure(&KeyLease, FailureKind, Option<Duration>)`, `snapshot() -> Vec<KeyStatus>`, `add_key(ApiKey)`, `remove_key(&str)`, `enable_key(&str)`, `disable_key(&str)`. `KeyLease` is RAII: a drop without `record_*` is treated as "in-flight finished, no signal".
- `keypool_tests.rs` (or `#[cfg(test)] mod tests` inside `keypool.rs`): unit tests for selection, cooldown decay, weighted load balancing, disabled keys, priority ordering.
Modified files
- `crates/hero_aibroker_lib/src/config/mod.rs`
  - `Vec<String>` fields. Two paths:
    - `OPENAI_API_KEYS=k1,k2`): wrap each as `ApiKey { key, priority: 1, weight: 1, rpm: None }` (default).
    - `OPENAI_API_KEYS_JSON=[{...}]`) for richer config without YAML changes.
  - `*_api_keys: Vec<String>` to also expose `*_api_keys_structured: Vec<ApiKeyConfig>`. Keep the `Vec<String>` field for backwards compatibility with existing consumers (UI, .env writer, OpenRPC).
  - `save_env_file` keeps writing the comma-separated string form; structured metadata lives in a new optional file `~/hero/var/hero_aibroker/keys.yml` (read on startup, written on key add/update). v1 only writes when structured metadata is non-default, so a single-key install with default metadata never gets the file at all.
- `crates/hero_aibroker_lib/src/providers/types.rs`
  - Extend `ProviderError` with a typed `RateLimited { retry_after: Option<Duration>, body: String }` variant (rename of existing `RateLimited(String)`). Add `Unauthorized(String)` variant. Existing call sites that match on `Other(_)` are unaffected; call sites that classify by status need to be updated (only the chat-service routing path).
- `crates/hero_aibroker_lib/src/providers/openai.rs` and `openrouter.rs`
  - Replace the `api_key: String` field with `Arc<KeyPool>`.
  - `auth_header` becomes a per-call helper that takes a `KeyLease`.
  - `chat`, `chat_stream`, `create_speech`, `create_transcription`, `create_embeddings` each:
    - `let lease = self.pool.acquire().await?;` (or sync `try_acquire` -> 429 if empty).
    - `lease.key()`.
    - `pool.record_success` or `pool.record_failure`.
- `crates/hero_aibroker_lib/src/providers/mod.rs`
  - `create_providers` builds one `Provider` per provider name (no more `openai-0` / `openai-1` instance suffixing). The `KeyPool` holds N keys.
  - `create_openrouter_apis` similarly: one OpenRouter API surface backed by the same pool.
  - `"openai"`, `"openrouter"`, `"groq"`, ...), which simplifies the registry-side `base_provider_name` plumbing (`Registry::base_provider_name` and `find_provider_instances` can stay but become a no-op when there is exactly one instance per base name; no behavior change to existing callers).
- `crates/hero_aibroker_lib/src/registry/mod.rs` and `crates/hero_aibroker_lib/src/service/chat.rs`
  - `Registry::find_provider_instances` still returns the single new instance and that `chat::route` keeps working with `provider_name = "openai"` instead of `"openai-0"`.
- `crates/hero_aibroker_server/src/api/mod.rs` (admin RPC layer)
  - `handle_providers_list` JSON to include per-key `health`, `in_flight`, `cooldown_until`, `failure_count`, `last_used_at`, `priority`, `weight`, `rpm` so the admin UI can show key status.
  - `providers.update_key` RPC that mutates priority / weight / rpm / enabled for an existing key (called by the UI). v1 makes this OPTIONAL: list-with-status is enough to land the issue.
  - `providers.add_key` / `providers.remove_key` working. After mutating keys, `rebuild_providers` becomes much cheaper (no provider re-instantiation): just call `pool.add_key` / `pool.remove_key`. Keep the full rebuild path as fallback for now.
- `crates/hero_aibroker_server/openrpc.json`
  - `providers.list` (extra fields). Adding fields is backwards-compatible. Document `providers.update_key` if implemented.
- `modelsconfig.yml`
Step-by-step implementation plan
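The RAII behaviour of `KeyLease` described above (a drop without `record_*` simply ends the in-flight accounting) can be sketched as follows. This is a trimmed-down illustration rather than the real types: `KeyState` here carries only the `in_flight` counter, and the `KeyLease::new` constructor is an assumption made for the example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Stand-in for the spec's `KeyState`, reduced to the in-flight counter.
#[derive(Default)]
pub struct KeyState {
    pub in_flight: AtomicU32,
}

/// RAII handle for one outbound request that uses a given key.
pub struct KeyLease {
    state: Arc<KeyState>,
}

impl KeyLease {
    /// Hypothetical constructor: counts the request as in flight.
    pub fn new(state: Arc<KeyState>) -> Self {
        state.in_flight.fetch_add(1, Ordering::AcqRel);
        KeyLease { state }
    }
}

impl Drop for KeyLease {
    fn drop(&mut self) {
        // Dropping without a success/failure record only means
        // "in-flight finished, no signal": release the slot, nothing else.
        self.state.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the decrement to `Drop` means early returns and `?` propagation inside a provider method cannot leak an in-flight slot, which is what keeps the least-loaded selection honest.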
Step 1 - Introduce `KeyPool` and key types
- `crates/hero_aibroker_lib/src/providers/keypool.rs` (new).
- `crates/hero_aibroker_lib/src/providers/mod.rs`.
- `pub struct ApiKey { pub key: String, pub priority: u32, pub weight: u32, pub rpm: Option<u32> }` (config-side, `Clone`, `Serialize`, `Deserialize`).
- `struct KeyState { key: ApiKey, in_flight: AtomicU32, last_used: AtomicI64 (unix ms), failures_consecutive: AtomicU32, cooldown_until: AtomicI64 (unix ms, 0 = none), enabled: AtomicBool, bucket: Option<governor::RateLimiter<...>> }`.
- `pub enum FailureKind { Auth, RateLimit { retry_after: Option<Duration> }, Transient }`.
- `pub struct KeyPool { provider_name: String, states: Vec<Arc<KeyState>> }`: `states` is built once at construction; mutating membership uses `Arc<RwLock<Vec<...>>>` only if hot key-add is required (v1: rebuild on add/remove via `create_providers`).
- `pub struct KeyLease { state: Arc<KeyState>, started: Instant }`. `Drop` decrements `in_flight`.
- `KeyPool::acquire(&self, max_wait: Duration) -> Result<KeyLease, ProviderError>`:
  - `priority`.
  - `(in_flight * 1000) / weight`; tie-break on oldest `last_used`.
  - `bucket.check()`; on failure, look at the next-best key in the same group, then escalate priority.
  - `cooldown_until` or rate-bucket replenishment, bounded by `max_wait`. On timeout return `ProviderError::RateLimited`.
  - `in_flight`, set `last_used`, return lease.
- `KeyPool::record_success(&self, lease: &KeyLease)` resets `failures_consecutive` to 0 and `cooldown_until` to 0.
- `KeyPool::record_failure(&self, lease: &KeyLease, kind: FailureKind)`:
  - `Auth` -> `enabled = false`. Log a warning naming the provider and a masked key suffix.
  - `RateLimit { retry_after }` -> `cooldown_until = now + retry_after.unwrap_or(60s)`.
  - `Transient` -> increment `failures_consecutive`; `cooldown_until = now + min(5min, 5s * 2^(n-1))`.
- `Mutex<()>` taken only inside `acquire` for the selection critical section. `governor::RateLimiter::direct` is `Send + Sync` already.
Step 2 - Extend `ProviderError`
- `crates/hero_aibroker_lib/src/providers/types.rs`.
- `RateLimited(String)` with `RateLimited { retry_after: Option<Duration>, message: String }`.
- `Unauthorized(String)`.
- `chat::route` in `crates/hero_aibroker_lib/src/service/chat.rs` if it pattern-matches on `RateLimited`. (It currently doesn't; only `ModelNotFound` and `Other` are matched, so this is additive.)
Step 3 - Wire `OpenAIProvider` to use a `KeyPool`
- `crates/hero_aibroker_lib/src/providers/openai.rs`.
- `api_key: String` -> `pool: Arc<KeyPool>`. Update `OpenAIProvider::new` signature.
- `fn classify_status(status, headers) -> Option<FailureKind>`:
  - `Some(Auth)`.
  - `Retry-After` -> `Some(RateLimit { retry_after })`.
  - `Some(Transient)`.
  - `None`.
- `chat`, `chat_stream`, `create_speech`, `create_transcription`, `create_embeddings`: acquire a lease, use it for auth, classify the response status, record success/failure on the pool.
- `Transient` on lease drop.
Step 4 - Wire `OpenRouterProvider` to use a `KeyPool`
- `crates/hero_aibroker_lib/src/providers/openrouter.rs`.
- `classify_status` to a shared module to avoid duplication.
- `OpenRouterApi` impl methods (`text_completion`, `list_models`, `list_endpoints`, `get_credits`, `get_key_info`, `get_generation`) also acquire/release through the pool.
Step 5 - `Config` parses structured keys, falls back to `Vec<String>`
- `crates/hero_aibroker_lib/src/config/mod.rs`.
- `pub fn structured_keys_for(&self, provider: &str) -> Vec<ApiKey>` which returns the structured form, falling back to wrapping each plain string with default metadata.
- `~/hero/var/hero_aibroker/keys.yml` on startup. If absent, behave as today. If present, it takes precedence over the env strings.
- `ApiKey` type).
Step 6 - `create_providers` builds one provider per name
- `crates/hero_aibroker_lib/src/providers/mod.rs`.
- `KeyPool` from `config.structured_keys_for(name)` (skip if empty), wrap in `Arc`, construct one provider instance with that pool, insert under the canonical name.
- `create_openrouter_apis` likewise inserts a single `"openrouter"` key.
- `providers.get("openai")` returns one provider whose pool reports 3 keys.
Step 7 - Registry compatibility check
- `crates/hero_aibroker_lib/src/registry/mod.rs`.
- `find_provider_instances("openai", providers)` previously returned `["openai-0", "openai-1"]`; now returns `["openai"]`. The expansion loop in `from_config_with_catalogs` therefore creates one backend per YAML backend (instead of N, once per key). Add a regression test that loading a YAML model with `provider: openai` yields exactly one backend in `available_backends` even when 3 keys are configured.
Step 8 - Admin RPC: surface key health
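This step returns a `masked_key` per key, and Step 1 logs "a masked key suffix" when a key is disabled. A minimal sketch of such masking is shown here; the function name `mask_key`, the choice of four visible characters, and the `...` prefix are all assumptions for illustration, since the issue does not state the exact format.

```rust
/// Return a log-safe form of an API key that exposes only a short suffix.
/// Hypothetical helper; the real masking format is not given in the issue.
pub fn mask_key(key: &str) -> String {
    const VISIBLE: usize = 4; // assumption: show the last four characters
    let total = key.chars().count();
    if total <= VISIBLE {
        // Too short to reveal anything safely: mask every character.
        return "*".repeat(total);
    }
    let suffix: String = key.chars().skip(total - VISIBLE).collect();
    format!("...{suffix}")
}
```

Counting `chars()` rather than bytes keeps the function from slicing through a multi-byte character, even though real provider keys are normally ASCII.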
- `crates/hero_aibroker_server/src/api/mod.rs`, function `handle_providers_list`.
- `let keys = config.get_provider_keys(name);`, also fetch the provider from the `ProviderMap` and call `provider.pool_snapshot()` (new accessor). For each key, return `masked_key`, `priority`, `weight`, `rpm`, `enabled`, `in_flight`, `failure_count`, `cooldown_remaining_secs`, `last_used_secs_ago`.
- `providers.list` in `crates/hero_aibroker_server/openrpc.json` to document the new fields (additive, no breaking change).
Step 9 - End-to-end tests
- `crates/hero_aibroker_lib/tests/keypool_integration.rs` (new).
- `tokio::net::TcpListener` mock to verify:
  - `Retry-After: 2` -> next request picks a different key, throttled key is reused after 2 s.
  - `max_wait`, then `ProviderError::RateLimited`.
Step 10 - Documentation
- `README.md`: short section on multi-key setup, env vars, optional `keys.yml` schema, and how to inspect health via `providers.list`.
Steps 3 and 4 share the `classify_status` helper but otherwise touch different files; once Steps 1 and 2 are in, 3 and 4 can run in parallel. Step 8 only needs Step 6's wiring + Step 1's snapshot accessor and is otherwise independent. Step 10 is documentation only.
Acceptance criteria
- `Retry-After: 30` puts the responding key in cooldown for ~30 s; in the meantime other keys serve traffic.
- `providers.list` with `enabled: false`.
- `rpm` requests per minute per key (enforced client-side); when every key is throttled, callers wait up to `max_wait` then receive `ProviderError::RateLimited`.
- `providers.list` RPC returns per-key health snapshot.
- `cargo test -p hero_aibroker_lib` and `cargo test -p hero_aibroker_server` are green.
- `modelsconfig.yml` (provider keys remain env-driven).
Notes / gotchas
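One of the gotchas in this section is that `Retry-After` may carry either a number of seconds or an HTTP-date, with a 60 s default when neither parses. The sketch below shows one way to handle both using only the standard library. The helper name `parse_retry_after` appears in the implementation summary, but this signature (taking the current Unix time as a parameter) and the hand-written date parser are assumptions; the date branch accepts only the `Sun, 06 Nov 1994 08:49:37 GMT` layout and does no range validation.

```rust
use std::time::Duration;

const DEFAULT_COOLDOWN: Duration = Duration::from_secs(60);
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400;
    let mp = (m + 9) % 12; // March becomes month 0
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parse "Wed, 21 Oct 2015 07:28:00 GMT" into Unix seconds.
fn parse_http_date(s: &str) -> Option<u64> {
    let parts: Vec<&str> = s.split_whitespace().collect();
    if parts.len() != 6 || parts[5] != "GMT" {
        return None;
    }
    let day: i64 = parts[1].parse().ok()?;
    let month = MONTHS.iter().position(|m| *m == parts[2])? as i64 + 1;
    let year: i64 = parts[3].parse().ok()?;
    let hms = parts[4]
        .split(':')
        .map(|p| p.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if hms.len() != 3 {
        return None;
    }
    let days = u64::try_from(days_from_civil(year, month, day)).ok()?;
    Some(days * 86_400 + hms[0] * 3_600 + hms[1] * 60 + hms[2])
}

/// Cooldown implied by a `Retry-After` value; 60 s when it cannot be parsed.
pub fn parse_retry_after(value: &str, now_unix: u64) -> Duration {
    let value = value.trim();
    if let Ok(secs) = value.parse::<u64>() {
        return Duration::from_secs(secs);
    }
    match parse_http_date(value) {
        // A date in the past yields a zero cooldown.
        Some(at) => Duration::from_secs(at.saturating_sub(now_unix)),
        None => DEFAULT_COOLDOWN,
    }
}
```

Passing `now_unix` in explicitly keeps the function deterministic and easy to unit-test; a caller would typically derive it from `SystemTime::now()`.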
- `parking_lot::Mutex<()>` only during selection. `governor` rate limiters are `Send + Sync`. Avoid holding the mutex across `await`.
- `create_providers` currently emits `openai-0`, `openai-1`, etc. when multiple keys exist. The registry has logic (`base_provider_name`) to fold these back. Collapsing to one provider name simplifies that path and removes a class of subtle bugs (e.g. the chat-fallback order `["openrouter","groq","openai",...]` previously matched `openai-0` only by coincidence of HashMap iteration order). Verify the streaming `is_openrouter_backend(&route.backend.provider)` check in `crates/hero_aibroker_server/src/api/chat.rs` still works: it checks `starts_with("openrouter")`, which keeps working.
- `"30"`) or HTTP-date. Parse both; on parse failure, default to 60 s. OpenAI sometimes uses milliseconds in `x-ratelimit-reset-requests`; v1 ignores those finer-grained headers and relies on `Retry-After`.
- `Transient` since the upstream's wire error rarely distinguishes auth/rate/transient. This is a pragmatic v1 trade-off.
- `OPENAI_API_KEY`, `OPENAI_API_KEYS`, comma-separated). The new `OPENAI_API_KEYS_JSON` (and friends) is additive. The existing `Vec<String>` fields on `Config` are retained; the structured form is layered on top. The optional `keys.yml` is opt-in.
- `hero_aibroker_app`.
- `x-ratelimit-*`); only `Retry-After` is honored.
Test Results
Branch: development_new
`cargo test --workspace --exclude hero_aibroker_app`
Per-crate
- `hero_aibroker_lib` lib: 69 passed
- `hero_aibroker_lib` integration (`tests/openrouter_compliance.rs`): 2 passed
- `hero_aibroker_server`: 0 passed (no tests defined; binary compiles cleanly)
Other crates exercised by the workspace run: `hero_aibroker` integration (1), `hero_aibroker_sdk` doctest (1), and seven `hero_aibroker_services` integration suites (`exa`, `forge`, `ping`, `scraperapi`, `scrapfly`, `serpapi`, `serper`; 1 each, 7 total). All MCP and UI crates have empty test suites and compile cleanly.
Coverage of new functionality
Pre-existing unrelated failure: `hero_aibroker_app` does not build because `hero_archipelagos_core::use_focus_poll` is missing. This is outside the scope of this issue.
Implementation Summary
Branch: `development_new`. Tests: 80 passed, 0 failed (`cargo test --workspace --exclude hero_aibroker_app`). The `hero_aibroker_app` crate has a pre-existing build error unrelated to this work.
What changed
The previous code created one `Provider` instance per API key, registered with suffixed names (`openai-0`, `openai-1`, ...). Only the first instance saw traffic; the rest sat idle. This change collapses each provider name to a single instance backed by a `KeyPool` that holds all configured keys for that provider, and adds health tracking, priority/weight metadata, and per-key client-side rate limiting.
Files
New file:
- `crates/hero_aibroker_lib/src/providers/keypool.rs`: `KeyPool`, `KeyLease`, `ApiKey`, `KeyState`, `KeyStatus`, `FailureKind`, `KeyPoolError`, plus the shared `classify_status` helper and `parse_retry_after`. 13 unit tests.
Modified:
- `crates/hero_aibroker_lib/Cargo.toml`: `herolib_core` workspace dep added.
- `crates/hero_aibroker_lib/src/providers/mod.rs`: `create_providers_with_pools` returns the new `KeyPoolMap` alongside `ProviderMap`. `create_providers` is now a thin wrapper. Per-key fanout removed.
- `crates/hero_aibroker_lib/src/providers/openai.rs`: `pool: Arc<KeyPool>` replaces `api_key: String`. Every method (`chat`, `chat_stream`, `create_speech`, `create_transcription`, `create_embeddings`) acquires a lease, classifies the response, and records success/failure on the pool. Added `pool_snapshot()` accessor.
- `crates/hero_aibroker_lib/src/providers/openrouter.rs`: same treatment. All `OpenRouterApi` methods (`text_completion`, `list_models`, `list_endpoints`, `get_credits`, `get_key_info`, `get_generation`) wired to the pool.
- `crates/hero_aibroker_lib/src/providers/types.rs`: `ProviderError::RateLimited` is now a struct variant `{ retry_after: Option<Duration>, message: String }`. Added `ProviderError::from_pool_error(KeyPoolError)`.
- `crates/hero_aibroker_lib/src/config/mod.rs`: new `Config::structured_keys_for(provider) -> Vec<ApiKey>`. New optional `*_API_KEYS_JSON` env vars (e.g. `OPENAI_API_KEYS_JSON`) accept `[{"key":"...","priority":1,"weight":2,"rpm":120}, ...]` for full per-key metadata. Plain comma-separated `*_API_KEYS` keeps working (defaults to priority=1, weight=1, no rpm).
- `crates/hero_aibroker_lib/src/registry/mod.rs`: regression test (`yaml_model_with_multi_key_openai_has_single_backend`) ensuring N keys yield 1 backend.
- `crates/hero_aibroker_lib/tests/openrouter_compliance.rs`: fixture updated to construct via `KeyPool`.
- `crates/hero_aibroker_server/src/api/mod.rs`: `AppState` carries `key_pools: Arc<RwLock<KeyPoolMap>>`. `handle_providers_list` now returns `key_health: [KeyStatus]` per provider (additive). `rebuild_providers` updates pool map.
- `crates/hero_aibroker_server/openrpc.json` and `crates/hero_aibroker_ui/static/openrpc.json`: `providers.list` schema updated to document `key_health`.
Behaviour
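As an illustration of the selection rule this section describes (lowest non-empty priority group, then lowest `(in_flight * 1000) / max(weight, 1)`, ties broken by the oldest last use), here is a minimal sketch over plain values. The `Candidate` struct, its `id` and `available` fields, and the `select` function are assumptions for the example; the real `KeyPool::acquire` also consults the per-key `rpm` bucket and waits up to `max_wait`, which this sketch leaves out.

```rust
/// Simplified view of one key at selection time.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: &'static str, // hypothetical label used only in this example
    pub priority: u32,    // lower = preferred
    pub weight: u32,      // relative share within a priority group
    pub in_flight: u32,
    pub last_used_ms: i64,
    pub available: bool,  // enabled and not cooling down
}

/// Pick the key that should serve the next request, if any is available.
pub fn select(keys: &[Candidate]) -> Option<&Candidate> {
    // Step 1: the lowest priority value that still has an available key.
    let best_priority = keys
        .iter()
        .filter(|k| k.available)
        .map(|k| k.priority)
        .min()?;

    // Step 2: within that group, lowest weighted load wins;
    // equal scores fall back to the least recently used key.
    keys.iter()
        .filter(|k| k.available && k.priority == best_priority)
        .min_by_key(|k| {
            let score = (k.in_flight as u64 * 1000) / (k.weight.max(1) as u64);
            (score, k.last_used_ms)
        })
}
```

Scaling `in_flight` by 1000 before dividing keeps the comparison in integers while still letting a key with weight 4 carry roughly four times the concurrent load of a key with weight 1.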
Selection (in `KeyPool::acquire`):
- `priority` (lower = preferred); take the lowest non-empty group.
- `(in_flight * 1000) / max(weight, 1)`; lowest wins, tie-break on oldest `last_used_at`.
- `rpm` token bucket, `bucket.check()`; on deny, escalate to next-best in group, then next priority.
- `max_wait` (30 s default).
Health classification (`classify_status`):
- `Auth` → key disabled (manual re-enable).
- `RateLimit { retry_after }` → cooldown until `Retry-After` header (seconds or RFC 2822 date), default 60 s.
- `Transient` → exponential backoff, `min(5 min, 5 s × 2^(n-1))`, resets on first success.
Streaming: lease moves into the spawned task; success recorded once on `Event::Open` or `[DONE]`; failures classified from `EventSource::Error`.
Backwards compatibility
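The compatibility path for plain comma-separated `*_API_KEYS` values (each key wrapped with priority 1, weight 1, no rpm) can be sketched as below. The `ApiKey` fields follow the spec; the function name `keys_from_plain` and the trimming and skipping of empty entries are assumptions, not confirmed behaviour of `Config`.

```rust
/// Per-key configuration as described in the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApiKey {
    pub key: String,
    pub priority: u32,
    pub weight: u32,
    pub rpm: Option<u32>,
}

/// Wrap each entry of a comma-separated key list with default metadata.
/// Hypothetical helper: whitespace is trimmed and empty entries are dropped.
pub fn keys_from_plain(raw: &str) -> Vec<ApiKey> {
    raw.split(',')
        .map(str::trim)
        .filter(|k| !k.is_empty())
        .map(|k| ApiKey {
            key: k.to_string(),
            priority: 1,
            weight: 1,
            rpm: None,
        })
        .collect()
}
```

Under this sketch a single-key value produces a one-element pool with default metadata, so selection always returns that key and the single-key setup behaves as before.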
Existing single-key setups behave identically. `OPENAI_API_KEY`, `OPENAI_API_KEYS`, `*_API_KEY`, `*_API_KEYS` all still work. The `Vec<String>` fields on `Config` are retained. `modelsconfig.yml` is unchanged. The `providers.list` RPC adds `key_health` but keeps every existing field.
Out of scope for v1
- `keys.yml` YAML file (env-only for v1; the JSON env vars cover the same surface).
- `key_health` data (backend surface is ready).
Test coverage