Investigate adding latency telemetry to hero_aibroker_server #158


Open

opened 2026-06-16 11:47:11 +00:00 by nabil_salah · 1 comment

nabil_salah (Member) commented on 2026-06-16 11:47:11 +00:00

## Problem

`hero_aibroker_server` currently has limited visibility into where time is spent per request. The existing `Metrics` struct counts requests/errors, and the request log records only total duration. When latency feels high, we can't easily tell whether the bottleneck is routing, key pool waits, serialization, upstream TTFB (time to first byte), full upstream response, tool execution, or response formatting.

## Questions to answer

1. What telemetry stack fits best?
   - Prometheus metrics only?
   - OpenTelemetry traces only?
   - Both? If both, which first?
   - Jaeger as trace backend?
2. What is the operational cost (extra deps, runtime overhead, deployment complexity)?
3. Does the existing `/metrics` endpoint suffice as a starting point?

## Candidate approaches

### Option A: Prometheus histograms

Extend the existing `/metrics` endpoint with latency histograms. Simplest to deploy, no extra collector, easy dashboards/alerts.
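The sketch below is a rough illustration of what Option A could expose. It keeps one cumulative histogram per pipeline stage and renders it in the Prometheus text exposition format (`_bucket{le=...}`, `_sum`, `_count`). The code is std-only and hypothetical. The metric name `aibroker_stage_duration_seconds`, the `stage` label, and the bucket bounds are all assumptions, and a real server would more likely use an existing Prometheus client crate than hand-roll this.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;
use std::time::Duration;

/// Observations for one `stage` label value.
struct Series {
    counts: Vec<u64>, // non-cumulative counts; last slot is the +Inf bucket
    sum: f64,         // sum of observed seconds
    count: u64,       // number of observations
}

/// Hypothetical latency histogram keyed by a `stage` label.
pub struct StageHistogram {
    name: &'static str,
    bounds: Vec<f64>, // ascending upper bounds in seconds, excluding +Inf
    series: BTreeMap<String, Series>,
}

impl StageHistogram {
    pub fn new(name: &'static str, bounds: &[f64]) -> Self {
        Self { name, bounds: bounds.to_vec(), series: BTreeMap::new() }
    }

    /// Record one duration for `stage`.
    pub fn observe(&mut self, stage: &str, elapsed: Duration) {
        let n = self.bounds.len();
        let secs = elapsed.as_secs_f64();
        let s = self
            .series
            .entry(stage.to_string())
            .or_insert_with(|| Series { counts: vec![0; n + 1], sum: 0.0, count: 0 });
        // First bucket whose upper bound is >= the value, otherwise +Inf.
        let idx = self.bounds.iter().position(|&b| secs <= b).unwrap_or(n);
        s.counts[idx] += 1;
        s.sum += secs;
        s.count += 1;
    }

    /// Render all series in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let mut out = String::new();
        writeln!(out, "# TYPE {} histogram", self.name).unwrap();
        for (stage, s) in &self.series {
            // Prometheus buckets are cumulative: each `le` counts everything at or below it.
            let mut cumulative = 0;
            for (i, bound) in self.bounds.iter().enumerate() {
                cumulative += s.counts[i];
                writeln!(out, "{}_bucket{{stage=\"{stage}\",le=\"{bound}\"}} {cumulative}", self.name).unwrap();
            }
            cumulative += s.counts[self.bounds.len()];
            writeln!(out, "{}_bucket{{stage=\"{stage}\",le=\"+Inf\"}} {cumulative}", self.name).unwrap();
            writeln!(out, "{}_sum{{stage=\"{stage}\"}} {}", self.name, s.sum).unwrap();
            writeln!(out, "{}_count{{stage=\"{stage}\"}} {}", self.name, s.count).unwrap();
        }
        out
    }
}
```

Each request would call `observe` once per stage, and `/metrics` would append `render()` to its existing output. Because the buckets are cumulative, PromQL's `histogram_quantile()` can estimate per-stage p50/p99 from them without any extra collector.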

### Option B: OpenTelemetry + Jaeger

Instrument the chat path with spans and export them to Jaeger. Best for per-request deep dives, but requires running Jaeger or an OTel collector.
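Option B records a tree of timed spans per request rather than aggregates. The std-only model below shows how RAII guards, similar in spirit to entering a `tracing` span, attribute time to nested stages and link each child to its parent. It is a hypothetical sketch: `Recorder`, `SpanGuard`, and `SpanRecord` are illustrative names, not the `tracing` or OpenTelemetry API. It is also single-threaded, whereas real async code would rely on `tracing`'s span context propagation.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// One span in the tree; `duration` is filled in when its guard drops.
#[derive(Debug, Clone)]
pub struct SpanRecord {
    pub name: &'static str,
    pub parent: Option<usize>, // index of the enclosing span, None for the root
    pub duration: Option<Duration>,
}

/// Hypothetical single-threaded span recorder (not the tracing/OTel API).
#[derive(Default)]
pub struct Recorder {
    spans: RefCell<Vec<SpanRecord>>,
    open: RefCell<Vec<usize>>, // stack of currently entered spans
}

/// Closes its span and records the elapsed time when dropped.
pub struct SpanGuard<'a> {
    rec: &'a Recorder,
    id: usize,
    start: Instant,
}

impl Recorder {
    /// Open a child of the innermost open span (or a root if none is open).
    pub fn enter(&self, name: &'static str) -> SpanGuard<'_> {
        let parent = self.open.borrow().last().copied();
        let id = {
            let mut spans = self.spans.borrow_mut();
            spans.push(SpanRecord { name, parent, duration: None });
            spans.len() - 1
        };
        self.open.borrow_mut().push(id);
        SpanGuard { rec: self, id, start: Instant::now() }
    }

    pub fn spans(&self) -> Vec<SpanRecord> {
        self.spans.borrow().clone()
    }
}

impl Drop for SpanGuard<'_> {
    fn drop(&mut self) {
        self.rec.spans.borrow_mut()[self.id].duration = Some(self.start.elapsed());
        // Guards are expected to drop in LIFO order, so this span is on top.
        let popped = self.rec.open.borrow_mut().pop();
        debug_assert_eq!(popped, Some(self.id));
    }
}
```

When exported to Jaeger, a tree like this lets you see, for a single slow request, whether a child such as `upstream.chat` or `keypool.acquire` dominated the parent `chat.completions` span.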

### Option C: Both

Prometheus for aggregate monitoring first, OpenTelemetry for traces as a follow-up.

## What we want to measure

- Routing/model resolution time
- Key pool acquisition wait
- Request serialization
- Upstream TTFB
- Full upstream duration
- Tool execution time
- Total request duration
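Upstream TTFB and full upstream duration are easy to conflate on a streaming path. The sketch below shows a minimal way to separate them. It assumes the upstream body is consumed as a sequence of chunks, modelled here with a plain iterator rather than an async `reqwest` stream, so "first chunk" stands in for "first byte". It takes one `Instant` before the request is sent, then records the elapsed time at the first chunk and again after the last. `time_upstream` and `UpstreamTiming` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Timing for one upstream call.
#[derive(Debug)]
pub struct UpstreamTiming {
    pub ttfb: Option<Duration>, // time until the first chunk arrived (None if the body was empty)
    pub total: Duration,        // time until the body was fully consumed
    pub chunks: usize,
}

/// Drain `chunks`, passing each to `on_chunk`, and time it relative to `start`.
/// `start` should be taken just before the upstream request is sent.
pub fn time_upstream<I, F>(start: Instant, chunks: I, mut on_chunk: F) -> UpstreamTiming
where
    I: IntoIterator,
    F: FnMut(I::Item),
{
    let mut ttfb = None;
    let mut count = 0;
    for chunk in chunks {
        // `next()` has just returned, so this is when the first chunk became available.
        if ttfb.is_none() {
            ttfb = Some(start.elapsed());
        }
        count += 1;
        on_chunk(chunk);
    }
    UpstreamTiming { ttfb, total: start.elapsed(), chunks: count }
}
```

The gap between `ttfb` and `total` is time spent streaming the response. It is distinct from how long the provider took to start answering, and that distinction is why the two are listed as separate measurements.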


nabil_salah self-assigned this on 2026-06-21 07:30:58 +00:00

nabil_salah (Author, Member) commented on 2026-06-22 06:43:54 +00:00

# 21-06-2026

Current progress on OpenTelemetry/Jaeger tracing for `hero_aibroker_server`:

**Done**

- Added workspace + crate dependencies for `opentelemetry`, `opentelemetry_sdk`, `opentelemetry-otlp`, and `tracing-opentelemetry`.
- Created `crates/hero_aibroker_server/src/telemetry.rs` with opt-in OTLP/HTTP trace export, env fallbacks (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_SERVICE_NAME`), and graceful shutdown.
- Added CLI flags `--telemetry` and `--telemetry-endpoint` to `main.rs`.
- Wired the OpenTelemetry layer into the tracing subscriber alongside the existing `HeroTracingLayer`.
- Instrumented the chat hot path with spans and timing attributes:
  - `chat.completions` (router entry, total duration, status)
  - `chat.route` (model routing)
  - `keypool.acquire`
  - `upstream.chat` / `upstream.chat_stream` (OpenAI + OpenRouter)
  - `response.serialize`
  - `broker.tool_loop` / `broker.tool.execute` / `broker.streaming_tool_loop` / `broker.streaming_tool_turn`
- Used the experimental Tokio runtime-aware batch span processor to support async `reqwest` HTTP export without panics.
- Verified with `cargo check`, `cargo clippy -D warnings`, and `cargo test -p hero_aibroker_server` (119 tests pass).
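The env-fallback behaviour described for `telemetry.rs` can be pictured as a small, pure resolution function. What follows is a sketch under assumptions, not the code in the crate. The following are all hypothetical: the precedence order (CLI flag, then `OTEL_EXPORTER_OTLP_ENDPOINT`, then a default), the treatment of empty values as unset, the default service name, and the `resolve`/`TelemetryConfig` names. The `http://localhost:4318` default is the standard OTLP/HTTP collector port. Passing the environment lookup in as a closure keeps the function testable without touching real env vars.

```rust
/// Resolved tracing settings (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct TelemetryConfig {
    pub endpoint: String,
    pub service_name: String,
}

/// Standard OTLP/HTTP collector address.
const DEFAULT_ENDPOINT: &str = "http://localhost:4318";
/// Assumed default service name.
const DEFAULT_SERVICE_NAME: &str = "hero_aibroker_server";

/// Returns `None` unless `--telemetry` was given, keeping export opt-in.
/// Assumed precedence: CLI flag, then env var, then default; empty values count as unset.
pub fn resolve(
    enabled: bool,
    cli_endpoint: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<TelemetryConfig> {
    if !enabled {
        return None;
    }
    let non_empty = |s: String| if s.trim().is_empty() { None } else { Some(s) };
    let endpoint = cli_endpoint
        .map(str::to_owned)
        .and_then(non_empty)
        .or_else(|| env("OTEL_EXPORTER_OTLP_ENDPOINT").and_then(non_empty))
        .unwrap_or_else(|| DEFAULT_ENDPOINT.to_owned());
    let service_name = env("OTEL_SERVICE_NAME")
        .and_then(non_empty)
        .unwrap_or_else(|| DEFAULT_SERVICE_NAME.to_owned());
    Some(TelemetryConfig { endpoint, service_name })
}
```

In `main.rs` it would be called roughly as `resolve(args.telemetry, args.telemetry_endpoint.as_deref(), |k| std::env::var(k).ok())`, where the `args` field names are also assumed.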

**Manual test (without Jaeger)**

- Started `hero_aibroker_server --fake --telemetry --address 127.0.0.1 --port 8080`.
- Sent a chat request and received the expected fake response.
- Confirmed telemetry is fail-open: the broker kept running when no collector was reachable, logged the OTLP export error, and shut down cleanly.
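The fail-open behaviour observed above follows from the way a batch span processor decouples the request path from export. The std-only sketch below illustrates the idea; it is not the OpenTelemetry SDK's implementation. `on_end` only enqueues the span, and a background thread exports spans in batches of `max_batch`. If an export fails, the error is logged and the batch is dropped rather than propagated. `BatchProcessor`, `SpanData`, and the log message are hypothetical. Unlike the SDK processor, this sketch has no bounded queue and no periodic timer flush. It exports only when a batch fills or on `shutdown`.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a finished span handed to the exporter.
#[derive(Debug, Clone)]
pub struct SpanData {
    pub name: String,
}

/// Export one batch; a failure is logged and the batch dropped (fail-open).
/// Returns how many spans were dropped.
fn flush<F>(export: &mut F, batch: &mut Vec<SpanData>) -> usize
where
    F: FnMut(&[SpanData]) -> Result<(), String>,
{
    if batch.is_empty() {
        return 0;
    }
    let n = batch.len();
    let result = export(batch.as_slice());
    batch.clear();
    match result {
        Ok(()) => 0,
        Err(e) => {
            eprintln!("span export failed, dropping {n} spans: {e}");
            n
        }
    }
}

/// Hypothetical batch span processor (not the opentelemetry_sdk type).
pub struct BatchProcessor {
    tx: Option<Sender<SpanData>>,
    worker: Option<JoinHandle<usize>>,
}

impl BatchProcessor {
    pub fn new<F>(max_batch: usize, mut export: F) -> Self
    where
        F: FnMut(&[SpanData]) -> Result<(), String> + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<SpanData>();
        let worker = thread::spawn(move || {
            let mut batch = Vec::with_capacity(max_batch);
            let mut dropped = 0;
            // recv() keeps yielding queued spans and errors only once all senders are gone.
            while let Ok(span) = rx.recv() {
                batch.push(span);
                if batch.len() >= max_batch {
                    dropped += flush(&mut export, &mut batch);
                }
            }
            dropped + flush(&mut export, &mut batch) // final flush on shutdown
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Called when a span ends; only enqueues, so the request path never waits on export.
    pub fn on_end(&self, span: SpanData) {
        if let Some(tx) = &self.tx {
            let _ = tx.send(span); // an error only means the worker has exited
        }
    }

    /// Close the queue, let the worker drain and flush, and return the dropped-span count.
    pub fn shutdown(mut self) -> usize {
        drop(self.tx.take());
        self.worker.take().map_or(0, |h| h.join().unwrap_or(0))
    }
}
```

Calling `shutdown` drops the sender. The worker then drains whatever is still queued, attempts one final export, and exits, which is the same clean-shutdown pattern seen in the manual test.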

**Open / next**

- Add a `docker-compose.telemetry.yml` to run Jaeger and the broker together for local end-to-end trace viewing.
- Test with the Jaeger UI at http://localhost:16686 and verify the span hierarchy.
- Consider spans for embeddings, rerank, TTS, STT, and image operations later.
