[nu-demo] hero_embedder_server panics with blocking reqwest inside tokio async context; namespace.create rejects Q1 in daemon mode #145

Closed

opened 2026-04-24 00:31:56 +00:00 by mik-tf · 1 comment

mik-tf commented

2026-04-24 00:31:56 +00:00

Owner

Symptom

Fresh deploy: hero_embedder_server would not start or serve any request.

1. Startup panic:

   thread 'main' panicked at tokio/src/runtime/blocking/shutdown.rs:51:21:
   Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.

2. After patching startup, every embed / rerank RPC call would hang forever (connection accepted, no response, no error in log).

3. After patching per-request calls, namespace.create {name, quality: 1} consistently returned:

   -32001 Embedder for quality 1 (Fast) not available

   even though hero_embedderd had all 4 models (Q1/Q2/Q3/Q4) loaded and namespace.list showed an existing Q1 namespace.

Net effect: hero_books.search.query always returned count: 0 with warning: 'embedder service not running', so the AI Assistant could not use search_hero_docs and the LLM hallucinated citations.

Root cause

In hero_embedder/crates/hero_embedder_lib/src/embedderd_client.rs: the EmbedderdClient holds a reqwest::blocking::Client and uses blocking .send()?.error_for_status()?.json()? chains inside embed() and rerank(). These are called from axum async handlers in hero_embedder_server via state.rs. reqwest::blocking::Client spawns its own tokio runtime internally; dropping that inner runtime while the outer (main) tokio runtime is live triggers tokio's guardrail panic.

In hero_embedder/crates/hero_embedder_lib/src/api/namespace.rs:62-71: namespace.create rejects the request if state.embedders doesn't contain the requested quality's EmbedderModel. In daemon-delegation mode (state.embedderd_client.is_some()), the server holds NO local embedders — all models live in hero_embedderd. The check is looking in an empty local map and rejecting every quality.

Demo workaround (applied 2026-04-23 on development_mik_nu_demo branch)

Four local patches on hero_embedder:

1. hero_embedder_server/src/main.rs: wrap the discover_embedderd() call in tokio::task::block_in_place(|| …). This is legal because main is #[tokio::main] (multi-thread by default).
2. hero_embedder_lib/src/embedderd_client.rs::embed(): wrap the self.http.post(…).send()?.error_for_status()?.json()? chain in tokio::task::block_in_place(|| -> Result<_, reqwest::Error> { … })?.
3. Same wrap in EmbedderdClient::rerank().
4. hero_embedder_lib/src/api/namespace.rs:62-71: change the guard to if state.embedderd_client.is_none() && !state.embedders.contains_key(&model) { error }, i.e. trust the daemon.
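The guard change in the last patch can be sketched in isolation. This is a minimal, std-only illustration, not the actual hero_embedder_lib code: only the field names embedders and embedderd_client and the guard expression come from this issue. The State layout, the EmbedderModel(u8) newtype, the placeholder structs and the function names original_guard / patched_guard are assumptions made for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real types live in hero_embedder_lib and are
// not shown in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct EmbedderModel(u8); // assumption: one model per quality level

struct LocalEmbedder; // placeholder for a model loaded in-process
struct EmbedderdClient; // placeholder for the daemon client

struct State {
    embedders: HashMap<EmbedderModel, LocalEmbedder>,
    embedderd_client: Option<EmbedderdClient>,
}

/// Original behaviour: only the local map is consulted, so daemon mode
/// (empty map) rejects every quality.
fn original_guard(state: &State, model: EmbedderModel) -> Result<(), String> {
    if !state.embedders.contains_key(&model) {
        return Err(format!("Embedder for quality {} not available", model.0));
    }
    Ok(())
}

/// Patched behaviour: the local check only applies when no daemon client
/// is configured, i.e. the daemon is trusted to have the model.
fn patched_guard(state: &State, model: EmbedderModel) -> Result<(), String> {
    if state.embedderd_client.is_none() && !state.embedders.contains_key(&model) {
        return Err(format!("Embedder for quality {} not available", model.0));
    }
    Ok(())
}

fn main() {
    let q1 = EmbedderModel(1);

    // Daemon-delegation mode: no local embedders, daemon client present.
    let daemon_mode = State {
        embedders: HashMap::new(),
        embedderd_client: Some(EmbedderdClient),
    };
    println!("daemon mode, original guard: {:?}", original_guard(&daemon_mode, q1));
    println!("daemon mode, patched guard:  {:?}", patched_guard(&daemon_mode, q1));

    // Local mode: the model is loaded in-process, no daemon client.
    let mut local = HashMap::new();
    local.insert(q1, LocalEmbedder);
    let local_mode = State { embedders: local, embedderd_client: None };
    println!("local mode, patched guard:   {:?}", patched_guard(&local_mode, q1));
}
```

In daemon mode the original guard returns an error for Q1 while the patched guard accepts it. With neither a daemon client nor a local model, the patched guard still rejects the request, so the standalone behaviour is unchanged.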

Result after rebuild + restart: embed returns 384-dim vectors, namespace.create succeeds, hero_books indexed docs_hero (163 docs / 7 pages), search.query returns real hits, AI Assistant quotes verbatim from hero_os_guide overview.

Commit on development_mik_nu_demo branch: [nu-demo] wrap blocking reqwest calls in block_in_place (3 files, ~22 ins / 17 del). Not pushed — stays local until reviewers opt in.

Proper upstream fix

The block_in_place wraps work but are fragile (multi-thread-runtime-only, and they block a whole worker thread). The clean answer is an async client:

1. Convert EmbedderdClient to hold reqwest::Client (async) instead of reqwest::blocking::Client. Remove the builder's blocking import.
2. Make embed() and rerank() async fn, with .send().await?.error_for_status()?.json().await?.
3. Keep is_reachable() sync by replacing its self.http.get(…).send() with a plain std::net::TcpStream::connect_timeout probe: no runtime is involved, so it is safe to call at startup from sync code.
4. Update callers in state.rs and api.rs (~5 sites total) to .await the now-async methods. Any fn that calls .embed(…) / .rerank(…) is already async, so adding .await is a one-token change at each site.
5. Remove the tokio::task::block_in_place wraps from main.rs + embedderd_client.rs.
6. For namespace.create: either keep our guard change (trust daemon when present) OR ask the daemon via its /info endpoint which qualities it has loaded and populate state.embedders_available_qualities: HashSet<u8> at startup. The guard change is simpler; the daemon check is more robust.
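The synchronous reachability probe from the is_reachable() step could look roughly like the sketch below. It uses only std::net::TcpStream::connect_timeout from the standard library. The free-function signature and the local listener standing in for hero_embedderd are assumptions made for the example; the real method presumably derives the address from the client's configured endpoint.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Synchronous reachability probe: returns true if a TCP connection to
/// `addr` completes within `timeout`. No HTTP request is sent and no async
/// runtime is created, so this can be called from plain sync startup code.
/// (Hypothetical free function; the issue describes a method on the client.)
fn is_reachable(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}

fn main() -> std::io::Result<()> {
    // Stand-in for the daemon: a listener on an ephemeral loopback port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let timeout = Duration::from_millis(500);

    println!("listener up:      reachable = {}", is_reachable(&addr, timeout));

    // Close the listener and probe the same port again.
    drop(listener);
    println!("listener dropped: reachable = {}", is_reachable(&addr, timeout));
    Ok(())
}
```

One trade-off to keep in mind: a successful TCP connect only shows that something is listening on the port, not that the daemon is healthy or has its models loaded. If that distinction matters at startup, the /info check mentioned for namespace.create would be the stronger signal.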

Why this is worth doing

Without this, hero_books has no vector search, the AI Assistant has no retrieval, and the entire “semantic grounding via MCP/OpenRPC” story falls apart on any non-trivial deploy. Our local patches unblock the demo but the async refactor is the clean answer — probably a 1-2 hour PR against the hero_embedder repo.

Tracking

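Related:

home#130 (https://forge.ourworld.tf/lhumina_code/home/issues/130): hero_osis_ai domain missing (upstream-side, separate)
home#140 (https://forge.ourworld.tf/lhumina_code/home/issues/140): WASM compression (different but same "ergonomic upstream gap" theme)
home#144 (https://forge.ourworld.tf/lhumina_code/home/issues/144): office content per library (needs this to be grounded too)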

Filed 2026-04-23 (late evening) during the nu-shell demo bring-up. Signed-off-by: mik-tf
