hero_embedder_server fails because hero_embedderd model load > 30s timeout #29

Open
opened 2026-04-30 11:01:47 +00:00 by zaelgohary · 0 comments
Member

Symptom

POST /hero_embedder/ui/rpc returns 404 with the message "Socket 'rpc.sock' not found for 'hero_embedder'". The QA audit hit this error 31 times.

Root cause

The hero_embedder_server job is in the failed state. It starts before hero_embedderd has finished loading models, fails the daemon health check (30s timeout), and gives up:

Error: hero_embedderd is required but not reachable
HERO_EMBEDDERD_URL='http://127.0.0.1:8092' is set but the daemon did not respond to /health within 30s.
Start hero_embedderd or unset the variable to fall back to the loopback default.

Without the server running, rpc.sock is never created, so hero_router's sibling shortcut from /<svc>/ui/rpc to rpc.sock returns 404.

Fix options

  1. Make hero_embedder_server retry/wait for the daemon instead of giving up after 30s.
  2. Order the jobs: hero_embedder_server should depend on hero_embedderd being ready (not just running).
  3. Bump the timeout substantially — model load can take minutes on first boot.
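
Options 1 and 3 could be combined into a single wait loop: keep polling /health with exponential backoff under a much longer overall deadline, and only fail once that deadline passes. The sketch below is illustrative only. The names wait_for_daemon and http_health_probe, the 600s deadline, and the backoff values are assumptions rather than anything in the hero_embedder codebase, and the real server may well be written in another language. Only the /health endpoint and the daemon URL come from the error message above.

```python
import time
import urllib.error
import urllib.request


def http_health_probe(base_url, request_timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx reply: not ready yet.
        return False


def wait_for_daemon(base_url, total_timeout=600.0, initial_delay=1.0,
                    max_delay=15.0, probe=http_health_probe,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll the daemon until it is healthy or total_timeout elapses.

    Hypothetical helper: the defaults (600s deadline, 1s..15s backoff)
    are placeholder values, not measured model-load times.
    Returns True once the probe succeeds, False if the deadline passes.
    """
    deadline = clock() + total_timeout
    delay = initial_delay
    while True:
        if probe(base_url):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    ok = wait_for_daemon("http://127.0.0.1:8092")
    print("daemon ready" if ok else "daemon not reachable before deadline")
```

A loop like this only softens the race. Option 2 (starting the server only after the daemon reports ready) removes the race at the job-ordering level, and the two approaches are compatible.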
Reference
lhumina_code/hero_embedder#29