Router prefix-doubling: /hero_books/web/ looks for web_web.sock instead of web.sock #113
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_router#113
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
A request to https://hcockpit.gent01.qa.grid.tf/hero_books/web/ (admin VM 0069) hits the router waiting-for-socket page with:
The hero_books_web crate's service.toml declares its socket path as
hero_books/web.sock, nothero_books/web_web.sock. The router appears to be concatenating the URL subpath segmentwebwith the servicekindvaluewebwhen constructing the socket lookup.Repro: anonymous + authenticated browser request against the live admin VM after the s168 deploy (binaries: hero_router unchanged this session, hero_cockpit_web and hero_books_* binaries refreshed). The 302-to-OAuth before this page renders confirms the request reaches the router proper.
Acceptance: a fresh
/hero_books/web/request resolves to thehero_books/web.sockUnix socket without doubling the subpath into the filename.Implementation Spec for Issue #113
Objective
A request to
/hero_books/web/(and generally/<service>/web[/...],/<service>/app[/...]) must resolve to the service's canonical<service>/web.sockUnix socket, never to the doubledweb_web.sock. The fix must guarantee no subpath segment is concatenated with itself into aweb_<segment>.sockfilename when the segment is already a canonical web/app socket basename, and must apply identically to the HTTP proxy and the WebSocket tunnel.Root Cause
The per-service proxy dispatcher matches the URL's
webnamesegment against a fixed set of arms (rpc,ui,admin,rest,api,openrpc,python,js) and routes everything else through the catch-all_arm.In
crates/hero_router/src/server/proxy/http/http_dispatch.rs:545-553, the_arm computes the socket filename:For
/hero_books/web/,service_name = "hero_books"andwebname = "web". The arm buildsdirect = "web.sock"and, only if that file exists on disk at dispatch time, uses it. Otherwise it falls through toformat!("web_{}.sock", webname)=web_web.sock. The webname-to-socket contract documented athttp_dispatch.rs:49(<name> -> web_<name>.sock) is designed for named web endpoints wherewebnameis the suffix (e.g.public->web_public.sock); it is wrong whenwebnameis literallyweb/app, which are canonical UI-socket basenames.This is corroborated by the scanner:
classify_socketincrates/hero_router/src/scanner.rs:58treatsweb.sock(andapp.sock) asSocketKind::Ui— first-class canonical sockets, not named web endpoints. The dispatcher has dedicated arms forui/admin(which callresolve_ui_socket, whose directory fallback atcrates/hero_router/src/server/proxy/http/http_socket_resolve.rs:72already checksweb.sock), but no arm forweborapp, so they fall to_and are subject to the doubling fallback.The doubled name surfaces in the user-visible error via the not-found path: the
_arm passes the computedsock_nametonot_found_response(service_name, &sock_name, &headers)athttp_dispatch.rs:567, which formats"Socket '{sock_name}' not found — service may be restarting."atcrates/hero_router/src/server/routes.rs:773; the "Attempt N of 15 — reloading" wrapper is the browser auto-retry page (retry_html). This confirms the resolution path: the renderedweb_web.sockis exactly the value produced athttp_dispatch.rs:552.The identical defective logic is duplicated for the WebSocket tunnel at
crates/hero_router/src/server/proxy/http/http_websocket.rs:97-102:Requirements
/hero_books/web/and/hero_books/web/<rest>resolve to<sock_dir>/hero_books/web.sock.web_<segment>.sockwhen<segment>is itself a canonical web/app socket basename (web,app)./<svc>/public->web_public.sock,/<svc>/<name>->web_<name>.sock./<svc>/explorer_rpc->explorer_rpc.sock(existingdirectbehavior preserved).http_dispatch.rs) and the WebSocket tunnel (http_websocket.rs) resolve the socket filename identically.web.sock(the correct expected name) rather thanweb_web.sock.Files to Modify/Create
crates/hero_router/src/server/proxy/http/http_socket_resolve.rs— add a small pure helper that maps a non-reservedwebnameto its candidate socket filename(s) without doubling; export it for the dispatcher, the WS tunnel, and tests.crates/hero_router/src/server/proxy/http/http_dispatch.rs— replace the inlinedirect/web_<name>computation in the_arm (lines 548-553) with a call to the shared helper; update the comment block at lines 40-49 to document thatweb/appmap toweb.sock/app.sock.crates/hero_router/src/server/proxy/http/http_websocket.rs— replace the inlinedirect/web_<other>computation (lines 97-102) with the same shared helper.Implementation Plan
Step 1: Add a shared, testable webname-to-socket-filename resolver
Files:
crates/hero_router/src/server/proxy/http/http_socket_resolve.rsservice_name,webname, andsock_dir, returns the socket filename to use:direct = format!("{webname}.sock"). Ifsock_dir.join(service_name).join(&direct).exists(), returndirect.webname == "web" || webname == "app"(the canonical UI basenames perscanner.rs:58), returndirecteven when it does not yet exist — so the doubling fallback is never reached and the not-found page reportsweb.sock/app.sock. This also self-heals the timing case where the socket is momentarily absent during a restart.format!("web_{webname}.sock")(unchanged behavior for named endpoints).pub(super)so bothhttp_dispatch.rsandhttp_websocket.rscan call it.Dependencies: none
Step 2: Use the shared resolver in the HTTP dispatcher catch-all arm
Files:
crates/hero_router/src/server/proxy/http/http_dispatch.rs_arm (lines 545-554), replace the inline computation withlet sock_name = resolve_web_socket_name(service_name, webname, sock_dir);(add the helper to the existing import fromhttp_socket_resolve).socket_path/not_found_responselogic unchanged.web/appexception.Dependencies: Step 1
Step 3: Use the shared resolver in the WebSocket tunnel
Files:
crates/hero_router/src/server/proxy/http/http_websocket.rsother => { ... }arm (lines 94-113), replace the inline computation withlet sock_name = resolve_web_socket_name(service_name, other, sock_dir);.Dependencies: Step 1
Step 4: Add unit tests for the resolver
Files:
crates/hero_router/src/server/proxy/http/http_socket_resolve.rsresolve_web_socket_name(svc, "web", dir)returns"web.sock"when the file is absent (the #113 regression case)."web.sock"when<dir>/<svc>/web.sockexists."app"returns"app.sock"(absent and present)."public"returns"web_public.sock"when neitherpublic.socknorweb_public.sockexists (named-endpoint fallback preserved)."explorer_rpc"returns"explorer_rpc.sock"when that file exists (direct-named socket preserved).Dependencies: Steps 1-3
Step 5: Build and run the test suite
Files: none (verification only)
cargo build -p hero_routerandcargo test -p hero_router.http_dispatch.rsstill pass.Dependencies: Steps 1-4
Acceptance Criteria
GET /hero_books/web/resolves to<sock_dir>/hero_books/web.sock(noweb_web.sock).hero_books/web.sockis absent, the waiting/not-found page readsSocket 'web.sock' not found ..., notweb_web.sock./<svc>/app[/...]resolves toapp.sock, neverweb_app.sock./<svc>/publicstill resolves toweb_public.sock;/<svc>/explorer_rpcstill resolves toexplorer_rpc.sock./hero_books/web/...resolves to the sameweb.sock.http_socket_resolve.rscover theweb/appnon-doubling cases and the named/direct fallbacks, and the fullcargo test -p hero_routerpasses.Notes
directexistence check athttp_dispatch.rs:549would already pickweb.sockwhen the file exists at dispatch time. The observed failure happened either because the socket was momentarily absent during the restart or because the deployed binary predates the refactor where thedirectcheck was introduced. Makingweb/appresolve to<webname>.sockunconditionally fixes both: it removes the race-dependent doubling and guarantees the correct name on the retry page so the auto-reload converges once the socket reappears.web/appthrough the existingui/adminarms. Those arms injectX-Forwarded-Prefix: /<svc>/uiand forward toadmin.sock/ui.sock; aweb-mounted service expects the prefix/<svc>/weband its ownweb.sock. The_arm already sets the correctX-Forwarded-Prefix. Keep the request flowing through the_arm and only fix the filename computation.classify_socket(scanner.rs:55-81).Test Results
Ran
cargo test -p hero_routeracross all targets.Breakdown per target:
herolib_router): 145 passedhero_router): 5 passedtests/sse_multiplex.rs): 5 passedThe 5 new resolver tests in
server::proxy::http::http_socket_resolveall pass, confirming the prefix-doubling fix:direct_named_socket_present_is_used_verbatimweb_present_resolves_to_web_sockweb_absent_resolves_to_web_sock_not_doublednamed_endpoint_absent_falls_back_to_web_prefixapp_resolves_to_app_sock_absent_and_presentThe key case
/hero_books/web/now resolves toweb.sockrather thanweb_web.sock.Two pre-existing stale tests were repaired so the suite compiles and passes:
tests/sse_multiplex.rshad aServiceEntrystruct literal missing thedomainsfield (added toServiceEntryafter the test was written). Addeddomains: Vec::new().cache::tests::derive_group_name_falls_back_to_stem_for_unknown_socketsasserted on/var/sockets/legacy.sock, butderive_group_namewas changed to treat any nested socket's parent directory as the group, so that path now resolves tosockets. Updated the test to use a non-nested path (/legacy.sock), which is the actual stem-fallback case.Implementation Summary
Changes
Fix (issue #113):
crates/hero_router/src/server/proxy/http/http_socket_resolve.rs— addedresolve_web_socket_name(service_name, webname, sock_dir): returns<webname>.sockwhen it exists on disk; for the canonical UI basenameswebandappit returns<webname>.sockunconditionally (theweb_<webname>.sockfallback is never reached, so no moreweb_web.sockdoubling and the waiting page reports the correct expected socket name); all other names keep the existing<name>.sock-if-present-else-web_<name>.sockbehavior.crates/hero_router/src/server/proxy/http/http_dispatch.rs— the catch-all webname arm now calls the shared resolver instead of the inline computation; the webname-to-socket mapping comment documents theweb/appexception.crates/hero_router/src/server/proxy/http/http_websocket.rs— the WebSocket tunnel arm uses the same shared resolver, keeping HTTP and WS resolution in lockstep.Tests:
http_socket_resolve.rs, including the regression caseweb_absent_resolves_to_web_sock_not_doubled(/hero_books/web/resolves toweb.sockeven when the socket is momentarily absent during a restart).tests/sse_multiplex.rs(missingdomainsfield onServiceEntryliteral), staleinclude_str!paths in thegenerator/js.rsandgenerator/python.rstest modules, and an outdated assertion incache.rs::derive_group_name_falls_back_to_stem_for_unknown_sockets. No production code was changed by these repairs.Test Results
155 total, 155 passed, 0 failed (lib 145, bin 5, integration 5).
Notes
_arm'sX-Forwarded-Prefix: /<service>/<webname>behavior is unchanged, soweb-mounted services keep receiving the correct prefix.web/appare intentionally not routed through theui/adminarms; only the filename computation changed.it happened because hero book didn't find web.sock of hero_books so it fall back to web_web.sock but the root cause was hero_books didn't start which should expose the web.sock first