transport: check_socket_or_die calls std::process::exit on socket errors #109
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_blueprint#109
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
crates/openrpc/src/transport.rs::check_socket_or_diecallsstd::process::exit(0)whenever a Unix socket cannot be opened — file missing, EACCES on the path, or the entry is not a socket. It is invoked fromunix_socket(line 149) andunix_socket_with_path(line 178) on commitf17dcd7.A library calling
process::exitis a footgun for any long-running consumer: a single failedconnect()for one target's socket kills the entire host process. We hit this inhero_codescalers, where the daemon needs to talk to each user's hero_proc socket — when a user's home is0700or their daemon isn't running, the codescalers daemon dies with exit code 0 and hero_proc restarts it. Every in-flight RPC drops; the UI shows "RPC server unavailable" for a few seconds.Reproduction
From any consumer that opens an
OpenRpcClientagainst a missing or unreachable socket:Impact
Result<Self, OpenRpcError>).Errare terminated instead.tokio::net::UnixStream::connect+ timeout) push responsibility back to every consumer, defeating the purpose of returning aResult.Suggested fix
Return an
OpenRpcErrorfromcheck_socket_or_die(and rename it) so the existingResultsignature is honoured. Move theeprintln!block to consumers that genuinely want CLI-friendly output — most servers want the structured error.CLI tools that rely on the friendly exit-on-fail behaviour today can wrap
OpenRpcClient::unix_socket(...)?themselves and print before bailing.Context
f17dcd71aa18ce9527ddf7a2ce19ea607ff41d65(branchdevelopment)hero_codescalers: pre-stat each per-user socket before calling the SDK (seecrates/hero_codescalers_server/src/workspace.rs::ensure_user_daemon_reachable). The workaround can be removed once this lands.Already fixed in
dc83817(closes #95). You cited commitf17dcd7which predates the fix by one day.Current state of
crates/openrpc/src/transport.rs(HEAD2b2aeb6):check_socket_or_diewas renamed →check_socket_reachable, returnsResult<(), OpenRpcError>instead of callingexit(0). Bothunix_socket()andunix_socket_with_path()propagate via?. The innerUnixStream::connect.map_errblocks that also calledexit(0)are gone too.Audit at the time confirmed all 33 downstream consumers use
?-propagation through the generated SDKs — non-breaking. See #95 for the full architectural write-up.Closing as duplicate of #95. If you see the exit(0) behavior on the latest binary, let me know — that would mean the fix got reverted somewhere.