Native TCP transport for explorer — remove socat dependency for multi-machine deployments #30
Problem
The entire hero_compute RPC stack uses Unix sockets
only. Nodes on different physical machines cannot
send heartbeats to a remote explorer or receive
proxied VM calls without an external socat bridge.
Current workaround: socat bridges set up manually
or via make start scripts. This is fragile, requires
socat installed on every node, and is not
container-friendly.
What Needs to Change
Five components need updating:
hero_rpc — UnixRpcServer needs an optional
TcpListener alongside the existing UnixListener.
Should be backward compatible — Unix socket
remains the default.
hero_compute_explorer/src/main.rs — When
EXPLORER_TCP_ADDR env var is set (e.g.
0.0.0.0:9002), bind a TCP listener in addition
to the Unix socket.
hero_compute_server/src/heartbeat_sender.rs —
Parse tcp://host:port in EXPLORER_ADDRESSES
and use TcpStream::connect() instead of
UnixStream::connect() for remote explorers.
hero_compute_explorer/src/explorer/proxy.rs —
Support TcpStream for remote nodes alongside
UnixStream for local nodes. The node's
socket_path field becomes a URI:
unix:///path/to/socket (local node)
tcp://host:port (remote node)
schemas/explorer/explorer.oschema —
Rename or redefine socket_path field to
accept both unix:// and tcp:// URIs.
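The URI scheme described above can be sketched as a small parser plus a connect helper. This is a minimal illustration under stated assumptions, not the actual hero_compute code: the `Endpoint` and `Connection` types are hypothetical names, and blocking `std` sockets are used only to keep the example self-contained (the real stack may well use async I/O).

```rust
use std::io;
use std::net::TcpStream;
use std::os::unix::net::UnixStream;
use std::path::PathBuf;

/// Hypothetical parsed form of a node or explorer address.
/// This type does not exist in hero_compute today.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Endpoint {
    /// unix:///path/to/socket (local)
    Unix(PathBuf),
    /// tcp://host:port (remote)
    Tcp(String),
}

/// Hypothetical wrapper so callers can hold either stream kind.
pub enum Connection {
    Unix(UnixStream),
    Tcp(TcpStream),
}

impl Endpoint {
    /// Parse a `unix://` or `tcp://` URI; any other scheme is rejected.
    pub fn parse(uri: &str) -> Result<Self, String> {
        let uri = uri.trim();
        if let Some(path) = uri.strip_prefix("unix://") {
            if path.is_empty() {
                return Err("unix:// URI has an empty path".to_string());
            }
            Ok(Endpoint::Unix(PathBuf::from(path)))
        } else if let Some(addr) = uri.strip_prefix("tcp://") {
            if addr.is_empty() {
                return Err("tcp:// URI has an empty host:port".to_string());
            }
            Ok(Endpoint::Tcp(addr.to_string()))
        } else {
            Err(format!("unsupported endpoint URI: {uri}"))
        }
    }

    /// Open a blocking connection using the transport the URI selected.
    pub fn connect(&self) -> io::Result<Connection> {
        match self {
            Endpoint::Unix(path) => UnixStream::connect(path).map(Connection::Unix),
            Endpoint::Tcp(addr) => TcpStream::connect(addr.as_str()).map(Connection::Tcp),
        }
    }
}
```

Whether a bare path without a scheme should still be accepted as a Unix socket, to keep existing `socket_path` values valid, is a design decision this sketch leaves open; as written it rejects them.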
New Environment Variables
EXPLORER_TCP_ADDR=0.0.0.0:9002
On the explorer — enables TCP listener.
If unset, TCP is disabled (Unix only,
current behavior).
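A rough sketch of the opt-in behavior: the TCP listener is bound only when an address is supplied, and an unset or blank value leaves the explorer Unix-only. The function name `bind_optional_tcp` is hypothetical, and `std::net::TcpListener` stands in for whatever listener type the explorer actually uses.

```rust
use std::io;
use std::net::TcpListener;

/// Bind a TCP listener only if an address was configured.
/// `addr` would typically come from the EXPLORER_TCP_ADDR env var.
/// Returns Ok(None) when TCP is disabled (unset or blank value).
pub fn bind_optional_tcp(addr: Option<&str>) -> io::Result<Option<TcpListener>> {
    match addr.map(str::trim).filter(|a| !a.is_empty()) {
        Some(a) => TcpListener::bind(a).map(Some),
        None => Ok(None),
    }
}
```

At startup this could be called as `bind_optional_tcp(std::env::var("EXPLORER_TCP_ADDR").ok().as_deref())`, with the Unix socket bound unconditionally as it is today.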
EXPLORER_ADDRESSES=tcp://135.181.217.244:9002
On the node — where to send heartbeats.
Supports both:
unix:///path/to/explorer.sock (local)
tcp://host:port (remote)
Comma-separated for multiple explorers.
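One possible way to split and validate the comma-separated value is shown below. The helper names are hypothetical, and treating an unset variable as an empty list is an assumption of this sketch rather than documented behavior.

```rust
use std::env;

/// Split a comma-separated EXPLORER_ADDRESSES value into entries,
/// trimming whitespace, skipping empty entries, and rejecting any
/// entry that does not use the unix:// or tcp:// scheme.
pub fn parse_explorer_addresses(raw: &str) -> Result<Vec<String>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| {
            if entry.starts_with("unix://") || entry.starts_with("tcp://") {
                Ok(entry.to_string())
            } else {
                Err(format!(
                    "unsupported scheme in EXPLORER_ADDRESSES entry: {entry}"
                ))
            }
        })
        .collect()
}

/// Read EXPLORER_ADDRESSES from the environment.
/// Assumption: an unset variable means "no explorers configured".
pub fn explorer_addresses_from_env() -> Result<Vec<String>, String> {
    match env::var("EXPLORER_ADDRESSES") {
        Ok(raw) => parse_explorer_addresses(&raw),
        Err(_) => Ok(Vec::new()),
    }
}
```

Failing on the first malformed entry, rather than silently skipping it, makes a misconfigured node visible at startup instead of leaving it quietly sending no heartbeats to one explorer.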
Security Note
TCP transport should initially be unauthenticated
(same as Unix socket today). TLS/mTLS is a
follow-up once the basic transport works.
For now, network-level security (firewall rules, a
VPN, or the mycelium network) is considered sufficient.
Backward Compatibility
set must continue to work identically
Why This Matters
This is the prerequisite for:
(TF nodes are remote by definition)
Note on herolib_core
herolib_core already has OpenRpcTransport::http()
for the client side. The server side (UnixRpcServer)
has no TCP equivalent. The client-side building
blocks exist — server side needs to be added.
Definition of Done
explorers
via TCP
tcp:// schemes
documented as optional
TCP setup instructions