feat: Replace TCP bridges with Hero Router for cross-node communication (Phase 2) #69

Closed
opened 2026-04-06 19:06:47 +00:00 by mahmoud · 1 comment
Owner

Context

Follows from #65 (Phase 1 — HTTP-over-UDS migration). hero_compute's multi-node deployment (master/worker mode) currently uses in-process TCP bridges for cross-node communication:

  • Port 9002 (master): Explorer TCP bridge — receives worker heartbeats
  • Port 9003 (worker): Server TCP bridge — receives proxied RPC calls from master

These are raw TCP forwarders baked into each binary. They bypass Hero's standard networking layer (Hero Router), have no encryption, no authentication, and no service discovery.

Objective

Replace the custom TCP bridges with Hero Router as the single network entry point per node. Each node runs hero_router on a single port, which discovers local Unix sockets and routes incoming HTTP requests to the correct service.

Architecture Change

Before (TCP Bridges)

Master explorer ←──TCP:9002──── Worker heartbeat_sender
Master explorer ───TCP:9003───→ Worker compute_server

Two custom TCP ports per node, raw byte forwarding, no discovery.

After (Hero Router)

Master hero_router (port 443/9998)
  └─→ /hero_compute/explorer_rpc/* → local explorer_rpc.sock

Worker hero_router (port 443/9998) 
  └─→ /hero_compute/rpc/* → local rpc.sock

Single entry point per node, HTTP routing, service discovery, context headers.
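The routing rule in the diagram above can be sketched as a small path-resolution function. This is illustrative only (the function name and behavior at edge cases are assumptions, not Hero Router's actual implementation): the first two path segments select a Unix socket under the socket directory, and the remainder of the path is forwarded to the service unchanged.

```rust
use std::path::PathBuf;

/// Illustrative sketch of the routing rule: an incoming request path
/// /<app>/<service>/<rest...> maps to <socket_dir>/<app>/<service>.sock,
/// with /<rest...> forwarded to that socket.
fn resolve(socket_dir: &str, request_path: &str) -> Option<(PathBuf, String)> {
    let mut parts = request_path.trim_start_matches('/').splitn(3, '/');
    let app = parts.next()?;
    let service = parts.next()?;
    let rest = parts.next().unwrap_or("");
    let socket = PathBuf::from(socket_dir)
        .join(app)
        .join(format!("{service}.sock"));
    Some((socket, format!("/{rest}")))
}

fn main() {
    // A master-bound heartbeat and a worker-bound RPC resolve to different sockets.
    let (sock, fwd) = resolve("/home/hero/var/sockets", "/hero_compute/rpc/api/root/cloud/rpc").unwrap();
    println!("{} -> {}", fwd, sock.display());
}
```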

Implementation Plan

1. Remove TCP bridge code

  • crates/hero_compute_server/src/main.rs: Remove tcp_bridge() function and --tcp-port CLI arg
  • crates/hero_compute_explorer/src/main.rs: Remove tcp_bridge() function and --tcp-port CLI arg
  • Remove MAX_BRIDGE_CONNECTIONS constant

2. Update NodeProxy to route through Hero Router

  • crates/hero_compute_explorer/src/explorer/proxy.rs: Update call_tcp() to send HTTP POST through Hero Router URL instead of raw TCP
  • Worker advertise address becomes http://worker-ip:9998/hero_compute/rpc/api/root/cloud/rpc (Hero Router path)
  • Keep Unix socket path for local nodes unchanged
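The advertise-address format in the step above can be pinned down with a small helper. This is a hypothetical sketch (the function name is not from the codebase; only the URL format comes from the issue):

```rust
/// Hypothetical helper: builds the advertise address a worker publishes
/// so the master's NodeProxy can reach the worker's rpc.sock through the
/// worker's Hero Router, per the format given in this issue.
fn worker_advertise_address(worker_ip: &str, router_port: u16) -> String {
    format!("http://{worker_ip}:{router_port}/hero_compute/rpc/api/root/cloud/rpc")
}

fn main() {
    // The master then sends its JSON-RPC proxy calls as HTTP POSTs to this URL.
    println!("{}", worker_advertise_address("10.0.0.7", 9998));
}
```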

3. Update heartbeat sender

  • crates/hero_compute_server/src/heartbeat_sender.rs: Update send_tcp() to send HTTP POST through Hero Router
  • Master explorer address becomes http://master-ip:9998/hero_compute/explorer_rpc/api/root/explorer/rpc
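The transport change in `send_tcp()` is only about framing: the JSON-RPC heartbeat payload stays the same, but it is wrapped in an HTTP POST to the master's Hero Router instead of being written raw to a TCP socket. A sketch of that framing, hand-built here purely for illustration (a real HTTP client would produce an equivalent request):

```rust
/// Illustrative only: shows the HTTP envelope the heartbeat gains.
/// The JSON-RPC body is unchanged from the raw-TCP version.
fn heartbeat_http_request(master_ip: &str, port: u16, payload: &str) -> String {
    let path = "/hero_compute/explorer_rpc/api/root/explorer/rpc";
    format!(
        "POST {path} HTTP/1.1\r\n\
         Host: {master_ip}:{port}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\r\n{payload}",
        payload.len()
    )
}

fn main() {
    let req = heartbeat_http_request("10.0.0.1", 9998, "{\"jsonrpc\":\"2.0\"}");
    println!("{req}");
}
```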

4. Update CLI and .env generation

  • crates/hero_compute/src/main.rs: Remove --rpc-port and --explorer-port flags
  • Update write_env(): EXPLORER_ADDRESSES uses Hero Router URL format
  • Update HERO_COMPUTE_ADVERTISE_ADDRESS format
  • Keep --port for UI (still needs direct TCP for WebSocket console)
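The resulting `.env` contents can be sketched as follows. The variable names (`EXPLORER_ADDRESSES`, `HERO_COMPUTE_ADVERTISE_ADDRESS`) and URL formats come from this issue; the function shape is illustrative, not the actual `write_env()` code:

```rust
/// Illustrative sketch of the .env output after this change: both
/// addresses become Hero Router URLs instead of raw host:port pairs.
fn env_file_contents(master_ip: &str, self_ip: &str, router_port: u16) -> String {
    format!(
        "EXPLORER_ADDRESSES=http://{master_ip}:{router_port}/hero_compute/explorer_rpc/api/root/explorer/rpc\n\
         HERO_COMPUTE_ADVERTISE_ADDRESS=http://{self_ip}:{router_port}/hero_compute/rpc/api/root/cloud/rpc\n"
    )
}

fn main() {
    print!("{}", env_file_contents("10.0.0.1", "10.0.0.7", 9998));
}
```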

5. Update documentation

  • Remove TCP port references (9002, 9003) from docs
  • Document Hero Router dependency and setup
  • Update architecture diagrams

Socket Discovery

Hero Router auto-discovers services by scanning $HERO_SOCKET_DIR/. With the Phase 1 layout:

~/hero/var/sockets/hero_compute/
  rpc.sock            → discovered as /hero_compute/rpc/*
  ui.sock             → discovered as /hero_compute/ui/*
  explorer_rpc.sock   → discovered as /hero_compute/explorer_rpc/*

No registration code needed — Hero Router finds the sockets automatically.
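The discovery rule described above (a socket at `$HERO_SOCKET_DIR/<app>/<name>.sock` is exposed as `/<app>/<name>/*`) can be expressed as a one-liner. This is an illustrative sketch, not Hero Router's actual scanner:

```rust
use std::path::Path;

/// Illustrative: derives the route prefix a discovered socket would be
/// served under, per the layout shown above.
fn route_prefix(socket_dir: &Path, socket_path: &Path) -> Option<String> {
    let rel = socket_path.strip_prefix(socket_dir).ok()?;
    let app = rel.parent()?.to_str()?;
    let name = rel.file_stem()?.to_str()?;
    Some(format!("/{app}/{name}/*"))
}

fn main() {
    let dir = Path::new("/home/hero/var/sockets");
    let sock = Path::new("/home/hero/var/sockets/hero_compute/explorer_rpc.sock");
    println!("{}", route_prefix(dir, sock).unwrap());
}
```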

What Stays

  • UI TCP port (9001): Kept for WebSocket console sessions (Hero Router doesn't proxy WebSocket upgrade yet)
  • Unix socket communication: All local inter-process communication stays on UDS
  • Heartbeat protocol: Same JSON-RPC payload, just transported via HTTP through Hero Router instead of raw TCP

Dependencies

  • Hero Router must be installed and running on each node
  • Install script should include hero_router as a dependency

Acceptance Criteria

  • cargo build --workspace succeeds
  • cargo test --workspace passes
  • No TCP bridge code remains in server/explorer binaries
  • --rpc-port and --explorer-port CLI flags removed
  • Multi-node master/worker mode works through Hero Router
  • Heartbeats flow through Hero Router
  • VM operations proxied through Hero Router
  • Single node (local) mode still works unchanged
  • Documentation updated

References

  • Phase 1: #65
  • Hero Router: https://forge.ourworld.tf/lhumina_code/hero_router
  • Hero Proxy: https://forge.ourworld.tf/lhumina_code/hero_proxy
mahmoud (Author, Owner) commented:

Implementation committed: 53e5d3f
PR: https://forge.ourworld.tf/lhumina_code/hero_compute/pulls/71
Browse: https://forge.ourworld.tf/lhumina_code/hero_compute/commit/53e5d3f