feat: Auto-detect node connectivity (Public IP → Mycelium → Local fallback) #57

Open
opened 2026-04-05 11:29:27 +00:00 by mahmoud · 1 comment
Owner

Summary

Currently, worker nodes require manually setting MYCELIUM_IP in the environment, and the heartbeat transport only works over TCP with routable IPs. Nodes behind NAT or without public IPs cannot participate in a multi-node cluster.

We need an automatic connectivity discovery chain at node startup that determines the best available transport — no manual config required.

Proposed Behavior

On node registration, run a connectivity discovery chain:

1. Check for Public IP

  • If detected → use as primary transport
  • Log: "Public IP detected: 85.x.x.x — ready for multi-node communication"

2. No Public IP → Check Mycelium

  • If Mycelium daemon is running → use Mycelium IPv6 as transport
  • Log: "No public IP detected, checking Mycelium..."
  • Log: "Mycelium network active: 300:abc::1 — ready to accept connections"

3. No Public IP, No Mycelium → Local-only mode

  • Fall back to local mode (Unix socket only)
  • Log: "⚠ Mycelium is not running, falling back to local mode"
  • Log: "⚠ Note: in local mode you will not be able to manage VMs outside this node"

4. Both Public IP and Mycelium Available (refines step 1)

  • Use public IP as primary, store Mycelium as secondary/metadata
  • Log: "Public IP: 85.x.x.x (primary), Mycelium: 300:abc::1"
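The issue does not say how step 1 decides that an address is "public". One possible sketch, assuming the check runs over IPv4 addresses already read from local interfaces, is to exclude the well-known non-routable ranges. The function names `is_public_ipv4` and `first_public_ipv4` are hypothetical. A host behind 1:1 NAT (common on cloud VMs) only sees a private address on its interface, so a real implementation may also need an external probe.

```rust
use std::net::Ipv4Addr;

/// Returns true if `ip` looks publicly routable.
/// Assumption: "public" means not private, loopback, link-local,
/// carrier-grade NAT (100.64.0.0/10), unspecified, broadcast,
/// multicast, or a documentation range.
fn is_public_ipv4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    // 100.64.0.0/10: first octet 100, top two bits of the second octet are 01.
    let is_cgnat = o[0] == 100 && (o[1] & 0xC0) == 64;
    !(ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
        || ip.is_multicast()
        || ip.is_documentation()
        || is_cgnat)
}

/// Picks the first candidate that passes the public check, if any.
fn first_public_ipv4(candidates: &[Ipv4Addr]) -> Option<Ipv4Addr> {
    candidates.iter().copied().find(|ip| is_public_ipv4(*ip))
}
```

Enumerating the interface addresses is left out because the standard library has no portable API for it.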

Why

  • Eliminates manual config — no more MYCELIUM_IP env var to set
  • Enables NAT-friendly clusters — workers behind NAT can join via Mycelium overlay
  • Graceful degradation — Public IP > Mycelium > Local, with clear logging at each step
  • New topologies — fully Mycelium-based clusters with zero public IPs, or mixed setups

Implementation Notes

  • Mycelium IPs are standard IPv6, so tcp://[300:abc::1]:9002 should work with the existing TCP bridge — no protocol changes needed
  • The explorer TCP bridge must also bind on, and connect to, Mycelium IPv6 addresses
  • Auto-detection can inspect network interfaces for Mycelium address ranges, or query the local Mycelium daemon
  • HERO_COMPUTE_ADVERTISE_ADDRESS should be set automatically based on the discovered transport
  • The heartbeat sender already supports TCP; it only needs to be given the right address
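As a rough illustration of the address format in the notes above, the advertise value could be derived from the discovered IP with the standard library, which adds the square brackets for IPv6 on its own. The helper name `advertise_address` is hypothetical, and the sketch assumes the `tcp://host:port` form shown in this issue is what `HERO_COMPUTE_ADVERTISE_ADDRESS` expects.

```rust
use std::net::{IpAddr, SocketAddr};

/// Formats an advertise address for either IP family.
/// `SocketAddr`'s Display impl wraps IPv6 addresses in brackets,
/// so no manual bracket handling is needed.
fn advertise_address(ip: IpAddr, port: u16) -> String {
    format!("tcp://{}", SocketAddr::new(ip, port))
}
```

With the example values from this issue, `advertise_address("300:abc::1".parse().unwrap(), 9002)` yields `tcp://[300:abc::1]:9002`.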

Affected Crates

  • hero_compute — CLI startup logic, env var generation
  • hero_compute_server — node registration, heartbeat sender, config
  • hero_compute_explorer — TCP bridge binding, proxy connections
Author
Owner

Update: Align with Hero RPC Transport Standards

hero_compute currently uses raw TCP sockets with newline-delimited JSON-RPC for cross-node communication (heartbeats, RPC proxy). This is a custom transport that doesn't follow the Hero ecosystem standard, which is:

  • All services bind exclusively to Unix Domain Sockets
  • JSON-RPC 2.0 over HTTP/1.1 (not raw newline-delimited TCP)
  • hero_proxy is the sole TCP entry point
  • hero_inspector discovers services by scanning sockets

hero_compute bypasses all of this with a custom tcp_bridge(), but for a valid reason: the Hero RPC layer has no concept of remote service-to-service communication. OpenRpcTransport only supports connect_socket(path) and has no TCP option, so hero_compute had to roll its own.
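For readers unfamiliar with the framing mentioned above, newline-delimited JSON-RPC means each message is one line of JSON terminated by `\n`. The sketch below illustrates that general technique only. It is not taken from the `tcp_bridge()` source, and the names `write_frame` and `read_frame` are hypothetical.

```rust
use std::io::{self, BufRead, Write};

/// Writes one JSON message followed by the newline delimiter.
fn write_frame<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    // A raw newline inside the payload would split it into two frames.
    if json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload contains a newline",
        ));
    }
    out.write_all(json.as_bytes())?;
    out.write_all(b"\n")?;
    out.flush()
}

/// Reads one frame; returns Ok(None) on a clean end of stream.
fn read_frame<R: BufRead>(input: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    if input.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    // Strip the delimiter (and a stray carriage return, if present).
    let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
    Ok(Some(trimmed.to_string()))
}
```

Both functions are generic over the stream, so the same code works on a `TcpStream` wrapped in a `BufReader` or on an in-memory buffer.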

What this means for Mycelium support

Short-term: Mycelium IPv6 addresses work with the existing raw TCP bridge (tcp://[300:abc::1]:9002); only minimal changes, such as binding on IPv6, are needed.

Long-term — this exposes a gap in the Hero RPC ecosystem: no cross-machine transport. Any future Hero service needing multi-node communication will hit the same wall.

Revised scope for this issue

This issue stays focused on the connectivity discovery chain:

  1. Auto-detect public IP → use for TCP bridge transport
  2. No public IP → auto-detect Mycelium IPv6 from network interfaces (400::/7 or 300::/8 ranges)
  3. Neither available → local-only mode with clear warning
  4. Both available → public IP primary, Mycelium stored as metadata
  5. Remove need for manual MYCELIUM_IP env var
  6. Clear startup logging at each decision point
  7. TCP bridge needs to bind [::] (not just 0.0.0.0) for IPv6/Mycelium support
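Item 7 could look roughly like the sketch below, which uses only the standard library. The function name `bind_bridge` is hypothetical, and the fallback to `0.0.0.0` when IPv6 is unavailable is an assumption rather than something this issue specifies.

```rust
use std::io;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, TcpListener};

/// Binds the bridge listener on the IPv6 wildcard `[::]`,
/// falling back to `0.0.0.0` if the IPv6 bind fails.
fn bind_bridge(port: u16) -> io::Result<TcpListener> {
    let candidates = [
        SocketAddr::from((Ipv6Addr::UNSPECIFIED, port)),
        SocketAddr::from((Ipv4Addr::UNSPECIFIED, port)),
    ];
    // `bind` tries each address in order and returns the first success.
    TcpListener::bind(&candidates[..])
}
```

Whether a listener on `[::]` also accepts IPv4 connections depends on the platform's `IPV6_V6ONLY` default, which the standard library does not expose. If IPv4 peers must keep working everywhere, binding two separate listeners is the safer option.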

Proposed detection logic

/// Outcome of the startup connectivity discovery chain.
#[derive(Debug, Clone, PartialEq)]
enum ConnectivityMode {
    PublicOnly(String),
    PublicWithMycelium { public: String, mycelium: String },
    MyceliumOnly(String),
    LocalOnly,
}

fn detect_connectivity() -> ConnectivityMode {
    // 1. Check for public/routable IPv4
    if let Some(ip) = detect_public_ip() {
        // log: "Public IP detected: {ip} - ready for multi-node communication"
        if let Some(myc) = detect_mycelium_ip() {
            // log: "Mycelium also available: {myc}"
            return ConnectivityMode::PublicWithMycelium { public: ip, mycelium: myc };
        }
        return ConnectivityMode::PublicOnly(ip);
    }

    // 2. No public IP: check Mycelium
    // log: "No public IP detected, checking Mycelium..."
    if let Some(myc) = detect_mycelium_ip() {
        // log: "Mycelium network active: {myc} - ready to accept connections"
        return ConnectivityMode::MyceliumOnly(myc);
    }

    // 3. Nothing available
    // log: "⚠ No public IP and Mycelium is not running"
    // log: "⚠ Falling back to local mode - cannot manage VMs outside this node"
    ConnectivityMode::LocalOnly
}

fn detect_public_ip() -> Option<String> {
    // Placeholder: the detection method is not specified in this issue.
    None
}

fn detect_mycelium_ip() -> Option<String> {
    // Scan network interfaces for IPv6 in Mycelium ranges
    // (400::/7 or 300::/8 depending on Mycelium version)
    // Or check if mycelium process is running and query it
    None // placeholder until one of the strategies above is implemented
}
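The interface-scan strategy for detect_mycelium_ip() comes down to a prefix test. A minimal sketch of that test is shown here, assuming the two ranges named in this issue (`400::/7` and `300::/8`) and assuming the interface addresses have already been collected by other means. The names `in_mycelium_range` and `pick_mycelium_ip` are hypothetical.

```rust
use std::net::Ipv6Addr;

/// Returns true if `ip` falls in one of the ranges named in this issue.
fn in_mycelium_range(ip: Ipv6Addr) -> bool {
    let first = ip.octets()[0];
    // 400::/7 fixes the top 7 bits, so the first byte is 0x04 or 0x05.
    let in_400_7 = (first & 0xFE) == 0x04;
    // 300::/8 fixes the whole first byte to 0x03.
    let in_300_8 = first == 0x03;
    in_400_7 || in_300_8
}

/// Picks the first address that matches, if any.
fn pick_mycelium_ip(addrs: &[Ipv6Addr]) -> Option<Ipv6Addr> {
    addrs.iter().copied().find(|ip| in_mycelium_range(*ip))
}
```

A prefix match alone cannot tell whether the daemon is actually running, so querying the local Mycelium daemon, as the issue also suggests, would be a stronger signal.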
mahmoud added this to the ACTIVE project 2026-04-05 12:38:09 +00:00
mahmoud added this to the now milestone 2026-04-05 12:38:11 +00:00
Reference
lhumina_code/hero_compute#57