lhumina_code/hero_compute

feat: Auto-detect node connectivity (Public IP → Mycelium → Local fallback) #57

Closed

opened 2026-04-05 11:29:27 +00:00 by mahmoud · 2 comments

mahmoud commented

2026-04-05 11:29:27 +00:00

Owner

Summary

Currently, worker nodes require manually setting MYCELIUM_IP in the environment, and the heartbeat transport only works over TCP with routable IPs. Nodes behind NAT or without public IPs cannot participate in a multi-node cluster.

We need an automatic connectivity discovery chain at node startup that determines the best available transport — no manual config required.

Proposed Behavior

On node registration, run a connectivity discovery chain:

1. Check for Public IP

- If detected → use as primary transport
- Log: "Public IP detected: 85.x.x.x — ready for multi-node communication"

2. No Public IP → Check Mycelium

- If Mycelium daemon is running → use Mycelium IPv6 as transport
- Log: "No public IP detected, checking Mycelium..."
- Log: "Mycelium network active: 300:abc::1 — ready to accept connections"

3. No Public IP, No Mycelium → Local-only mode

- Fall back to local mode (Unix socket only)
- Log: "⚠ Mycelium is not running, falling back to local mode"
- Log: "⚠ Note: in local mode you will not be able to manage VMs outside this node"

4. Has Both Public IP + Mycelium

- Use public IP as primary, store Mycelium as secondary/metadata
- Log: "Public IP: 85.x.x.x (primary), Mycelium: 300:abc::1"

Why

- Eliminates manual config: no more MYCELIUM_IP env var to set
- Enables NAT-friendly clusters: workers behind NAT can join via the Mycelium overlay
- Graceful degradation: Public IP > Mycelium > Local, with clear logging at each step
- New topologies: fully Mycelium-based clusters with zero public IPs, or mixed setups

Implementation Notes

- Mycelium IPs are standard IPv6, so tcp://[300:abc::1]:9002 should work with the existing TCP bridge; no protocol changes needed
- The explorer TCP bridge also needs to bind on and connect to Mycelium IPv6 addresses
- Auto-detection can inspect network interfaces for Mycelium address ranges, or query the local Mycelium daemon
- HERO_COMPUTE_ADVERTISE_ADDRESS should be set automatically based on the discovered transport
- The heartbeat sender already supports TCP; it just needs to use the right address
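
As a minimal sketch of the first note, the following shows that a Mycelium address parses as an ordinary bracketed IPv6 socket address using only the Rust standard library. The helper names `parse_tcp_url` and `advertise_url` are hypothetical and exist only for illustration; they are not the actual hero_compute bridge code.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: parse a `tcp://host:port` URL into a socket address.
/// IPv6 hosts must be bracketed, e.g. `tcp://[300:abc::1]:9002`.
fn parse_tcp_url(url: &str) -> Option<SocketAddr> {
    let rest = url.strip_prefix("tcp://")?;
    rest.parse::<SocketAddr>().ok()
}

/// Hypothetical helper: format an advertise address for either IP family.
/// `SocketAddr`'s `Display` impl adds the brackets that IPv6 requires.
fn advertise_url(ip: IpAddr, port: u16) -> String {
    format!("tcp://{}", SocketAddr::new(ip, port))
}
```

Because both directions go through `SocketAddr`, the same code path would handle an IPv4 public address and a Mycelium IPv6 address without special cases.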

Affected Crates

- hero_compute: CLI startup logic, env var generation
- hero_compute_server: node registration, heartbeat sender, config
- hero_compute_explorer: TCP bridge binding, proxy connections

mahmoud commented

2026-04-05 11:50:05 +00:00

Author

Owner

Update: Align with Hero RPC Transport Standards

hero_compute currently uses raw TCP sockets with newline-delimited JSON-RPC for cross-node communication (heartbeats, RPC proxy). This is a custom transport that doesn't follow the Hero ecosystem standard, which is:

- All services bind exclusively to Unix Domain Sockets
- JSON-RPC 2.0 over HTTP/1.1 (not raw newline-delimited TCP)
- hero_proxy is the sole TCP entry point
- hero_inspector discovers services by scanning sockets

hero_compute bypasses all of this with a custom tcp_bridge() — but for a valid reason: the Hero RPC layer has no concept of remote service-to-service communication. OpenRpcTransport only supports connect_socket(path), no TCP. So hero_compute had to roll its own.
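
To make the difference between the two transports concrete, here is a rough sketch of how the same JSON-RPC payload would be framed in each. The `/rpc` path, the function names, and any method name passed in are placeholders, not confirmed hero_proxy or hero_compute details.

```rust
/// Frame a JSON-RPC payload as the raw TCP bridge is described above:
/// one JSON document per line. The payload must not contain a raw newline.
fn frame_newline_delimited(json: &str) -> String {
    format!("{json}\n")
}

/// Frame the same payload as an HTTP/1.1 POST, as in the Hero standard.
/// The `/rpc` path is a placeholder; the real route is not given in this issue.
fn frame_http(json: &str, host: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\n\r\n{json}",
        // Content-Length counts bytes, which is what `str::len` returns.
        len = json.len(),
    )
}
```

The newline framing depends on the payload never containing a line break, while the HTTP framing carries an explicit length, so standard proxies and clients can handle it.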

What this means for Mycelium support

Short-term — Mycelium IPv6 works as-is with the existing raw TCP bridge (tcp://[300:abc::1]:9002), minimal changes needed.

Long-term — this exposes a gap in the Hero RPC ecosystem: no cross-machine transport. Any future Hero service needing multi-node communication will hit the same wall.

Revised scope for this issue

Stays focused on the connectivity discovery chain:

1. Auto-detect public IP → use for TCP bridge transport
2. No public IP → auto-detect Mycelium IPv6 from network interfaces (400::/7 or 300::/8 ranges)
3. Neither available → local-only mode with clear warning
4. Both available → public IP primary, Mycelium stored as metadata
5. Remove need for manual MYCELIUM_IP env var
6. Clear startup logging at each decision point
7. TCP bridge needs to bind [::] (not just 0.0.0.0) for IPv6/Mycelium support
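
For the first item, one possible way to decide whether an interface address counts as "public" is sketched below. It is a heuristic under stated assumptions: the function names are hypothetical, the list of interface addresses must come from elsewhere (the standard library has no interface enumeration), and a public address on an interface does not guarantee that a firewall allows inbound traffic.

```rust
use std::net::Ipv4Addr;

/// Heuristic: is this IPv4 address plausibly reachable from other machines?
/// Uses only stable std methods (`Ipv4Addr::is_global` is still unstable).
fn is_publicly_routable(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    // 100.64.0.0/10 is carrier-grade NAT space (RFC 6598); is_private() does not cover it.
    let is_cgnat = o[0] == 100 && (o[1] & 0xC0) == 64;
    !(ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
        || ip.is_documentation()
        || ip.is_multicast()
        || is_cgnat)
}

/// Pick the first publicly routable address from a list of interface addresses.
fn first_public_ip(candidates: &[Ipv4Addr]) -> Option<Ipv4Addr> {
    candidates.iter().copied().find(|ip| is_publicly_routable(*ip))
}
```

A node behind NAT normally has only private or CGNAT addresses on its interfaces, so `first_public_ip` returns `None` and the chain would move on to the Mycelium check.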

Proposed detection logic

/// Outcome of the startup connectivity discovery chain.
enum ConnectivityMode {
    PublicOnly(String),
    MyceliumOnly(String),
    PublicWithMycelium { public: String, mycelium: String },
    LocalOnly,
}

fn detect_connectivity() -> ConnectivityMode {
    // 1. Check for public/routable IPv4
    if let Some(ip) = detect_public_ip() {
        // log: "Public IP detected: {ip} — ready for multi-node communication"
        if let Some(myc) = detect_mycelium_ip() {
            // log: "Mycelium also available: {myc}"
            return ConnectivityMode::PublicWithMycelium { public: ip, mycelium: myc };
        }
        return ConnectivityMode::PublicOnly(ip);
    }

    // 2. No public IP: check Mycelium
    // log: "No public IP detected, checking Mycelium..."
    if let Some(myc) = detect_mycelium_ip() {
        // log: "Mycelium network active: {myc} — ready to accept connections"
        return ConnectivityMode::MyceliumOnly(myc);
    }

    // 3. Nothing available
    // log: "⚠ No public IP and Mycelium is not running"
    // log: "⚠ Falling back to local mode — cannot manage VMs outside this node"
    ConnectivityMode::LocalOnly
}

fn detect_public_ip() -> Option<String> {
    // Detection method is not specified in this issue; stub so the sketch compiles
    todo!()
}

fn detect_mycelium_ip() -> Option<String> {
    // Scan network interfaces for IPv6 in Mycelium ranges
    // (400::/7 or 300::/8 depending on Mycelium version)
    // Or check if mycelium process is running and query it
    todo!()
}
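
The interface scan described in the `detect_mycelium_ip()` comments comes down to a prefix match. The sketch below checks the two ranges named in this issue. The helper names are hypothetical, and the candidate addresses are passed in because the standard library cannot enumerate network interfaces.

```rust
use std::net::Ipv6Addr;

/// True if `ip` lies inside `prefix/len`, comparing only the leading `len` bits.
fn in_prefix(ip: Ipv6Addr, prefix: Ipv6Addr, len: u32) -> bool {
    let len = len.min(128);
    if len == 0 {
        return true; // a /0 prefix matches everything
    }
    let mask = u128::MAX << (128 - len);
    (u128::from(ip) & mask) == (u128::from(prefix) & mask)
}

/// Ranges as named in this issue; which one applies depends on the Mycelium version.
fn is_mycelium_addr(ip: Ipv6Addr) -> bool {
    let range_400 = Ipv6Addr::new(0x0400, 0, 0, 0, 0, 0, 0, 0); // 400::/7
    let range_300 = Ipv6Addr::new(0x0300, 0, 0, 0, 0, 0, 0, 0); // 300::/8
    in_prefix(ip, range_400, 7) || in_prefix(ip, range_300, 8)
}

/// Return the first interface address that falls in a Mycelium range.
fn first_mycelium_ip(candidates: &[Ipv6Addr]) -> Option<Ipv6Addr> {
    candidates.iter().copied().find(|ip| is_mycelium_addr(*ip))
}
```

Matching on address ranges alone cannot tell whether the Mycelium daemon is actually running, which is why the comments also mention querying the daemon as an alternative.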

mahmoud referenced this issue

2026-04-05 12:37:55 +00:00

chore: Remove HERO_COMPUTE_BRIDGE_TOKEN from TCP bridge #59

nabil_salah was assigned by mahmoud

2026-04-05 12:38:06 +00:00

mahmoud added this to the ACTIVE project

2026-04-05 12:38:09 +00:00

mahmoud added this to the now milestone

2026-04-05 12:38:11 +00:00

mahmoud commented

2026-04-11 15:02:07 +00:00

Author

Owner

Implemented in v0.1.8, with architecture changes from the original design.

What was implemented

- Auto-detect Mycelium IP: install.sh queries the hero_proxy mycelium.info RPC and saves MYCELIUM_IP to the env file automatically
- Worker auto-advertise: write_env() uses MYCELIUM_IP for the advertise address and falls back to detect_outbound_ip() if it is not set
- Heartbeat over Mycelium: the heartbeat sender builds the Hero Proxy URL from mycelium_ip and sends via hero_proxy -> hero_router -> socket
- hero_proxy listener auto-setup: install.sh creates a [::]:9997 dual-stack listener, with setup_hero_proxy_mycelium_listener() in the CLI
- Fallback chain: MYCELIUM_IP (preferred) -> detect_outbound_ip (fallback) -> local Unix socket (single-node)
- No manual config needed: the install script handles everything; the user just runs hero_compute --start
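
The fallback chain in the list above could be sketched roughly as follows. This is not the shipped code: `Advertise`, `choose_advertise`, and `guess_outbound_ip` are hypothetical names, and the UDP-connect trick is a common technique that stands in for `detect_outbound_ip()`, whose real implementation is not shown in this issue. The `8.8.8.8:80` target is an arbitrary public address used only for route selection.

```rust
use std::net::{IpAddr, UdpSocket};

#[derive(Debug, PartialEq)]
enum Advertise {
    Mycelium(String),
    Outbound(IpAddr),
    LocalSocketOnly,
}

/// Hypothetical stand-in for `detect_outbound_ip()`: ask the OS which local
/// address it would use to reach an external host. Connecting a UDP socket
/// sends no packets; it only selects a route.
fn guess_outbound_ip() -> Option<IpAddr> {
    let socket = UdpSocket::bind("0.0.0.0:0").ok()?;
    socket.connect("8.8.8.8:80").ok()?;
    socket.local_addr().ok().map(|addr| addr.ip())
}

/// Apply the chain in order: MYCELIUM_IP, then outbound IP, then local only.
fn choose_advertise(mycelium_ip: Option<&str>, outbound: Option<IpAddr>) -> Advertise {
    // Treat an empty or whitespace-only MYCELIUM_IP as unset.
    match mycelium_ip.map(str::trim).filter(|s| !s.is_empty()) {
        Some(ip) => Advertise::Mycelium(ip.to_string()),
        None => match outbound {
            Some(ip) => Advertise::Outbound(ip),
            None => Advertise::LocalSocketOnly,
        },
    }
}
```

A caller might wire it up as `choose_advertise(std::env::var("MYCELIUM_IP").ok().as_deref(), guess_outbound_ip())`. Keeping the selection logic free of I/O makes the ordering easy to test on its own.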

Architecture difference from original design

The issue proposed TCP bridges on Mycelium IPv6 addresses. We instead implemented:

hero_proxy (Mycelium IPv6:9997) -> hero_router (localhost:9988) -> Unix sockets

TCP bridges were removed entirely. hero_proxy handles all external connectivity. The user-facing result is the same: auto-detection, no manual config, NAT-friendly via Mycelium.
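
As an illustration of the first hop in that path, a sender could derive the remote hero_proxy address from a Mycelium IP along these lines. This is a sketch under assumptions: the function name is hypothetical, the `http` scheme is inferred from the JSON-RPC over HTTP/1.1 standard mentioned earlier, and no request path is added because the route used by the real heartbeat sender is not given here.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Port of the hero_proxy listener mentioned in this comment.
const HERO_PROXY_PORT: u16 = 9997;

/// Build a base URL for a remote node's hero_proxy from its Mycelium address.
/// Returns None if the input is not a valid IPv6 address.
fn hero_proxy_base_url(mycelium_ip: &str) -> Option<String> {
    let ip: Ipv6Addr = mycelium_ip.trim().parse().ok()?;
    // SocketAddrV6's Display impl brackets the address, e.g. [300:abc::1]:9997.
    let addr = SocketAddrV6::new(ip, HERO_PROXY_PORT, 0, 0);
    Some(format!("http://{addr}"))
}
```

Parsing the address first, rather than concatenating strings, rejects malformed values early and guarantees the brackets that IPv6 literals need in URLs.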

Not implemented (minor)

- Step-by-step connectivity logging (Public IP detected / Mycelium active / Local fallback): the current code warns on failure but emits no verbose discovery log
- Dual-address mode (Public IP primary + Mycelium secondary): the node uses one or the other, not both

These are cosmetic. The core requirement (auto-detect, zero manual config, multi-node over Mycelium) is working and tested live.

Closing.

mahmoud closed this issue

2026-04-11 15:02:08 +00:00