hero_logger — centralized log aggregation service for the hero ecosystem #105

Closed
opened 2026-03-31 11:16:24 +00:00 by mahmoud · 2 comments
Owner

Problem

Currently, each hero service manages its own logs independently:

  • hero_compute stores deployment logs in-memory on the VM struct (lost on restart, 500 line cap)
  • hero_proc has its own log ring buffer (per-service, not queryable across services)
  • No cross-service log aggregation
  • No real-time log streaming to browser
  • No log persistence across restarts
  • No unified way for any hero service to write and retrieve logs

Inspiration

ThreeFold's logagg (github.com/threefoldtech/logagg) solves the write path — VMs stream logs via WebSocket to a file backend. But it is write-only, file-only, and not designed for the hero ecosystem.

hero_logger extends this concept with:

  • Read path (HTTP + JSON-RPC + WebSocket streaming)
  • Redis backend (standard in hero ecosystem)
  • Structured log entries (timestamp + level + msg)
  • Hero service pattern (proc lifecycle, Unix socket, OpenRPC spec, auto-generated SDK)
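A structured entry is just timestamp + level + message. As a rough sketch of what a single Redis list entry could look like (the field names `ts`/`level`/`msg` follow the data model in this issue; the struct, method, and escaping helper are illustrative, not the real serializer):

```rust
// Hypothetical sketch of a structured log entry (ts + level + msg).
// std-only JSON formatting so the example is self-contained; the real
// server would use a proper serializer.

#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub ts: u64,       // unix timestamp, seconds
    pub level: String, // e.g. "info" | "warn" | "error"
    pub msg: String,
}

impl LogEntry {
    /// Serialize to the JSON shape a Redis list entry might hold.
    pub fn to_json(&self) -> String {
        // Minimal escaping: enough for a sketch, not for production input.
        let esc = |s: &str| s.replace('\\', "\\\\").replace('"', "\\\"");
        format!(
            "{{\"ts\":{},\"level\":\"{}\",\"msg\":\"{}\"}}",
            self.ts,
            esc(&self.level),
            esc(&self.msg)
        )
    }
}

fn main() {
    let e = LogEntry {
        ts: 1_700_000_000,
        level: "info".to_string(),
        msg: "vm started".to_string(),
    };
    println!("{}", e.to_json());
}
```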

Architecture

Write Path


```mermaid
flowchart TD
    subgraph producers["Producers"]
        A["hero_compute\ndeploy logs"]
        B["VM syslog\ntailstream agent"]
        C["hero_zinit\nservice logs"]
        D["any hero service\nvia SDK"]
    end

    subgraph server["hero_logger_server"]
        E["ingest handler\nWebSocket"]
        F["RPC handler\nJSON-RPC"]
    end

    subgraph redis["Redis"]
        G["logs:{id}\nList · history"]
        H["log_sources\nSet · index"]
        I["log_stream:{id}\nPub/Sub · realtime"]
    end

    A -->|RPC log| F
    B -->|WebSocket ingest| E
    C -->|RPC log| F
    D -->|SDK| F

    E -->|RPUSH| G
    E -->|SADD| H
    E -->|PUBLISH| I
    F -->|RPUSH| G
    F -->|SADD| H
    F -->|PUBLISH| I
```
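Each accepted write fans out into the three Redis operations shown above: RPUSH to the history list, SADD to the source index, PUBLISH to the realtime channel. A minimal in-memory mock standing in for Redis (the `MockStore` type and its fields are invented for illustration; trimming and expiry are left to the retention policy):

```rust
use std::collections::{HashMap, HashSet};

/// In-memory stand-in for the Redis write path (RPUSH + SADD + PUBLISH).
/// Key names match the diagrams; everything else is a sketch.
#[derive(Default)]
pub struct MockStore {
    pub lists: HashMap<String, Vec<String>>, // logs:{id}        (List)
    pub sources: HashSet<String>,            // log_sources      (Set)
    pub published: Vec<(String, String)>,    // log_stream:{id}  (Pub/Sub, recorded)
}

impl MockStore {
    pub fn write_log(&mut self, source_id: &str, entry_json: &str) {
        // RPUSH logs:{id} <entry>
        let list_key = format!("logs:{}", source_id);
        self.lists.entry(list_key).or_default().push(entry_json.to_string());
        // SADD log_sources <id>
        self.sources.insert(source_id.to_string());
        // PUBLISH log_stream:{id} <entry>
        self.published
            .push((format!("log_stream:{}", source_id), entry_json.to_string()));
    }
}

fn main() {
    let mut store = MockStore::default();
    store.write_log("vm-42", r#"{"ts":1,"level":"info","msg":"boot"}"#);
    println!("entries for vm-42: {}", store.lists["logs:vm-42"].len());
}
```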

Read Path

```mermaid
flowchart TD
    subgraph redis["Redis"]
        G["logs:{id}\nList"]
        I["log_stream:{id}\nPub/Sub"]
    end

    subgraph server["hero_logger_server · read endpoints"]
        J["GET /logs/{id}\nHTTP JSON history"]
        K["ws://logger/stream/{id}\nWebSocket realtime"]
        L["RPC load(id, limit)\nPaginated query"]
    end

    subgraph consumers["Consumers"]
        M["Browser UI\nJSON history"]
        N["Deploy modal\nrealtime stream"]
        O["VM console\nrealtime stream"]
        P["CLI / SDK\npaginated query"]
    end

    G -->|LRANGE| J
    G -->|LRANGE on connect| K
    I -->|SUBSCRIBE fan-out| K
    G -->|LRANGE| L

    J --> M
    K --> N
    K --> O
    L --> P
```
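The `load(id, limit)` query maps to `LRANGE` over the history list; fetching the newest `limit` entries corresponds to `LRANGE logs:{id} -limit -1`. A minimal sketch of that semantics, with a plain slice standing in for the Redis list (the exact pagination parameters of the real RPC are an assumption):

```rust
/// Sketch of the load(id, limit) read path: return the newest `limit`
/// entries, mirroring `LRANGE key -limit -1` on the Redis list.
pub fn load_page(history: &[String], limit: usize) -> &[String] {
    let start = history.len().saturating_sub(limit);
    &history[start..]
}

fn main() {
    let history: Vec<String> = (1..=5).map(|i| format!("entry {}", i)).collect();
    // Newest two entries come back, oldest first.
    println!("{:?}", load_page(&history, 2));
}
```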

Redis Data Model

```mermaid
erDiagram
    LOGS_ID {
        string key "logs:{source_id}"
        string type "List"
        string entry "JSON: ts + level + msg"
        string policy "RPUSH append · LTRIM keep 10k · EXPIRE 24h"
    }

    LOG_SOURCES {
        string key "log_sources"
        string type "Set"
        string value "all active source IDs"
        string ops "SADD on write · SREM on clear"
    }

    LOG_STREAM_ID {
        string key "log_stream:{source_id}"
        string type "Pub/Sub channel"
        string value "same JSON entry on every write"
        string ops "PUBLISH on write · SUBSCRIBE for realtime"
    }
```
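The retention policy relies on LTRIM's negative indices: `LTRIM logs:{id} -10000 -1` keeps only the newest 10,000 entries. The same semantics mirrored on a Vec for illustration (cap shrunk for the demo; `EXPIRE 24h` has no in-memory equivalent here, and the helper name is invented):

```rust
/// Mirror of `LTRIM key -keep -1`: drop everything but the newest
/// `keep` entries of the list.
pub fn ltrim_keep_last(list: &mut Vec<String>, keep: usize) {
    if list.len() > keep {
        let excess = list.len() - keep;
        list.drain(..excess); // discard the oldest `excess` entries
    }
}

fn main() {
    let mut list: Vec<String> = (1..=5).map(|i| format!("entry {}", i)).collect();
    ltrim_keep_last(&mut list, 3);
    println!("{:?}", list); // only the newest three remain
}
```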

Hints

  1. Should be a new standalone repo, e.g., hero_logger on lhumina_code org
2. After testing and confirming the system is fully functional, extend the VMs to stream their own syslog to hero_logger, making it the single source of truth.
  3. HeroCompute should use HeroLogger to replace the in-memory DeploymentLogStore [Later]
  4. Connect to hero_redis (the hero ecosystem Redis service)

Definition of Done

  • hero_logger_server compiles and passes cargo check + clippy
  • log() RPC stores entry in Redis
  • load() RPC retrieves paginated history
  • list_sources() returns active source IDs
  • clear() deletes all logs for a source
  • GET /logs/{id} returns JSON history
  • WebSocket /stream/{id} streams real-time
  • WebSocket /ingest/{id} accepts log writes
  • Zinit lifecycle (start/stop/serve/status)
  • hero_logger_sdk auto-generated from schema
  • hero_compute_server integrated via SDK
  • UI switches from polling to WebSocket
  • All env vars documented in README
  • docs/architecture.md updated
  • Tested end-to-end on kristof6
Member

WIP:

  • Implemented one server that handles all 3 communication paths:
      • RPC endpoint for log, load, list_sources, clear
      • HTTP endpoint for history retrieval
      • WebSocket ingest + stream endpoints

  • Split into 3 handlers: rpc/http/ws

  • Store layer with a trait + Redis implementation

  • Implemented the Redis data model and write/read flow

  • Added a shared pub/sub subscription manager so multiple WS clients can fan out from a single Redis subscription per source.
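The fan-out manager in the last bullet could be sketched as: one upstream subscription per source, broadcast to every attached client. Channels stand in for WebSocket connections here, the real server is async, and all names are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Sketch of a shared pub/sub fan-out: a single Redis SUBSCRIBE per
/// source feeds every connected WS client for that source.
#[derive(Default)]
pub struct FanOut {
    // source_id -> sending ends of all attached clients
    subscribers: HashMap<String, Vec<Sender<String>>>,
}

impl FanOut {
    /// Register a new client for `source_id`; returns its receiving end.
    pub fn subscribe(&mut self, source_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.entry(source_id.to_string()).or_default().push(tx);
        rx
    }

    /// Called once per message from Redis; forwards to all clients,
    /// dropping any whose receiving end has gone away.
    pub fn publish(&mut self, source_id: &str, msg: &str) {
        if let Some(clients) = self.subscribers.get_mut(source_id) {
            clients.retain(|tx| tx.send(msg.to_string()).is_ok());
        }
    }
}

fn main() {
    let mut fan = FanOut::default();
    let rx1 = fan.subscribe("vm-42");
    let rx2 = fan.subscribe("vm-42");
    fan.publish("vm-42", "line 1");
    println!("{} / {}", rx1.recv().unwrap(), rx2.recv().unwrap());
}
```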

Owner

No, logs all need to be in hero_proc.
Reference
lhumina_code/home#105