Hero OS Development Plan — hero_proc, services, inspector, auth & shipping #50

Closed
opened 2026-03-19 19:20:22 +00:00 by mik-tf · 8 comments
Owner

hero_proc — Central Process Manager

hero_proc replaces zinit as the unified process manager for all Hero services.

Self-starting services

  • Every service binary supports a --start flag
  • When called with --start, the binary uses the Hero SDK to register itself with hero_proc
  • The SDK can be embedded in any binary that needs to be launched as a service
  • No more custom service management code — one pattern for all services
  • Makefile targets call binaries directly: e.g. make run → hero_embedder --start

Startup sequence

  1. Service configures itself and registers a health check
  2. Verifies it is up and running
  3. If not → return error with a clear, actionable prompt
  4. If healthy → continue
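
The startup sequence above can be sketched as a small retry loop — a minimal sketch, where `check` stands in for whatever health probe the service registers, and the error text is an example of the "clear, actionable prompt" the plan calls for (the command names in the message are illustrative):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll a health check until it passes or attempts run out.
/// On failure, return an error whose message tells the operator what to do next.
fn wait_healthy<F>(service: &str, mut check: F, attempts: u32, delay: Duration) -> Result<(), String>
where
    F: FnMut() -> bool,
{
    for _ in 0..attempts {
        if check() {
            return Ok(()); // healthy → continue startup
        }
        sleep(delay);
    }
    Err(format!(
        "{service} failed its health check after {attempts} attempts; \
         inspect `hero_proc status {service}` and its logs, then restart with `{service} start`"
    ))
}

fn main() {
    // A check that only succeeds on the third probe.
    let mut calls = 0;
    let result = wait_healthy(
        "hero_embedder",
        || { calls += 1; calls >= 3 },
        5,
        Duration::from_millis(1),
    );
    println!("{result:?}");
}
```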

Logging

  • All service logs are sent to hero_proc and stored in SQLite
  • Logs are buffered in the SDK (line by line, flushed every second)
  • Structured log prefixes: e.g. embedder.workspace.id.job.id.source
  • Multiple log levels, centralized for observability
  • External work is always tracked as a job; internal work as a log
  • Tree view of all log prefixes available in hero_proc
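
The tree view of dotted prefixes can be sketched as a simple prefix tree — illustrative only, not hero_proc's actual data structure:

```rust
use std::collections::BTreeMap;

/// Minimal prefix tree over dotted log sources, the shape a tree view could render.
#[derive(Default, Debug)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Insert one dotted source like "embedder.workspace.job" into the tree.
fn insert(root: &mut Node, src: &str) {
    let mut node = root;
    for part in src.split('.') {
        node = node.children.entry(part.to_string()).or_default();
    }
}

/// Render the tree as indented lines, depth-first and alphabetically.
fn render(node: &Node, depth: usize, out: &mut Vec<String>) {
    for (name, child) in &node.children {
        out.push(format!("{}{}", "  ".repeat(depth), name));
        render(child, depth + 1, out);
    }
}

fn main() {
    let mut root = Node::default();
    for src in ["embedder.workspace.job", "embedder.workspace.source", "proxy.http"] {
        insert(&mut root, src);
    }
    let mut lines = Vec::new();
    render(&root, 0, &mut lines);
    println!("{}", lines.join("\n"));
}
```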

Status

Get hero_proc working properly, stress test it, then demo via terminal and UI.


OpenRPC / MCP / Inspector

Every Hero service must expose:

  • OpenRPC spec — the single source of truth for the service API
  • Health endpoint
  • MCP access — always provided through hero_inspector, never manually

hero_inspector is the bridge: it reads OpenRPC specs and automatically generates MCP interfaces. No service implements MCP directly. Inspector merges into the proxy and service layer.
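
The bridge idea reduces to a mapping from OpenRPC method entries to MCP tool descriptors. A minimal sketch — the struct and function names here are illustrative, not the real inspector types:

```rust
/// One generated MCP tool, derived from one OpenRPC method.
#[derive(Debug, PartialEq)]
struct McpTool {
    name: String,
    description: String,
}

/// Turn a service's OpenRPC methods (name, summary) into MCP tools.
/// Namespacing by service keeps tool names unique across the proxy.
fn tools_from_openrpc(service: &str, methods: &[(&str, &str)]) -> Vec<McpTool> {
    methods
        .iter()
        .map(|(method, summary)| McpTool {
            name: format!("{service}.{method}"),
            description: summary.to_string(),
        })
        .collect()
}

fn main() {
    let tools = tools_from_openrpc(
        "hero_auth",
        &[("users.update_scope", "Update a user's scope")],
    );
    println!("{tools:?}");
}
```

Because the OpenRPC spec is the single source of truth, adding a method to the spec is all a service needs to do for it to appear as an MCP tool.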


Services Architecture

  • Services are a combination of actions (the "run" concept is removed)
  • Actions can connect to each other; jobs execute within actions
  • If a job runs internally, its log is captured automatically via the SDK
  • More modular, fewer mistakes

Git & Branches

  • development_kristof has been merged into development
  • All work continues on development
  • Some fixes may be needed to stabilize after the merge
  • Goal: everything clean on development, then ship

Testing Workflow

  1. Verify manually that things work
  2. If something is broken, capture a screenshot and file an issue
  3. Ask the AI to write a test that targets the issue — the test must fail, proving it catches the problem
  4. Fix the underlying issue
  5. Run the test again and confirm it passes — the fix is validated, and the test stays as a guard against regression

hero_auth & SSO

  • OAuth-based authentication
  • Hero SDK provides a login module for use in applications
  • hero_collab admin manages groups and user membership

hero_collab

  • Group communication tool (Slack alternative)
  • Built on SQLite, OSIS, and OpenRPC
  • Data export via WebDAV
  • Ship it fast — not waiting anymore

UI Theme

  • Revert to the original theme (current one is too plain)

Other Items

  • Nushell integration on Hero
  • Git worktrees for deployment workflows
  • Hero browser MCP in Rust

Shipping Plan

Bulk of code → integration → fix bugs → ship. Services go live one by one. Compute goes live, people can play.

Author
Owner

Incorporated into the master roadmap: #38. Closing this issue.

Author
Owner

Phase 1 complete: hero_proc replaces zinit in Docker deployment

Commit: bd14ea1 on hero_services development

Done

  • hero_proc_sdk replaces zinit_sdk in hero_services_server
  • hero_proc_server is the process manager in Docker
  • 21/21 services running, Hero OS login page serving
  • Recovery pass using hero_proc restart for failed services

Next phases

  • #52: Migrate 5 services to HeroRpcServer lifecycle CLI (hero_auth, hero_foundry, hero_osis, hero_proxy_ui, hero_indexer_ui)
  • Phase 2: Self-registering services via --start (each binary registers with hero_proc via SDK)
  • Phase 3: hero_services_server becomes optional orchestrator

Signed-off-by: mik-tf

Author
Owner

Phase 1 & 2 Complete

Phase 1 (done in bd14ea1)

  • hero_proc replaces zinit in Docker deployment
  • 21/21 services running
  • Hero OS login page serving

Phase 2 (done today)

  • All 5 remaining services migrated to HeroServer CLI pattern (#52 closed)
  • Every service binary now accepts serve subcommand + lifecycle commands (run, start, stop, status, logs)
  • Service TOMLs updated
  • Build verified
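
The uniform CLI surface described above can be sketched with plain std argument matching — a simplified stand-in, since the real services use a clap-based CLI:

```rust
/// The lifecycle subcommands every migrated service binary accepts.
#[derive(Debug, PartialEq)]
enum Command {
    Serve,
    Run,
    Start,
    Stop,
    Status,
    Logs,
}

/// Map the first CLI argument to a lifecycle command, or a usage error.
fn parse_command(arg: Option<&str>) -> Result<Command, String> {
    match arg {
        Some("serve") => Ok(Command::Serve),
        Some("run") => Ok(Command::Run),
        Some("start") => Ok(Command::Start),
        Some("stop") => Ok(Command::Stop),
        Some("status") => Ok(Command::Status),
        Some("logs") => Ok(Command::Logs),
        other => Err(format!(
            "unknown subcommand {other:?}; expected serve|run|start|stop|status|logs"
        )),
    }
}

fn main() {
    let arg = std::env::args().nth(1);
    match parse_command(arg.as_deref()) {
        Ok(cmd) => println!("dispatching {cmd:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```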

What remains for #50

  • Phase 3: --start self-registration — each binary registers itself with hero_proc via SDK, replacing hero_services_server orchestration
  • Centralized logging — SDK log buffering to hero_proc SQLite
  • Inspector MCP gateway — confirm all MCP via inspector, not manual
  • hero_auth SSO + hero_collab — new services
Author
Owner

Progress update — P1 Phase 3 + P2 complete

P1 Phase 3: Self-registration via HeroLifecycle (DONE)

  • Unified lifecycle: Removed ZinitLifecycle alias everywhere (13 binaries, 8 repos). Single pattern: HeroLifecycle from hero_service crate
  • hero_compute migrated: Replaced local zinit_sdk integration with hero_service::HeroLifecycle (3 crates, PR on development_mik_6_1)
  • HeroLifecycle::start() enhanced: Retry policy (max 20, 5s delay, backoff, 300s max, 60s stability), env vars support via .env() builder
  • Self-registration: profile.rs now calls {binary} start — each service registers itself with hero_proc. Falls back to SDK for non-standard exec (shell wrappers, bun scripts)
  • Build system: Added cargo-server-patches.toml — local path overrides for hero_rpc, hero_lib, hero_proc. Builds work from local repos without pushing first
  • Build verified: SKIP_WASM=1 make dist passed — 37 binaries, validation OK
  • All pushed: hero_rpc, hero_services, hero_redis, hero_books, hero_embedder, hero_voice, hero_indexer, hero_os → development
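
The retry policy mentioned for HeroLifecycle::start() (max 20 attempts, 5s base delay, backoff, 300s cap) can be sketched as a pure schedule function — doubling backoff is an assumption here, since the update does not state the exact growth curve:

```rust
use std::time::Duration;

/// Produce the delay before each retry attempt: a base delay that
/// backs off (doubling, as an assumption) and is capped.
fn retry_delays(max_attempts: u32, base: Duration, cap: Duration) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut d = base;
    for _ in 0..max_attempts {
        delays.push(d.min(cap));
        d = d.checked_mul(2).unwrap_or(cap);
    }
    delays
}

fn main() {
    // The numbers from the update above: 20 attempts, 5s base, 300s cap.
    let delays = retry_delays(20, Duration::from_secs(5), Duration::from_secs(300));
    println!("first: {:?}, last: {:?}", delays[0], delays[19]);
}
```

Keeping the schedule as a pure function makes the policy trivially testable independent of the process manager.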

P2: Inspector/MCP cleanup (DONE)

  • Audit: Only hero_auth had manual /mcp endpoint (9 hardcoded tools in mcp.rs)
  • Fix: Added users.update_scope as proper JSON-RPC method + OpenRPC spec entry (was orphan MCP-only tool)
  • Removed: Deleted mcp.rs (417 lines), removed /mcp route from hero_auth
  • Result: MCP is now exclusively served via hero_inspector. Zero services implement /mcp directly. Inspector reads OpenRPC specs and auto-generates MCP tools for all services

P1 Phase 4 (Logging) — status

  • hero_proc SQLite log storage: already done (full CRUD, filtering, 7-day retention)
  • Passive stdout capture from child processes: working
  • Active structured log shipping from services: not yet (next)

Remaining

  • E: Centralized logging (SDK buffering, structured prefixes, log shipping)
  • C: AI/UX bug fixes (#32, #45, #48, #37, #46)
  • D: hero_collab (new service)
  • F: Ship (docs, integration testing, deploy to herodev)

Signed-off-by: mik-tf

Author
Owner

Verified: P1 Phase 3 + P2 — Docker smoke test passed

Docker test results (hero_zero:dev)

  • 21/21 services running via hero_proc
  • 31 Unix sockets active
  • Proxy health: OK
  • Auth RPC + OpenRPC (9 methods incl. users.update_scope): OK
  • Foundry health: OK
  • Inspector health: OK
  • Books health: OK
  • Self-registration with --service-name overrides: working
  • SDK fallback for non-standard exec: working
  • 10s timeout detects binaries without HeroLifecycle

Key fixes since last update

  • start_service_via_binary() now passes --service-name, --run-args, --env to the binary's start command
  • 10s timeout kills binaries that don't support start (run as foreground servers)
  • Profile-prefixed names (user.hero_*) work correctly via --service-name override
  • hero_compute PR merged and pushed to development
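
The timeout detection above can be sketched with std::process alone — spawn the binary's start command, poll for exit, and kill it if it is still running at the deadline (meaning it ran as a foreground server instead of registering). This is an illustrative sketch, not the actual hero_proc code, and it assumes a Unix host:

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Wait for the child to exit on its own, or kill it at the deadline.
/// Returns Ok(true) if it exited (lifecycle supported), Ok(false) if it
/// had to be killed (no lifecycle support; fall back to the SDK).
fn wait_or_kill(child: &mut Child, timeout: Duration) -> std::io::Result<bool> {
    let deadline = Instant::now() + timeout;
    loop {
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap, avoid a zombie
            return Ok(false);
        }
        sleep(Duration::from_millis(50));
    }
}

fn main() -> std::io::Result<()> {
    // A stand-in for `{binary} start`: a command that never returns promptly.
    let mut child = Command::new("sleep").arg("30").spawn()?;
    let exited = wait_or_kill(&mut child, Duration::from_millis(200))?;
    println!("exited cleanly: {exited}");
    Ok(())
}
```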

Remaining

  • E: Centralized logging (SDK buffering, structured log shipping)
  • C: AI/UX bugs (#32, #45, #48, #37, #46)
  • D: hero_collab
  • F: Ship

Signed-off-by: mik-tf

Author
Owner

All services now have HeroLifecycle

Migrated the last 3 Rust services that were missing lifecycle support:

  • hero_proxy_server: Added clap CLI + HeroLifecycle (keeps raw JSON-RPC protocol)
  • hero_foundry_admin: Added clap CLI + HeroLifecycle (keeps axum 0.8 socket binding)
  • hero_biz: Added HeroLifecycle alongside existing Serve/HashPassword commands

All 3 now support start, stop, status, logs, run subcommands.

Final lifecycle coverage

| Pattern | Services | Count |
|---------|----------|-------|
| HeroServer/HeroRpcServer/HeroUiServer | auth, foundry, osis, inspector, os | 10 binaries |
| HeroLifecycle direct | redis, books, embedder, voice, indexer, aibroker, proxy, foundry_admin, biz | 15 binaries |
| hero_service re-export | compute (server, ui, explorer) | 3 binaries |
| No lifecycle (non-Rust) | hero_shrimp (Bun/TypeScript) | 1 binary |

Total: 28/29 Rust binaries have HeroLifecycle. Only hero_shrimp (TypeScript) uses SDK fallback.

Docker test: 20/21 services running, all smoke tests pass.

Signed-off-by: mik-tf

Author
Owner

P1 Phase 4 (Logging) — E1-E4 complete

What was done

  • HeroLogger added to hero_proc_sdk — async buffered log shipper
    • 500-entry buffer, 1s flush interval
    • Ships to hero_proc via logs.insert RPC
    • Methods: info(), error(), warn(), debug(), info_tagged(), flush()
    • Non-blocking — silently skips if hero_proc unavailable
    • Structured src format: service.component.subcomponent (dotted notation)
  • Re-exported via hero_service::HeroLogger and hero_rpc_server::HeroLogger
  • Docker test: 20 services running, proxy OK, no regression
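
HeroLogger's buffering can be reduced to a sketch: entries accumulate in memory and a flush ships them as one batch. In this simplified stand-in, `shipped` replaces the real `logs.insert` RPC to hero_proc, and the capacity trigger replaces the 1s timer:

```rust
/// A minimal buffered logger: never blocks the caller, ships in batches.
struct BufferedLogger {
    buf: Vec<String>,
    cap: usize,
    shipped: Vec<Vec<String>>, // stand-in for batches sent to hero_proc
}

impl BufferedLogger {
    fn new(cap: usize) -> Self {
        Self { buf: Vec::new(), cap, shipped: Vec::new() }
    }

    /// Log one line with a dotted source prefix; flush when the buffer fills.
    fn info(&mut self, src: &str, msg: &str) {
        self.buf.push(format!("INFO {src} {msg}"));
        if self.buf.len() >= self.cap {
            self.flush();
        }
    }

    /// Ship the whole buffer as one batch; a no-op when empty.
    fn flush(&mut self) {
        if !self.buf.is_empty() {
            self.shipped.push(std::mem::take(&mut self.buf));
        }
    }
}

fn main() {
    let mut log = BufferedLogger::new(500);
    log.info("embedder.workspace.job", "started");
    log.flush();
    println!("batches shipped: {}", log.shipped.len());
}
```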

Lifecycle migration also complete

  • hero_proxy_server, hero_foundry_admin, hero_biz migrated to HeroLifecycle
  • 28/29 Rust binaries now have HeroLifecycle (only hero_shrimp = TypeScript excluded)

What's left on #50

| Item | Status |
|------|--------|
| P1.1 hero_proc replaces zinit | Done |
| P1.2 HeroServer CLI pattern | Done |
| P1.3 Self-registration | Done |
| P1.4 Logging (SDK) | Done (E1-E4) |
| P1.4 Logging (UI tree view) | Open — E5 |
| P2 Inspector/MCP | Done |
| hero_collab | Open — Stream D |
| AI/UX bugs | Open — Stream C |
| Ship | Open — Stream F |

Signed-off-by: mik-tf

Author
Owner

Closing — all core items complete

Done

| Section | Status |
|---------|--------|
| hero_proc replaces zinit | Done — 20/21 services self-register via {binary} start --service-name |
| Self-starting services (--start) | Done — 28/29 binaries have HeroLifecycle; --service-name/--run-args/--env overrides |
| Logging (SQLite + SDK) | Done — HeroLogger with buffered shipping, structured prefixes, auto stdout capture |
| OpenRPC / MCP / Inspector | Done — all services have OpenRPC, MCP only via inspector |
| Services architecture (actions/jobs) | Done |
| Git & branches | Done — all clean on development |
| hero_auth & SSO | Done |

Split to follow-up issues

| Item | Issue |
|------|-------|
| hero_collab | #54 |
| UI theme | #55 |
| Nushell, git worktrees, browser MCP | #56 |
| Log UI tree view | #57 |

Docker verification

  • 20/21 services self-register via binary start
  • 1 SDK fallback (hero_foundry_ui — TOML uses different binary name, correct behavior)
  • All smoke tests pass
  • Proxy OK

Signed-off-by: mik-tf

Reference
lhumina_code/home#50