Split hero_osis into per-domain services (one binary, one process, one socket per domain) #22

Closed
opened 2026-04-13 12:28:11 +00:00 by timur · 1 comment
Owner

Problem

Currently hero_osis_server registers all 17 domains through a single unified rpc.sock and a single hero_proc service. This works but treats hero_osis as a monolith — domains can't be restarted, scaled, or feature-gated independently, and they all die together on a panic.

Proposal

Match the recipe_server pattern: one binary, one process, one socket per domain.

$HERO_SOCKET_DIR/hero_osis_identity/rpc.sock
$HERO_SOCKET_DIR/hero_osis_flow/rpc.sock
$HERO_SOCKET_DIR/hero_osis_ai/rpc.sock
... 17 total

Each domain becomes a first-class hero_proc service with its own --start / --stop / foreground lifecycle.
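The layout above can be sketched as a small path helper. This is a minimal illustration, not code from the repository: `service_name` and `socket_path` are hypothetical names, and the only thing taken from the proposal is the `<socket dir>/hero_osis_<domain>/rpc.sock` shape.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: service name for a domain, following the
/// `hero_osis_<domain>` convention described in this issue.
fn service_name(domain: &str) -> String {
    format!("hero_osis_{domain}")
}

/// Hypothetical helper: per-domain socket path under a socket directory.
/// The directory (the value of `$HERO_SOCKET_DIR`) is passed in explicitly
/// so the function stays pure and easy to test.
fn socket_path(socket_dir: &Path, domain: &str) -> PathBuf {
    socket_dir.join(service_name(domain)).join("rpc.sock")
}
```

Taking the directory as an argument rather than reading the environment inside the helper keeps it usable from a test harness that points at an isolated socket directory.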

Implementation

Per-domain binaries

  • 17 thin binaries in crates/hero_osis_server/src/bin/hero_osis_<domain>.rs
  • Each is ~25 lines, roughly: OServer::run_cli(lifecycle, |s, ctxs, ...| { s.register::<OsisDomain>(...).await; s.run().await })
  • Each [[bin]] declares required-features = ["<domain>"]
  • SERVICE_NAME = hero_osis_<domain> → matches hero_sockets convention

Cross-domain wiring (AI → Flow)

  • AI binary enables both ai and flow cargo features
  • AI registers only the ai domain on its socket
  • AI creates a private read-only Arc<OsisFlow> pointing at the same ~/hero/var/osisdb/{ctx}/flow/ storage path for in-process workflow lookup (read-only is safe across processes since OSIS storage is on disk and AI never writes flow records)
  • The public flow.workflow.* write path remains exclusive to the Flow service

Embedder special case

  • hero_osis_embedder binary keeps the ONNX global init at startup
  • Single process owns the embedder state (no concurrent ONNX init across procs)

Cleanup

  • Delete the monolithic src/main.rs (or keep behind a --monolith flag for local dev)
  • Update E2E harness to spawn all 17 services and verify per-domain sockets

Out of scope

  • Splitting hero_osis_server lib into 17 lib crates (would shrink binaries but is huge churn)
  • hero_proc registration scripting (will write a register-all.sh helper, but per-service --start is the intended UX)

Refs

  • hero_rpc OServer + recipe_server pattern: lhumina_code/hero_rpc#13 (https://forge.ourworld.tf/lhumina_code/hero_rpc/issues/13)
  • Downstream: hero_os#32 (hero_os_http needs to learn the new socket layout — once this lands)
  • Aligns with hero_sockets skill: $HERO_SOCKET_DIR/<service_name>/rpc.sock
timur closed this issue 2026-04-13 13:19:46 +00:00
Author
Owner

Implemented and merged on development (commit 3b52af0)

What landed

Generator (hero_rpc commit 8681589):

  • New OschemaBuildConfig::bins(prefix, git_url) builder option that auto-generates src/bin/<prefix>_<domain>.rs for each configured domain.
  • skip_bin(domain) to opt out per-domain (used by hero_osis for embedder + ai).
  • Each generated bin is a thin OServer::run_cli wrapper that registers exactly one domain.

hero_osis:

  • 17 binaries: hero_osis_<domain> for each of ai, base, business, calendar, code, communication, embedder, files, finance, flow, identity, job, ledger, media, network, projects, settings.
  • 15 generated by build.rs; 2 hand-written:
    • hero_osis_embedder — ONNX Runtime global init before register.
    • hero_osis_ai — placeholder for cross-process Flow wiring.
  • Sockets: $HERO_SOCKET_DIR/hero_osis_<domain>/rpc.sock, matching hero_sockets convention.
  • Cross-service isolation enforced: methods for foreign domains return -32601 Method not found.
  • Monolithic src/main.rs deleted.
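The split between generated and hand-written binaries can be sketched as follows. `BinPlan` is an illustrative stand-in and is not the real `OschemaBuildConfig`; its fields and the `generated_paths` method are assumptions made for this example. Only the `src/bin/<prefix>_<domain>.rs` naming, the idea of skipping a domain, and the domain list come from this comment.

```rust
use std::collections::BTreeSet;

/// Illustrative stand-in for the generator options described above.
/// NOT the real `OschemaBuildConfig`; names and fields are assumptions.
struct BinPlan {
    prefix: String,
    domains: Vec<String>,
    skipped: BTreeSet<String>,
}

impl BinPlan {
    fn new(prefix: &str, domains: &[&str]) -> Self {
        Self {
            prefix: prefix.to_string(),
            domains: domains.iter().map(|d| d.to_string()).collect(),
            skipped: BTreeSet::new(),
        }
    }

    /// Opt a domain out of generation (its binary is hand-written instead).
    fn skip_bin(mut self, domain: &str) -> Self {
        self.skipped.insert(domain.to_string());
        self
    }

    /// Paths of the binaries that would be generated, one per non-skipped domain.
    fn generated_paths(&self) -> Vec<String> {
        self.domains
            .iter()
            .filter(|d| !self.skipped.contains(*d))
            .map(|d| format!("src/bin/{}_{}.rs", self.prefix, d))
            .collect()
    }
}

/// The 17 domains listed in this comment.
const DOMAINS: [&str; 17] = [
    "ai", "base", "business", "calendar", "code", "communication", "embedder",
    "files", "finance", "flow", "identity", "job", "ledger", "media", "network",
    "projects", "settings",
];
```

With `BinPlan::new("hero_osis", &DOMAINS).skip_bin("embedder").skip_bin("ai")`, `generated_paths()` yields 15 entries, which matches the 15 generated plus 2 hand-written split reported here.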

E2E harness (tests/e2e/run.sh):

  • Spawns the per-domain services in parallel into an isolated $HERO_SOCKET_DIR (all 17 when E2E_INCLUDE_EMBEDDER=1 is set; 16 by default, since the embedder is excluded).
  • Each service seeds only its own domain from the shared data/seed/ tree (foreign-domain TOML auto-skipped).
  • Per-service /health and domain.list validation.
  • Cross-service routing assertions: user.list works on identity socket, returns -32601 on flow socket.
  • Per-domain CRUD round-trip: list → exists → get against the domain's own socket.
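The cross-service routing assertion boils down to each process knowing only its own domain's methods. A minimal sketch of that behaviour, assuming a hypothetical `DomainService` type (not the hero_rpc dispatcher) and a placeholder flow method name; `user.list` and the `-32601` code are taken from the assertions above, and `-32601` is the standard JSON-RPC 2.0 "Method not found" code:

```rust
/// JSON-RPC 2.0 "Method not found" error code.
const METHOD_NOT_FOUND: i32 = -32601;

/// Hypothetical per-domain dispatcher: it knows only the methods of the
/// single domain registered on its socket.
struct DomainService {
    domain: &'static str,
    methods: &'static [&'static str],
}

impl DomainService {
    /// Returns a description for a method this domain owns, or the
    /// JSON-RPC error code for anything else (including foreign domains).
    fn dispatch(&self, method: &str) -> Result<String, i32> {
        if self.methods.iter().any(|m| *m == method) {
            Ok(format!("{} handled by {}", method, self.domain))
        } else {
            Err(METHOD_NOT_FOUND)
        }
    }
}
```

A service built as `DomainService { domain: "flow", methods: &["workflow.get"] }` (the method name is a placeholder) rejects `user.list` with `-32601`, while the identity service accepts it.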

Result: 38/38 tests pass.

  • 16 services launched (embedder excluded by default — needs ONNX runtime; opt in via E2E_INCLUDE_EMBEDDER=1).
  • 16 sockets ready, 16 /health endpoints OK.
  • 123 records seeded across all services with zero seed errors.
  • 44 seeded types round-trip cleanly.

Not yet done (follow-ups)

  1. AI → Flow cross-process wiring: BotService.execute() currently returns BotQueryError("Flow domain not available"). Fixing this requires a hero_rpc_client from the AI service to $HERO_SOCKET_DIR/hero_osis_flow/rpc.sock. There is a TODO comment in src/bin/hero_osis_ai.rs. Will track separately.

  2. hero_proc registration scripting: per-service --start works. A register-all.sh or [[hero_proc.services]] manifest would be nice but is not required.

  3. hero_os#32: the hero_os_http proxy needs to learn the new per-domain socket layout. It was waiting on this work and can proceed now.

Reference
lhumina_code/hero_osis#22