auto-start hero_db when hero_proc starts #95

Open
opened 2026-05-06 17:07:27 +00:00 by mik-tf · 5 comments
Owner

Overview

When hero_proc starts, it should bring up hero_db automatically (it's the canonical persistent store; nearly every other service depends on it). Today hero_db requires its own separate service_db start step.

Why

Meeting 2026-05-06: "now hero / hero proc starts / automatically start the db".

Reduces one manual step in every install/restart cycle and removes a class of "service X failed to start because hero_db wasn't running" errors.

Acceptance

  • hero_proc, on its own start, ensures hero_db is running (registered service auto-start, idempotent)
  • If hero_db service isn't registered, hero_proc surfaces a clear error rather than silently continuing
  • Documented in docs_hero / hero_proc_cmd skill

Related

  • META: hero_proc#86 (https://forge.ourworld.tf/lhumina_code/hero_proc/issues/86), reliability
  • Skill: hero_proc_cmd, hero_proc_sdk

Source: meeting notes 2026-05-06.
mik-tf added this to the ACTIVE project 2026-05-06 17:31:52 +00:00
Member

Implementation Spec for Issue #95

Objective

When hero_proc_server starts, it must ensure that the canonical persistent store service named hero_db is started as part of supervisor bring-up. This must be idempotent (re-running on a healthy system is a no-op), surface a clear error when hero_db is not registered, and be documented in the user-facing CLI skill.

Context: How hero_proc starts services today

  • Entry point: crates/hero_proc_server/src/server.rs::run(cfg, cancel) — builds HeroProcDb, starts Supervisor::run() as a tokio task, then the scheduler / scanner / web server.
  • Supervisor::run() on startup calls in order:
    1. recover_running_jobs() — reattach to PIDs that survived restart.
    2. autostart_services() — for every service whose status == ServiceWantedStatus::Start, topologically sort by requires / after, then create a Run + Pending Jobs.
    3. autostart_process_jobs() — re-queue dead is_process jobs.
  • A service is a row in the SQLite services table keyed by (context_name, name). There is no list of "well-known canonical services" hard-coded anywhere today.

The existing autostart_services() already starts hero_db automatically if a service named hero_db is registered with status: start. The real gap surfaced by this issue is: today nothing in hero_proc treats hero_db as required — if it's missing or set to stop, the server boots silently and the operator finds out later through downstream service failures.
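
To make the dependency-ordered bring-up concrete, here is a minimal sketch of a topological sort over `requires` edges using Kahn's algorithm. It is an illustration only: `ServiceSpec` and `start_order` are hypothetical names, `after` edges are not modelled, and the real `autostart_services()` may order services differently.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical stand-in for a service row; only the fields needed for ordering.
#[derive(Debug, Clone)]
struct ServiceSpec {
    name: String,
    requires: Vec<String>,
}

/// Kahn's algorithm: returns service names so that each one appears after
/// everything it `requires`. Returns `None` if the dependencies form a cycle.
/// Dependencies on unknown services are ignored in this sketch.
fn start_order(services: &[ServiceSpec]) -> Option<Vec<String>> {
    let known: BTreeSet<&str> = services.iter().map(|s| s.name.as_str()).collect();
    let mut indegree: BTreeMap<&str, usize> = known.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for svc in services {
        for dep in svc.requires.iter().map(String::as_str).filter(|d| known.contains(d)) {
            *indegree.get_mut(svc.name.as_str())? += 1;
            dependents.entry(dep).or_default().push(svc.name.as_str());
        }
    }

    // Services with no unmet dependencies can start immediately.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(name, _)| *name)
        .collect();

    let mut order = Vec::with_capacity(known.len());
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        for child in dependents.get(name).into_iter().flatten() {
            let degree = indegree.get_mut(child)?;
            *degree -= 1;
            if *degree == 0 {
                ready.push_back(*child);
            }
        }
    }

    // If some service was never emitted, a cycle kept its in-degree above zero.
    (order.len() == known.len()).then_some(order)
}
```

With `hero_web` requiring `hero_api` and `hero_api` requiring `hero_db`, this ordering puts `hero_db` first, which is why a registered `hero_db` with `status: start` is already handled by general autostart.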

Requirements

  • On hero_proc_server startup, after supervisor bring-up, hero_proc must ensure a service named hero_db (context core) is active.
  • The behaviour must be idempotent: starting twice must not produce duplicate jobs and must not error.
  • If the hero_db service is not registered in the DB, hero_proc must emit a clear, prominent error (tracing error! and a structured entry into the logging table via log_batcher), including remediation guidance.
  • If the hero_db service is registered but status != start, hero_proc must emit a clear warning and skip the auto-start (operator intent is preserved — documented behaviour, not silently overridden).
  • Provide an opt-out env var HERO_PROC_NO_AUTOSTART_DB=1 for special cases (test harnesses, isolated sandboxes, embedded launchers that manage hero_db externally).
  • Document the new behaviour in the hero_proc_cmd skill and the server crate's CLAUDE.md.

Files to Modify / Create

| Path | Change | Why |
|---|---|---|
| crates/hero_proc_server/src/supervisor/auto_start_db.rs | Create | New module containing ensure_hero_db_autostart(&Supervisor), the missing-service error path, and unit tests. Keeps the new policy isolated from existing autostart code. |
| crates/hero_proc_server/src/supervisor/mod.rs | Modify | Add pub mod auto_start_db;. Insert a call to auto_start_db::ensure_hero_db_autostart(self).await in Supervisor::run() immediately after self.autostart_services().await. Widen autostart_one_service to pub(crate). |
| crates/hero_proc_integration_test/tests/hero_db_autostart.rs | Create | End-to-end test using the existing harness: registers a fake hero_db action+service, launches hero_proc_server, asserts the service has at least one job within ~3 s. Second variant: no-registration → assert error log entry. |
| ~/.claude/skills/hero_proc_cmd/SKILL.md | Modify | Add a "hero_db auto-start" section documenting the new behaviour, the opt-out env var, and the error message users should expect when hero_db isn't registered. |
| crates/hero_proc_server/CLAUDE.md | Modify (if present) | One short paragraph noting that the supervisor explicitly ensures hero_db and where the policy lives. |

Implementation Plan

Step 1 — Add the auto_start_db module skeleton

Files: crates/hero_proc_server/src/supervisor/auto_start_db.rs (create), crates/hero_proc_server/src/supervisor/mod.rs (modify).

  • Add pub mod auto_start_db; near the top of supervisor/mod.rs.
  • Define the crate-visible entry function pub(crate) async fn ensure_hero_db_autostart(sup: &Supervisor).
  • Honour HERO_PROC_NO_AUTOSTART_DB=1 (early return with info!).
  • Look up db.services.get("core", "hero_db") and branch on a private enum HeroDbState { NotRegistered, NotStartIntent(ServiceWantedStatus), AlreadyActive, NeedsStart(ServiceConfig) }.
    Dependencies: none.
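
The classification in Step 1 could look roughly like the sketch below. It uses an in-memory map in place of the SQLite `services` table; `WantedStatus`, `ServiceRow`, `Registry`, and `classify` are hypothetical stand-ins, and only the `HeroDbState` variant names come from the spec.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the wanted-status values mentioned in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WantedStatus {
    Start,
    Stop,
    Ignore,
    Spec,
}

/// Hypothetical service row; `has_active_job` stands in for a job-table query.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ServiceRow {
    context_name: String,
    name: String,
    status: WantedStatus,
    has_active_job: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum HeroDbState {
    NotRegistered,
    NotStartIntent(WantedStatus),
    AlreadyActive,
    NeedsStart(ServiceRow),
}

/// Keyed by (context_name, name), like the services table.
type Registry = HashMap<(String, String), ServiceRow>;

/// Pure classification: no I/O, no dispatch, so it is easy to unit test.
fn classify(registry: &Registry) -> HeroDbState {
    let key = ("core".to_string(), "hero_db".to_string());
    match registry.get(&key) {
        None => HeroDbState::NotRegistered,
        // Operator intent wins: anything other than `Start` is reported, not overridden.
        Some(row) if row.status != WantedStatus::Start => {
            HeroDbState::NotStartIntent(row.status)
        }
        Some(row) if row.has_active_job => HeroDbState::AlreadyActive,
        Some(row) => HeroDbState::NeedsStart(row.clone()),
    }
}
```

Keeping the decision separate from the side effects means only the `NeedsStart` arm has to touch the supervisor.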

Step 2 — Author the structured error message for missing hero_db

Files: crates/hero_proc_server/src/supervisor/auto_start_db.rs.

  • On NotRegistered: emit error! with remediation message and write the same to the logs DB via sup.log_batcher (source hero_proc.auto_start_db).
  • Do not return Err — degraded-mode warning, not fatal startup error.
    Dependencies: Step 1.
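
A minimal sketch of the non-fatal error path follows. The message wording, `LogSink`, `LogEntry`, and `report_missing_hero_db` are all assumptions for illustration; the real code writes through `sup.log_batcher` and `tracing`, which are replaced here by a `Vec` and `eprintln!`.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Level {
    Error,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct LogEntry {
    level: Level,
    source: String,
    message: String,
}

/// Hypothetical in-memory sink standing in for the log batcher.
#[derive(Default)]
struct LogSink {
    entries: Vec<LogEntry>,
}

/// Log source named in the spec.
const SOURCE: &str = "hero_proc.auto_start_db";

/// Builds the remediation text. The exact wording is illustrative only.
fn missing_hero_db_message(context: &str) -> String {
    format!(
        "service 'hero_db' is not registered in context '{}'; dependent services may \
         fail to start. Register it with status 'start', or set \
         HERO_PROC_NO_AUTOSTART_DB=1 if hero_db is managed externally.",
        context
    )
}

/// Reports the problem in two places and returns normally, so the caller keeps booting.
fn report_missing_hero_db(sink: &mut LogSink, context: &str) {
    let message = missing_hero_db_message(context);
    // Stand-in for the tracing `error!` call.
    eprintln!("ERROR [{}] {}", SOURCE, message);
    sink.entries.push(LogEntry {
        level: Level::Error,
        source: SOURCE.to_string(),
        message,
    });
}
```

The function returns `()` rather than `Result`, matching the decision that a missing `hero_db` is a degraded-mode condition and not a fatal startup error.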

Step 3 — Refactor autostart_one_service so the new module can call into it

Files: crates/hero_proc_server/src/supervisor/mod.rs, crates/hero_proc_server/src/supervisor/auto_start_db.rs.

  • Widen Supervisor::autostart_one_service to pub(crate).
  • NeedsStart(cfg) arm calls sup.autostart_one_service(&cfg.context_name, &cfg).await.
  • Insert auto_start_db::ensure_hero_db_autostart(self).await; in Supervisor::run() right after self.autostart_services().await;. Because this runs after general autostart, a hero_db that is one of many services with status: start has already been queued, so ensure_hero_db_autostart detects AlreadyActive and is a no-op.
    Dependencies: Steps 1 and 2.
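
The no-op behaviour depends on a "non-terminal job exists" check before dispatch. A minimal sketch of that idea, with a hypothetical `JobTable` in place of the real job store (the `Succeeded` and `Failed` phase names are assumptions; `Pending` and `Running` appear in this spec):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobPhase {
    Pending,
    Running,
    Succeeded,
    Failed,
}

impl JobPhase {
    /// A terminal job no longer counts as "the service is being started".
    fn is_terminal(self) -> bool {
        matches!(self, JobPhase::Succeeded | JobPhase::Failed)
    }
}

#[derive(Debug, Clone)]
struct Job {
    service: String,
    phase: JobPhase,
}

/// Hypothetical in-memory job store.
#[derive(Default)]
struct JobTable {
    jobs: Vec<Job>,
}

impl JobTable {
    fn has_active_job(&self, service: &str) -> bool {
        self.jobs
            .iter()
            .any(|job| job.service == service && !job.phase.is_terminal())
    }

    /// Queues a job only if none is active.
    /// Returns `true` when a job was queued and `false` when the call was a no-op.
    fn ensure_started(&mut self, service: &str) -> bool {
        if self.has_active_job(service) {
            return false;
        }
        self.jobs.push(Job {
            service: service.to_string(),
            phase: JobPhase::Pending,
        });
        true
    }
}
```

Calling `ensure_started("hero_db")` twice in a row queues one job; a second job is queued only after the first reaches a terminal phase.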

Step 4 — Unit tests for auto_start_db

Files: crates/hero_proc_server/src/supervisor/auto_start_db.rs (append #[cfg(test)] mod tests).

  • Extract a free function decide_action(db, env_opt_out) -> HeroDbState so tests don't need the full async run loop.
  • Test cases: empty DB → NotRegistered; status: stop → NotStartIntent(Stop); existing non-terminal job → AlreadyActive; valid config + no jobs → NeedsStart(cfg); env var set → early return regardless of state. Use serial_test::serial for env-var test.
    Dependencies: Steps 1–3.
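
Passing the opt-out as a plain value, as `decide_action(db, env_opt_out)` does, keeps most tests free of process-wide environment mutation. A minimal sketch of that split; the helper names are hypothetical, and trimming whitespace around the value is an assumption (the spec only documents `HERO_PROC_NO_AUTOSTART_DB=1`):

```rust
const OPT_OUT_VAR: &str = "HERO_PROC_NO_AUTOSTART_DB";

/// Pure check on the raw value, so tests never need to modify the environment.
/// Only the documented value "1" enables the opt-out.
fn opt_out_from_value(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1"))
}

/// Thin wrapper that reads the real environment once at startup.
fn opt_out_from_env() -> bool {
    opt_out_from_value(std::env::var(OPT_OUT_VAR).ok().as_deref())
}
```

Only a test that exercises `opt_out_from_env` itself would still need serialising, since it reads shared process state.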

Step 5 — Integration test using the existing harness

Files: crates/hero_proc_integration_test/tests/hero_db_autostart.rs (create).

  • Use harness::TestHarness to spin up a fresh hero_proc_server against a temp socket + temp DB.
  • Pre-seed the DB via SDK calls (action_set for a no-op action, service_set with name: "hero_db", status: start, actions: ["main"]), restart, then assert via service_status that within 3 s hero_db has at least one Pending or Running job.
  • Second variant: do not pre-register hero_db, start the server, assert via log query that an error log entry exists with source hero_proc.auto_start_db.
    Dependencies: Steps 1–3. Can be authored in parallel with Step 4.
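
The "within 3 s" assertion is a poll-until-deadline loop. Below is a synchronous sketch of such a helper; `wait_until` is a hypothetical name, and the real harness is presumably async and would query `service_status` inside the closure.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly until it returns `true` or `timeout` elapses.
/// Returns whether the condition was observed in time.
fn wait_until<F: FnMut() -> bool>(timeout: Duration, interval: Duration, mut check: F) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

Polling at a short interval instead of sleeping for the full 3 s keeps the happy path fast while still tolerating a slow supervisor tick.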

Step 6 — Documentation updates

Files: ~/.claude/skills/hero_proc_cmd/SKILL.md, crates/hero_proc_server/CLAUDE.md.

  • Add a "hero_db auto-start" subsection to the skill documenting the new behaviour, the opt-out env var, the error path, and where to find the log entry.
  • Append a short note to the server crate's CLAUDE.md (if present).
    Dependencies: independent of Steps 4 and 5.

Acceptance Criteria

  • hero_proc, on its own start, ensures hero_db is running (registered service auto-start, idempotent) — implemented in supervisor/auto_start_db.rs::ensure_hero_db_autostart, called from Supervisor::run() after autostart_services().
  • If hero_db service isn't registered, hero_proc surfaces a clear error rather than silently continuing — tracing error! plus a logging table entry with source hero_proc.auto_start_db.
  • Documented in the hero_proc_cmd skill (and noted for docs_hero cross-repo follow-up).
  • Idempotent under reruns (verified by Step 4 unit test 3 and Step 5 integration test).
  • Opt-out via HERO_PROC_NO_AUTOSTART_DB=1 (verified by Step 4 unit test 5).
  • No regression: existing autostart_services topological ordering still wins for any service that declares requires = ["hero_db"]. The new module is purely additive.

Notes / Design Decisions

  • Additive safety net rather than hard-coding a hero_db row at boot. Seeding a service row from the supervisor would mask configuration errors and would surprise operators who deliberately set status: stop. The agreed model is "DB is the source of truth"; we only validate, never auto-create.
  • Env var, not CLI flag. The codebase consistently prefers HERO_PROC_* env vars (e.g. HERO_PROC_SOCKET, HERO_PROC_EXAMPLES_DIR). Adding a new CLI flag would be inconsistent and would also need wiring through ServerConfig and launch_in_screen.
  • Direct call to autostart_one_service rather than going through the JSON-RPC service.start handler. The supervisor already exposes the in-process equivalent — we just widen its visibility. Avoids serialization overhead and keeps boot-time logic in-process.
  • Order of operations. ensure_hero_db_autostart runs after autostart_services, before autostart_process_jobs. By that point the DB is fully open and the log batcher is alive — both prerequisites of the function.
  • Async / fire-and-forget. We dispatch the run+jobs, then return. We do not wait for hero_db to become "ready" (the existing service_state evaluator does that asynchronously on a 5 s cadence and is consumed by downstream services that declare requires). Blocking on readiness here would deadlock the supervisor's main loop.
  • Edge case — non-core context. The spec scopes the lookup to ("core", "hero_db"). If a deployment uses a non-default context for the DB, the operator must put the service in core or this can be extended later with HERO_PROC_HERO_DB_CONTEXT. Out of scope for #95.
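
If the context override mentioned in the last note is ever added, the lookup key could be resolved as in the sketch below. This is speculative: `HERO_PROC_HERO_DB_CONTEXT` is only a proposed variable, `hero_db_key` is a hypothetical helper, and treating a blank value as unset is an assumption.

```rust
const DEFAULT_CONTEXT: &str = "core";
const SERVICE_NAME: &str = "hero_db";

/// Builds the (context_name, name) lookup key.
/// `override_value` would carry the value of the proposed
/// HERO_PROC_HERO_DB_CONTEXT variable, if it were set.
fn hero_db_key(override_value: Option<&str>) -> (String, String) {
    let context = override_value
        .map(str::trim)
        .filter(|value| !value.is_empty())
        .unwrap_or(DEFAULT_CONTEXT);
    (context.to_string(), SERVICE_NAME.to_string())
}
```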
Member

Test Results

  • Total: 411
  • Passed: 410
  • Failed: 1
  • Ignored: 52
  • Test command: cargo test --workspace --no-fail-fast -- --test-threads=1

(Note: the initial run with cargo test --workspace -- --test-threads=1 aborted at the first failure in hero_proc_lib due to default fail-fast behavior. Re-ran with --no-fail-fast to capture all results across the workspace.)

New tests added in this branch

  • supervisor::auto_start_db::tests::* (5 unit tests in hero_proc_server)
    • decide_action_returns_already_active_when_non_terminal_job_exists ... ok
    • decide_action_returns_needs_start_when_no_active_jobs ... ok
    • decide_action_returns_none_when_env_opt_out_set ... ok
    • decide_action_returns_not_registered_for_empty_db ... ok
    • decide_action_returns_not_start_intent_when_status_stop ... ok
  • hero_db_autostart::* (2 integration tests in hero_proc_integration_test)
    • hero_db_autostarts_when_registered_with_start_status ... ok
    • hero_db_missing_logs_clear_error ... ok

All 7 new tests pass.

Pre-existing failures

  • hero_proc_lib::db::integration_tests::tests::logging_api_end_to_end_filters_and_cleanup (crates/hero_proc_lib/src/db/integration_tests.rs:649): the assertion "after delete_older_than(u32::MAX), ctx query should be empty, got 30 entries" fails. It also fails on the unmodified development branch, so this is a pre-existing issue unrelated to the auto-start-hero_db change.

Notes

  • Toolchain: rustup override set 1.95.0 (1.95.0-x86_64-unknown-linux-gnu).
  • 52 tests are marked ignored across the workspace; none are touched by this change.
  • The new integration tests in crates/hero_proc_integration_test/tests/hero_db_autostart.rs correctly serialize on the global socket lock and pass under --test-threads=1.
Member

Implementation Summary

Behaviour added

  • hero_proc_server now automatically ensures the hero_db service (context core) is running on startup. The check is wired into Supervisor::run() immediately after autostart_services() and before autostart_process_jobs().
  • Idempotent. Detects AlreadyActive via the same non-terminal-job predicate the existing autostart loop uses, so re-runs are no-ops.
  • Operator intent wins. If hero_db is registered with status: stop / ignore / spec, the auto-start is skipped with a warn! log.
  • Missing registration is loud but non-fatal. When hero_db is not registered in context "core", hero_proc emits a tracing::error! AND writes a structured error entry to the logs DB with source hero_proc.auto_start_db. The supervisor still finishes booting so the admin RPC stays reachable.
  • Opt-out via HERO_PROC_NO_AUTOSTART_DB=1 for test harnesses, isolated sandboxes, or deployments where hero_db is managed externally.

Files

Created:

  • crates/hero_proc_server/src/supervisor/auto_start_db.rs — the policy module. Exposes ensure_hero_db_autostart(&Supervisor) and a private decide_action(db, env_opt_out) -> Option<HeroDbState> that does the pure classification used by tests.
  • crates/hero_proc_integration_test/tests/hero_db_autostart.rs — two end-to-end tests covering the happy path (registered + start → job dispatched) and the missing-registration path (no hero_db row → error log entry).

Modified:

  • crates/hero_proc_server/src/supervisor/mod.rs: added pub mod auto_start_db;, widened Supervisor::autostart_one_service to pub(crate), inserted the call to ensure_hero_db_autostart(&self).await after autostart_services().await.
  • crates/hero_proc_integration_test/src/harness.rs — added start_with_seed<F>(seed: F) so integration tests can pre-seed the DB after the harness wipes it but before the server starts. Also added a CARGO_TARGET_DIR lookup branch and a parser for <workspace>/.cargo/config.toml [build] target-dir so the harness picks up the freshly-built hero_proc_server instead of a stale binary on PATH.
  • crates/hero_proc_server/CLAUDE.md — short paragraph noting the supervisor enforces this, where the policy lives, and that it is purely additive and non-panicking.
  • ~/.claude/skills/hero_proc_cmd/SKILL.md — new ## hero_db auto-start section documenting behaviour, opt-out env var, and how to read the error log.
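
The harness change that locates the freshly built binary could be approximated as follows. This is a sketch under stated assumptions, not the harness code: the function names are hypothetical, the parser handles only the simple `target-dir = "..."` form (no inline comments, no dotted keys), and relative paths are not resolved against the workspace root.

```rust
use std::path::PathBuf;

/// Extracts `target-dir` from the `[build]` table of a Cargo config file.
/// Deliberately naive; a real implementation should use a TOML parser.
fn target_dir_from_config(config_toml: &str) -> Option<String> {
    let mut in_build = false;
    for raw in config_toml.lines() {
        let line = raw.trim();
        if line.starts_with('[') {
            // Track which table we are in; only `[build]` is relevant.
            in_build = line == "[build]";
            continue;
        }
        if !in_build {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "target-dir" {
                let value = value.trim().trim_matches('"');
                return (!value.is_empty()).then(|| value.to_string());
            }
        }
    }
    None
}

/// Resolution order: CARGO_TARGET_DIR value, then the config file, then "target".
fn resolve_target_dir(env_value: Option<&str>, config_toml: Option<&str>) -> PathBuf {
    env_value
        .filter(|value| !value.is_empty())
        .map(str::to_string)
        .or_else(|| config_toml.and_then(target_dir_from_config))
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("target"))
}
```

Checking the environment variable first mirrors Cargo's own precedence, where environment variables override values from config files.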

Tests

  • 5 new unit tests in supervisor::auto_start_db::tests exercising every branch of decide_action (env opt-out, not-registered, not-start-intent, already-active, needs-start). All passing.
  • 2 new integration tests in hero_db_autostart (hero_db_autostarts_when_registered_with_start_status, hero_db_missing_logs_clear_error). All passing.

Workspace test run (cargo test --workspace --no-fail-fast -- --test-threads=1):

  • Total: 411
  • Passed: 410
  • Failed: 1 (pre-existing, unrelated)
  • Ignored: 52

The one failure is hero_proc_lib::db::integration_tests::tests::logging_api_end_to_end_filters_and_cleanup (crates/hero_proc_lib/src/db/integration_tests.rs:649). Reproduced on a clean development checkout, so it is not introduced by this change.

Acceptance criteria

  • hero_proc, on its own start, ensures hero_db is running (registered service auto-start, idempotent).
  • If hero_db service isn't registered, hero_proc surfaces a clear error rather than silently continuing.
  • Documented in the hero_proc_cmd skill and the server crate's CLAUDE.md. Cross-repo docs_hero was not found locally and is flagged for separate follow-up.

Notes / design decisions

  • Additive safety net rather than auto-creating a hero_db row at boot. The DB is the source of truth — auto-creating would mask configuration errors.
  • Env var opt-out, not a CLI flag, to match the existing HERO_PROC_* pattern.
  • Direct in-process call to Supervisor::autostart_one_service rather than going through the JSON-RPC service.start handler — avoids serialization for boot-time logic.
  • The check runs after autostart_services so if hero_db was already queued by general autostart, the new function detects AlreadyActive and is a no-op. Strictly additive, no double-dispatch.
  • Edge case: a non-core context for hero_db is out of scope and would require a follow-up HERO_PROC_HERO_DB_CONTEXT env var.
Member

Pull request opened: #99

This PR implements the changes discussed in this issue.

Reference
lhumina_code/hero_proc#95