auto-start hero_db when hero_proc starts #95
Reference
lhumina_code/hero_proc#95
Overview
When `hero_proc` starts, it should bring up `hero_db` automatically (it's the canonical persistent store; nearly every other service depends on it). Today `hero_db` requires its own separate `service_db start` step.
Why
Meeting 2026-05-06: "now hero / hero proc starts / automatically start the db".
Reduces one manual step in every install/restart cycle and removes a class of "service X failed to start because hero_db wasn't running" errors.
Acceptance
- Documented in `docs_hero` / `hero_proc_cmd` skill
Related
`hero_proc_cmd`, `hero_proc_sdk`
Source: meeting notes 2026-05-06.
Implementation Spec for Issue #95
Objective
When `hero_proc_server` starts, it must ensure that the canonical persistent store service named `hero_db` is started as part of supervisor bring-up. This must be idempotent (re-running on a healthy system is a no-op), surface a clear error when `hero_db` is not registered, and be documented in the user-facing CLI skill.
Context: How hero_proc starts services today
- `crates/hero_proc_server/src/server.rs::run(cfg, cancel)`: builds `HeroProcDb`, starts `Supervisor::run()` as a tokio task, then the scheduler / scanner / web server.
- `Supervisor::run()` on startup calls, in order:
  1. `recover_running_jobs()`: reattach to PIDs that survived restart.
  2. `autostart_services()`: for every service whose `status == ServiceWantedStatus::Start`, topologically sort by `requires` / `after`, then create a Run + Pending Jobs.
  3. `autostart_process_jobs()`: re-queue dead `is_process` jobs.
- The `services` table is keyed by `(context_name, name)`. There is no list of "well-known canonical services" hard-coded anywhere today.

The existing `autostart_services()` already starts `hero_db` automatically if a service named `hero_db` is registered with `status: start`. The real gap surfaced by this issue is that nothing in hero_proc treats `hero_db` as required: if it's missing or set to `stop`, the server boots silently and the operator finds out later through downstream service failures.
Requirements
- On `hero_proc_server` startup, after supervisor bring-up, hero_proc must ensure a service named `hero_db` (context `core`) is active.
- If the `hero_db` service is not registered in the DB, hero_proc must emit a clear, prominent error (tracing `error!` and a structured entry into the `logging` table via `log_batcher`), including remediation guidance.
- If the `hero_db` service is registered but `status != start`, hero_proc must emit a clear warning and skip the auto-start (operator intent is preserved; this is documented behaviour, not silently overridden).
- The behaviour can be disabled with `HERO_PROC_NO_AUTOSTART_DB=1` for special cases (test harnesses, isolated sandboxes, embedded launchers that manage hero_db externally).
- The behaviour must be documented in the `hero_proc_cmd` skill and the server crate's `CLAUDE.md`.
Files to Modify / Create
- `crates/hero_proc_server/src/supervisor/auto_start_db.rs` (create): `ensure_hero_db_autostart(&Supervisor)`, the missing-service error path, and unit tests. Keeps the new policy isolated from existing autostart code.
- `crates/hero_proc_server/src/supervisor/mod.rs` (modify): `pub mod auto_start_db;`. Insert a call to `auto_start_db::ensure_hero_db_autostart(self).await` in `Supervisor::run()` immediately after `self.autostart_services().await`. Widen `autostart_one_service` to `pub(crate)`.
- `crates/hero_proc_integration_test/tests/hero_db_autostart.rs` (create): registers a `hero_db` action+service, launches `hero_proc_server`, asserts the service has at least one job within ~3 s. Second variant: no-registration → assert error log entry.
- `~/.claude/skills/hero_proc_cmd/SKILL.md` (modify): the auto-start behaviour, including what happens when `hero_db` isn't registered.
- `crates/hero_proc_server/CLAUDE.md` (modify): a note on `hero_db` and where the policy lives.
Implementation Plan
Step 1: Add the `auto_start_db` module skeleton
Files: `crates/hero_proc_server/src/supervisor/auto_start_db.rs` (create), `crates/hero_proc_server/src/supervisor/mod.rs` (modify).
- Add `pub mod auto_start_db;` near the top of `supervisor/mod.rs`.
- Define `pub(crate) async fn ensure_hero_db_autostart(sup: &Supervisor)`.
- Honour `HERO_PROC_NO_AUTOSTART_DB=1` (early return with `info!`).
- Look up `db.services.get("core", "hero_db")` and branch on a private enum `HeroDbState { NotRegistered, NotStartIntent(ServiceWantedStatus), AlreadyActive, NeedsStart(ServiceConfig) }`.
Dependencies: none.
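The four-way branch described in Step 1 can be sketched as a pure function. This is only an illustration under stated assumptions: `ServiceWantedStatus` and `ServiceConfig` below are simplified stand-ins (the real hero_proc types will have more fields), and `classify` with its `has_active_job` flag is a hypothetical helper, not the actual module code.

```rust
// Simplified stand-ins for illustration; the real hero_proc types differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceWantedStatus {
    Start,
    Stop,
    Ignore,
    Spec,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ServiceConfig {
    pub context_name: String,
    pub name: String,
    pub status: ServiceWantedStatus,
}

// Mirrors the enum named in the spec.
#[derive(Debug, Clone, PartialEq)]
pub enum HeroDbState {
    NotRegistered,
    NotStartIntent(ServiceWantedStatus),
    AlreadyActive,
    NeedsStart(ServiceConfig),
}

/// Hypothetical pure classification.
/// `registered` is the row found for ("core", "hero_db"), if any;
/// `has_active_job` says whether a non-terminal job already exists.
pub fn classify(registered: Option<ServiceConfig>, has_active_job: bool) -> HeroDbState {
    match registered {
        // No row at all: the caller should report an error.
        None => HeroDbState::NotRegistered,
        // Operator asked for something other than `start`: respect it.
        Some(cfg) if cfg.status != ServiceWantedStatus::Start => {
            HeroDbState::NotStartIntent(cfg.status)
        }
        // Already queued or running: nothing to do.
        Some(_) if has_active_job => HeroDbState::AlreadyActive,
        // Registered with start intent and no job yet: dispatch one.
        Some(cfg) => HeroDbState::NeedsStart(cfg),
    }
}
```

Keeping the classification free of I/O means each branch can be checked without a running supervisor.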
Step 2: Author the structured error message for missing hero_db
Files: `crates/hero_proc_server/src/supervisor/auto_start_db.rs`.
- On `NotRegistered`: emit `error!` with a remediation message and write the same to the logs DB via `sup.log_batcher` (source `hero_proc.auto_start_db`).
- Do not return `Err`: this is a degraded-mode warning, not a fatal startup error.
Dependencies: Step 1.
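A minimal sketch of the non-fatal reporting path in Step 2, assuming a simplified log entry and an in-memory sink. `LogEntry`, `report_missing`, the `Vec` sink, and the message wording are all hypothetical; the real code would go through tracing's `error!` and `sup.log_batcher`, whose APIs are not shown in this thread.

```rust
/// Source tag taken from the spec.
pub const SOURCE: &str = "hero_proc.auto_start_db";

/// Hypothetical stand-in for a structured log row.
#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub level: &'static str,
    pub source: &'static str,
    pub message: String,
}

/// Builds the error entry; the wording here is illustrative only.
pub fn missing_hero_db_entry(context: &str) -> LogEntry {
    LogEntry {
        level: "error",
        source: SOURCE,
        message: format!(
            "service `hero_db` is not registered in context `{context}`; \
             register it with status: start, or set HERO_PROC_NO_AUTOSTART_DB=1 \
             if hero_db is managed externally"
        ),
    }
}

/// Reports the problem and returns normally, so the caller keeps booting.
/// `sink` stands in for the log batcher; `eprintln!` stands in for `error!`.
pub fn report_missing(sink: &mut Vec<LogEntry>, context: &str) {
    let entry = missing_hero_db_entry(context);
    eprintln!("[{}] {}: {}", entry.level, entry.source, entry.message);
    sink.push(entry);
}
```

Returning `()` rather than a `Result` makes the degraded-mode intent explicit at the call site: there is nothing for the supervisor to propagate.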
Step 3: Refactor `autostart_one_service` so the new module can call into it
Files: `crates/hero_proc_server/src/supervisor/mod.rs`, `crates/hero_proc_server/src/supervisor/auto_start_db.rs`.
- Widen `Supervisor::autostart_one_service` to `pub(crate)`.
- The `NeedsStart(cfg)` arm calls `sup.autostart_one_service(&cfg.context_name, &cfg).await`.
- Insert `auto_start_db::ensure_hero_db_autostart(&self).await;` in `Supervisor::run()` right after `self.autostart_services().await;`. Running after general autostart means that if `hero_db` is one of many services with `status: start`, it has already been queued; `ensure_hero_db_autostart` then detects `AlreadyActive` and is a no-op.
Dependencies: Steps 1 and 2.
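The ordering argument in Step 3 (general autostart first, then the `hero_db` check as a no-op) can be illustrated with a toy, synchronous supervisor. `MiniSupervisor` and the `hero_api` service name are hypothetical; the real `Supervisor` is async, backed by the DB, and topologically sorts services, none of which is modelled here.

```rust
use std::collections::HashSet;

/// Toy model: tracks which services want to start and which have a job.
pub struct MiniSupervisor {
    wanted_start: Vec<String>,   // services registered with status: start
    active: HashSet<String>,     // services that already have a non-terminal job
    pub dispatched: Vec<String>, // every dispatch, in order
}

impl MiniSupervisor {
    pub fn new(wanted_start: &[&str]) -> Self {
        Self {
            wanted_start: wanted_start.iter().map(|s| s.to_string()).collect(),
            active: HashSet::new(),
            dispatched: Vec::new(),
        }
    }

    /// Shared dispatch path, analogous to the widened `autostart_one_service`.
    fn autostart_one_service(&mut self, name: &str) {
        self.active.insert(name.to_string());
        self.dispatched.push(name.to_string());
    }

    /// General autostart: queue every wanted service that has no job yet.
    pub fn autostart_services(&mut self) {
        for name in self.wanted_start.clone() {
            if !self.active.contains(&name) {
                self.autostart_one_service(&name);
            }
        }
    }

    /// The extra check: only dispatches if hero_db is wanted and not active.
    pub fn ensure_hero_db_autostart(&mut self) {
        let registered = self.wanted_start.iter().any(|n| n == "hero_db");
        if registered && !self.active.contains("hero_db") {
            self.autostart_one_service("hero_db");
        }
    }

    /// Startup order from the spec: general autostart, then the hero_db check.
    pub fn run_startup(&mut self) {
        self.autostart_services();
        self.ensure_hero_db_autostart();
    }
}
```

Because both paths share one dispatch function and one "already active" predicate, the second call cannot double-dispatch.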
Step 4: Unit tests for `auto_start_db`
Files: `crates/hero_proc_server/src/supervisor/auto_start_db.rs` (append `#[cfg(test)] mod tests`).
- Extract a pure `decide_action(db, env_opt_out) -> HeroDbState` so tests don't need the full async run loop.
- Cases: empty DB → `NotRegistered`; `status: stop` → `NotStartIntent(Stop)`; existing non-terminal job → `AlreadyActive`; valid config + no jobs → `NeedsStart(cfg)`; env var set → early return regardless of state. Use `serial_test::serial` for the env-var test.
Dependencies: Steps 1–3.
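One way to keep most opt-out tests free of process-global state is to separate reading the environment from interpreting the value. The sketch below assumes that only the exact value `1` enables the opt-out (the thread only ever shows `HERO_PROC_NO_AUTOSTART_DB=1`); both function names are hypothetical.

```rust
/// Env var name taken from the spec.
pub const OPT_OUT_VAR: &str = "HERO_PROC_NO_AUTOSTART_DB";

/// Pure interpretation of the variable's value.
/// Assumption: only the exact string "1" opts out.
pub fn opt_out_from_value(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Thin wrapper that reads the real process environment.
/// An unset or non-UTF-8 variable is treated as "not opted out".
pub fn opt_out_from_env() -> bool {
    opt_out_from_value(std::env::var(OPT_OUT_VAR).ok().as_deref())
}
```

With this split, only a test that exercises `opt_out_from_env` itself has to mutate the environment and therefore run serially; the value-level cases can run in parallel.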
Step 5: Integration test using the existing harness
Files: `crates/hero_proc_integration_test/tests/hero_db_autostart.rs` (create).
- Use `harness::TestHarness` to spin up a fresh `hero_proc_server` against a temp socket + temp DB.
- Happy path: register the service (`action_set` for a no-op action, `service_set` with `name: "hero_db"`, `status: start`, `actions: ["main"]`), restart, then assert via `service_status` that within 3 s `hero_db` has at least one Pending or Running job.
- Missing-registration path: without registering `hero_db`, start the server, assert via `log query` that an `error` log entry exists with source `hero_proc.auto_start_db`.
Dependencies: Steps 1–3. Can be authored in parallel with Step 4.
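The "within 3 s" assertion in Step 5 implies polling with a deadline. A generic helper for that could look like the sketch below; `wait_until` is a hypothetical name, and the closure passed to it would wrap whatever `service_status` call the harness actually provides.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `check` until it returns true or `timeout` elapses.
/// Returns true if the condition was observed, false on timeout.
pub fn wait_until<F: FnMut() -> bool>(
    timeout: Duration,
    interval: Duration,
    mut check: F,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-true condition returns immediately.
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

In the happy-path test this might be called with a 3 s timeout and a short interval, with the closure returning true once `hero_db` reports a Pending or Running job.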
Step 6: Documentation updates
Files: `~/.claude/skills/hero_proc_cmd/SKILL.md`, `crates/hero_proc_server/CLAUDE.md`.
Dependencies: independent of Steps 4 and 5.
Acceptance Criteria
- On hero_proc startup, `hero_db` is running (registered service auto-start, idempotent): implemented in `supervisor/auto_start_db.rs::ensure_hero_db_autostart`, called from `Supervisor::run()` after `autostart_services()`.
- If the `hero_db` service isn't registered, hero_proc surfaces a clear error rather than silently continuing: tracing `error!` plus a `logging` table entry with source `hero_proc.auto_start_db`.
- Documented in the `hero_proc_cmd` skill (and noted for `docs_hero` cross-repo follow-up).
- Opt-out via `HERO_PROC_NO_AUTOSTART_DB=1` (verified by Step 4 unit test 5).
- Existing `autostart_services` topological ordering still wins for any service that declares `requires = ["hero_db"]`. The new module is purely additive.
Notes / Design Decisions
- We do not seed a `hero_db` row at boot. Seeding a service row from the supervisor would mask configuration errors and would surprise operators who deliberately set `status: stop`. The agreed model is "DB is the source of truth"; we only validate, never auto-create.
- The opt-out is an env var, consistent with the existing `HERO_PROC_*` env vars (e.g. `HERO_PROC_SOCKET`, `HERO_PROC_EXAMPLES_DIR`). Adding a new CLI flag would be inconsistent and would also need wiring through `ServerConfig` and `launch_in_screen`.
- We call `autostart_one_service` rather than going through the JSON-RPC `service.start` handler. The supervisor already exposes the in-process equivalent; we just widen its visibility. This avoids serialization overhead and keeps boot-time logic in-process.
- `ensure_hero_db_autostart` runs after `autostart_services` and before `autostart_process_jobs`. By that point the DB is fully open and the log batcher is alive, both of which the function requires.
- The function does not wait for `hero_db` readiness (the `service_state` evaluator does that asynchronously on a 5 s cadence and is consumed by downstream services that declare `requires`). Blocking on readiness here would deadlock the supervisor's main loop.
- Only the `core` context is checked. The spec scopes the lookup to `("core", "hero_db")`. If a deployment uses a non-default context for the DB, the operator must put the service in `core`, or this can be extended later with `HERO_PROC_HERO_DB_CONTEXT`. Out of scope for #95.
Test Results
`cargo test --workspace --no-fail-fast -- --test-threads=1`
(Note: the initial run with `cargo test --workspace -- --test-threads=1` aborted at the first failure in `hero_proc_lib` due to default fail-fast behavior. Re-ran with `--no-fail-fast` to capture all results across the workspace.)
New tests added in this branch
- `supervisor::auto_start_db::tests::*` (5 unit tests in `hero_proc_server`)
  - `decide_action_returns_already_active_when_non_terminal_job_exists` ... ok
  - `decide_action_returns_needs_start_when_no_active_jobs` ... ok
  - `decide_action_returns_none_when_env_opt_out_set` ... ok
  - `decide_action_returns_not_registered_for_empty_db` ... ok
  - `decide_action_returns_not_start_intent_when_status_stop` ... ok
- `hero_db_autostart::*` (2 integration tests in `hero_proc_integration_test`)
  - `hero_db_autostarts_when_registered_with_start_status` ... ok
  - `hero_db_missing_logs_clear_error` ... ok

All 7 new tests pass.
Pre-existing failures
- `hero_proc_lib::db::integration_tests::tests::logging_api_end_to_end_filters_and_cleanup` (`crates/hero_proc_lib/src/db/integration_tests.rs:649`): the assertion `after delete_older_than(u32::MAX), ctx query should be empty, got 30 entries` fails. Verified to also fail on the unmodified `development` branch, so this is a pre-existing issue unrelated to the auto-start-hero_db change.
Notes
- Toolchain: `rustup override set 1.95.0` (`1.95.0-x86_64-unknown-linux-gnu`).
- The tests in `crates/hero_proc_integration_test/tests/hero_db_autostart.rs` correctly serialize on the global socket lock and pass under `--test-threads=1`.
Implementation Summary
Behaviour added
- `hero_proc_server` now automatically ensures the `hero_db` service (context `core`) is running on startup. The check is wired into `Supervisor::run()` immediately after `autostart_services()` and before `autostart_process_jobs()`.
- Idempotent: the check detects `AlreadyActive` via the same non-terminal-job predicate the existing autostart loop uses, so re-runs are no-ops.
- If `hero_db` is registered with `status: stop` / `ignore` / `spec`, the auto-start is skipped with a `warn!` log.
- If `hero_db` is not registered in context `"core"`, hero_proc emits a `tracing::error!` and writes a structured error entry to the logs DB with source `hero_proc.auto_start_db`. The supervisor still finishes booting so the admin RPC stays reachable.
- Opt-out via `HERO_PROC_NO_AUTOSTART_DB=1` for test harnesses, isolated sandboxes, or deployments where `hero_db` is managed externally.
Files
Created:
- `crates/hero_proc_server/src/supervisor/auto_start_db.rs`: the policy module. Exposes `ensure_hero_db_autostart(&Supervisor)` and a private `decide_action(db, env_opt_out) -> Option<HeroDbState>` that does the pure classification used by tests.
- `crates/hero_proc_integration_test/tests/hero_db_autostart.rs`: two end-to-end tests covering the happy path (registered + start → job dispatched) and the missing-registration path (no `hero_db` row → error log entry).
Modified:
- `crates/hero_proc_server/src/supervisor/mod.rs`: added `pub mod auto_start_db;`, widened `Supervisor::autostart_one_service` to `pub(crate)`, inserted the call to `ensure_hero_db_autostart(&self).await` after `autostart_services().await`.
- `crates/hero_proc_integration_test/src/harness.rs`: added `start_with_seed<F>(seed: F)` so integration tests can pre-seed the DB after the harness wipes it but before the server starts. Also added a `CARGO_TARGET_DIR` lookup branch and a parser for `<workspace>/.cargo/config.toml` `[build] target-dir` so the harness picks up the freshly built `hero_proc_server` instead of a stale binary on `PATH`.
- `crates/hero_proc_server/CLAUDE.md`: short paragraph noting that the supervisor enforces this, where the policy lives, and that it is purely additive and non-panicking.
- `~/.claude/skills/hero_proc_cmd/SKILL.md`: new `## hero_db auto-start` section documenting behaviour, the opt-out env var, and how to read the error log.
Tests
- 5 unit tests in `supervisor::auto_start_db::tests` exercising every branch of `decide_action` (env opt-out, not-registered, not-start-intent, already-active, needs-start). All passing.
- 2 integration tests in `hero_db_autostart` (`hero_db_autostarts_when_registered_with_start_status`, `hero_db_missing_logs_clear_error`). All passing.

Workspace test run (`cargo test --workspace --no-fail-fast -- --test-threads=1`): the one failure is `hero_proc_lib::db::integration_tests::tests::logging_api_end_to_end_filters_and_cleanup` (`crates/hero_proc_lib/src/db/integration_tests.rs:649`). Reproduced on a clean `development` checkout, so it is not introduced by this change.
Acceptance criteria
- On hero_proc startup, `hero_db` is running (registered service auto-start, idempotent).
- If the `hero_db` service isn't registered, hero_proc surfaces a clear error rather than silently continuing.
- Documented in the `hero_proc_cmd` skill and the server crate's `CLAUDE.md`. Cross-repo `docs_hero` was not found locally and is flagged for separate follow-up.
Notes / design decisions
- We do not seed a `hero_db` row at boot. The DB is the source of truth; auto-creating would mask configuration errors.
- The opt-out is an env var, matching the existing `HERO_PROC_*` pattern.
- We call `Supervisor::autostart_one_service` rather than going through the JSON-RPC `service.start` handler, which avoids serialization for boot-time logic.
- The check runs after `autostart_services`, so if `hero_db` was already queued by general autostart, the new function detects `AlreadyActive` and is a no-op. Strictly additive, no double-dispatch.
- Supporting a non-`core` context for `hero_db` is out of scope and would require a follow-up `HERO_PROC_HERO_DB_CONTEXT` env var.
Pull request opened: #99
This PR implements the changes discussed in this issue.