fix(lab/service): close 3 sweep-blocking lab service bugs (#254/#255/#256) #257

Merged
mik-tf merged 1 commit from development_mik into development 2026-05-15 20:47:19 +00:00

Summary

Closes 3 lab service bugs that surfaced during the hero_proc#102 service.toml + lab service sweep on its first repo (hero_code, hero_code#15 — squash-merged).

| Issue | Fix |
|---|---|
| [#254](https://forge.ourworld.tf/lhumina_code/hero_skills/issues/254): `lab service <name> --start` only starts one binary | Expand SERVICE_MAP entries for known multi-binary services (hero_code, hero_db, hero_aibroker, hero_browser, hero_slides) to include their `_admin` companions. Doc-comment notes that dynamic discovery from service.toml is the proper long-term fix. |
| [#255](https://forge.ourworld.tf/lhumina_code/hero_skills/issues/255): `lab service` destructively deletes hero_proc rpc.sock on false-negative liveness probe | Add a final `ping_hero_proc(&sock)` probe immediately before the `std::fs::remove_file(&sock)` call in `start_hero_proc`. Bail with a clear message if a daemon is still answering, instead of deleting the socket and orphaning supervised services. |
| [#256](https://forge.ourworld.tf/lhumina_code/hero_skills/issues/256): `lab service hero_proc --start` shells out to `screen` without ensuring it's installed | Pre-flight `which screen` check at the top of `start_hero_proc`, BEFORE any state cleanup. Error message points at `lab install base` (which already installs screen in its apt list at base.rs:29). |

Test plan

  • [x] `cargo build --release -p lab` clean
  • [x] `lab service resetall` → bootstrap chain (hero_proc + hero_db + hero_aibroker + hero_code) → `hero_proc service list` shows all multi-binary services with both server + admin registered + running
  • [x] `lab service hero_code --start` now starts BOTH `hero_code_server` (6 smoke ✓) AND `hero_code_admin` (2 smoke ✓) in a single invocation; previously only the server started
  • [x] `lab service hero_db --start` now starts BOTH `hero_db_server` (4 smoke ✓) AND `hero_db_admin` (2 smoke ✓)
  • [x] Pre-existing single-binary services (`hero_proc`, `hero_router`, `hero_runner_rhai`, etc.) unaffected; SERVICE_MAP entries unchanged
  • [x] Verified under lab build #54729
  • [ ] Reviewer go-ahead for squash-merge

Acceptance gate carry — retroactive validation for hero_code#15

hero_code PR #15 documented criterion 5 as "with per-binary workaround" because lab couldn't start both binaries via service-name. After this PR lands, that workaround is no longer needed — lab service hero_code --start is the canonical single-command path.

Out of scope (filed separately or noted)

  • Dynamic SERVICE_MAP discovery from service.toml: noted in the doc-comment on SERVICE_MAP. Proper architectural fix for any future multi-binary service; current PR is the minimal surgical fix for the immediate sweep blocker.
  • hero_aibroker_admin runtime failure: discovered during testing of lab service hero_aibroker --start — admin failed to start with a 10s validation timeout. Out of scope for this PR; s96 hero_lib sweep / subsequent hero_aibroker sweep will address.

Signed-off-by: mik-tf

fix(lab/service): close 3 sweep-blocking lab service bugs
Some checks failed
Build and Test (lab workspace) / build-and-test (pull_request) Failing after 3s
ecf08632eb
Closes three issues surfaced during the hero_proc#102 service.toml +
lab service sweep (first repo: hero_code, see lhumina_code/hero_code PR #15).

#254 — `lab service <name> --start` now starts ALL daemon binaries:
  Expanded SERVICE_MAP entries for known multi-binary services to include
  their `_admin` companions (hero_code, hero_db, hero_aibroker, hero_browser,
  hero_slides). Previously each alias resolved to `[<name>_server]` only,
  so `lab service hero_code --start` silently skipped hero_code_admin.
  Verified end-to-end: `lab service hero_code --start` now starts both
  hero_code_server (6 smoke checks ✓) and hero_code_admin (2 smoke checks ✓)
  in a single invocation.

  Added a doc-comment noting that dynamic discovery from each binary's
  embedded service.toml is the proper long-term fix; SERVICE_MAP entries
  should be kept in sync with each repo's `[[binaries]]` content until then.
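
The alias expansion described for #254 can be pictured with a small static table. This is only a sketch: the actual type and contents of SERVICE_MAP in lab are not shown here, and `binaries_for` is a hypothetical helper name introduced for illustration.

```rust
/// Hypothetical shape of the alias table: each service alias resolves to the
/// list of daemon binaries that `--start` should launch. The real SERVICE_MAP
/// in lab may be structured differently.
const SERVICE_MAP: &[(&str, &[&str])] = &[
    // Multi-binary services now list their `_admin` companion as well.
    ("hero_code", &["hero_code_server", "hero_code_admin"]),
    ("hero_db", &["hero_db_server", "hero_db_admin"]),
    // Single-binary services keep a one-element entry.
    ("hero_proc", &["hero_proc_server"]),
];

/// Look up every binary registered for a service alias (hypothetical helper).
fn binaries_for(alias: &str) -> Option<&'static [&'static str]> {
    SERVICE_MAP
        .iter()
        .find(|(name, _)| *name == alias)
        .map(|(_, bins)| *bins)
}
```

With entries like these, a start routine that iterates over `binaries_for("hero_code")` would launch both the server and the admin binary, which is the behaviour the fix restores.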

#255 — `lab service` no longer destroys a live hero_proc socket:
  Added a final `ping_hero_proc(&sock)` probe immediately before the
  `std::fs::remove_file(&sock)` call in start_hero_proc. If a daemon
  is still answering on the socket (which can happen when pkill misses
  the process — `pkill -x hero_proc_server` doesn't match because comm
  truncates to 15 chars), we bail with a clear message rather than
  delete the socket and orphan the supervised services.
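
The probe-before-unlink guard from #255 could look roughly like the following. It is a minimal sketch under stated assumptions: `socket_is_answering` stands in for lab's `ping_hero_proc` and only checks that something accepts a connection, whereas the real probe may speak the RPC protocol; `remove_stale_socket` is a hypothetical name.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Stand-in for `ping_hero_proc` (assumption): "alive" here only means that a
/// process accepts a connection on the Unix socket.
fn socket_is_answering(sock: &Path) -> bool {
    UnixStream::connect(sock).is_ok()
}

/// Remove a stale socket file, but refuse if a daemon still answers on it.
fn remove_stale_socket(sock: &Path) -> io::Result<()> {
    if !sock.exists() {
        return Ok(()); // nothing to clean up
    }
    if socket_is_answering(sock) {
        // Bail instead of unlinking: deleting the file would orphan whatever
        // the live daemon is supervising.
        return Err(io::Error::new(
            io::ErrorKind::AddrInUse,
            format!("{} is still answering; refusing to delete it", sock.display()),
        ));
    }
    std::fs::remove_file(sock)
}
```

A small window remains between the probe and the unlink, so this narrows the failure mode rather than eliminating it.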

#256 — `lab service hero_proc --start` fails fast if `screen` is missing:
  Added a `which screen` pre-flight check at the top of `start_hero_proc`,
  BEFORE any state cleanup. Previously, missing screen meant lab would
  unlink the hero_proc socket and *then* fail at the launch step,
  leaving the system worse than found. The error message now points the
  user at `lab install base` (which already installs screen in its apt
  list — base.rs:29).
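
The fail-fast ordering for #256 can be sketched as below. Note the swap: the commit shells out to `which screen`, while this sketch walks `PATH` directly with the standard library and does not check the executable bit. `find_on_path` and `require_screen` are hypothetical names, and the error text is illustrative rather than the message lab prints.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first directory entry on `path_var` that holds a file named
/// `tool`. Unlike `which`, this does not verify that the file is executable.
fn find_on_path(tool: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}

/// Hypothetical pre-flight: call this first, before any state cleanup, so a
/// missing `screen` leaves the system exactly as it was found.
fn require_screen() -> Result<PathBuf, String> {
    let path_var = env::var_os("PATH").unwrap_or_default();
    find_on_path("screen", &path_var)
        .ok_or_else(|| "screen not found on PATH; run `lab install base` first".to_string())
}
```

The point of the design is ordering: the check has no side effects, so running it ahead of the socket cleanup means a failure cannot leave hero_proc half torn down.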

Verified under lab build #54729 with full hero_code dep-chain bootstrap
+ smoke gate; hero_code re-validates 8/8 via single `lab service hero_code
--start` invocation.

Refs:
  #254
  #255
  #256
  lhumina_code/hero_proc#102
Signed-off-by: mik-tf <logismos@protonmail.ch>
mik-tf merged commit d8389c4046 into development 2026-05-15 20:47:19 +00:00
mik-tf deleted branch development_mik 2026-05-15 20:47:19 +00:00
Reference
lhumina_code/hero_skills!257