fix(skills+services): FORGE_TOKEN-optional bootstrap + uniform start --download (D-06) #228

Merged
mik-tf merged 5 commits from development_mik_proc_token_optional into development 2026-05-07 01:53:08 +00:00
Owner

Summary

The hero_skills companion to hero_proc PR #97. Together they enable a fresh VM (TF Grid or dev box) to bootstrap the supervisor without any FORGE_TOKEN in env, and give 14 more services a uniform service_X start --download path.

Locked by D-06: forge-token-bootstrap-optional in the ai-pipeline workspace.

Why

  • Bootstrap chicken-and-egg: hero_proc is the secrets store. On a fresh VM there is nowhere to read FORGE_TOKEN from before the supervisor boots.
  • Sessions 53–71 worked around this by manually exporting FORGE_TOKEN before each smoke test. The first end-to-end service_proc start attempt (session 72) hit the latent gate immediately and revealed three layered issues — fixed here.

Commits

  1. fix(lib): make svc_install_download FUSE-safe + verify_binaries_fresh download-aware: svc_install_download now touches each binary after copy (FUSE filesystems lag mtime visibility ~5s, breaking the freshness check); svc_verify_binaries_fresh accepts download: bool = false and returns immediately when set (the freshness check is meaningless for versioned release artifacts). Default value preserves existing behaviour.

  2. fix(service_proc): soft-warn missing FORGE_TOKEN + nohup print quote-fix + download-aware verify — three fixes to bring up the supervisor on a fresh VM: replace the error make on missing FORGE_TOKEN with a print warning (passes empty string through with-env); fix latent nu syntax bug at line 553 ((pid file: ...) parsed parens as subexpression pid command — only nohup detach path hits it, screen path was always fine); thread $download into the verify call.

  3. feat(services): uniform --download flag in start for 14 services + router — adds --download and --version to start for: aibroker, biz, books, browser, db, editor, foundry, indexer, matrixchat, osis, proxy, router, slides, whiteboard. Threading is mechanical; updates also pass $download to verify in the 10 services that already had --download (proc, agent, collab, embedder, logic, office, planner, runner_rhai, shrimp, voice). Result: 24 services with uniform start --download semantics. Mycelium remains cargo-only — release pipeline + --download for geomind_code/mycelium_network is a follow-up session.

  4. docs(hero_running): FORGEJO_TOKEN no longer required at boot — aligns the onboarding skill text with the new D-06 bootstrap shape.

Validation

End-to-end on heroci with FORGE_TOKEN/FORGEJO_TOKEN cleared from env:

$ unset FORGE_TOKEN; unset FORGEJO_TOKEN
$ nu -c 'use nutools/modules/services *; service_proc start --download'
...
  ⚠ FORGE_TOKEN not set; secret pull/push and forge merge will fail until set via env or `proc secret set`.
→ host: Linux — detach mode: nohup
→ starting via nohup setsid — pid file: /root/hero/var/hero_proc_server.pid ...
→ waiting for hero_proc to become healthy (max 30s)...
=== hero_proc status ===
  state    : running ✓
  ✓ hero_proc is running and healthy

Status: running ✓ pid alive: true.

Test plan

  • [x] all modified service_*.nu parse-check cleanly via nu -c "use <file> *; ..."
  • [x] aggregate use nutools/modules/services * loads cleanly
  • [x] hero_proc bootstrap on heroci with no FORGE_TOKEN → green
  • [ ] (after hero_proc PR #97 merges + tag v0.5.0-rc1) service_proc install --download --version v0.5.0-rc1 && service_proc start --download from a clean state on heroci
  • [ ] dev-box smoke deferred until mycelium routing restored from local workstation

Refs

  • Companion: https://forge.ourworld.tf/lhumina_code/hero_proc/pulls/97
  • D-06 forge-token-bootstrap-optional (ai-pipeline workspace decisions/D-06-forge-token-bootstrap-optional.md)
  • https://forge.ourworld.tf/lhumina_code/home/issues/225 (META compliance umbrella)
  • https://forge.ourworld.tf/lhumina_code/home/issues/121 (HeroOS dev environment / onboarding flow)

Out of scope (session 73 follow-up)

  • Mycelium release pipeline on geomind_code/mycelium_network (Bucket-C-class) + --download path in service_mycelium.
  • service_complete --download plumbing in packages.nu.
  • Verify deployed hero_router binary version accepts --address flag (or align nu script).
fix(lib): make svc_install_download FUSE-safe + verify_binaries_fresh download-aware
Two related changes to the install/verify path used by every service
that supports `--download`:

1. `svc_install_download` now `touch`es each binary after copy. On
   FUSE-backed filesystems (e.g. TF Grid VM root) `cp -f`'s mtime
   update can lag visibility by several seconds — long enough for
   the freshness check downstream to reject the just-installed
   binary as "older than build_started_at". Belt-and-suspenders for
   non-FUSE filesystems (where it's a no-op equivalent).

2. `svc_verify_binaries_fresh` accepts a new `download: bool = false`
   parameter. When set, the function returns immediately. Rationale:
   the freshness check is meaningful for the cargo-build path
   (catches stale leftovers when a build no-ops). On `--download` the
   binary's mtime carries no signal — it's a versioned release
   artifact, not a local build — and on FUSE filesystems the lag
   makes the check produce false positives. Default value `false`
   preserves existing behaviour for all callers that don't pass it.

Default-arg signature means existing call sites that omit the param
continue to work unchanged. Companion changes in this PR thread the
new arg through service_proc.nu and the 14 service modules whose
`start` accepts `--download`.
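
The two changes above can be modelled in a few lines. This is a sketch in Python rather than nu, meant only to illustrate the logic; `install_download`, `verify_binaries_fresh` and their parameters are stand-ins chosen for the example, not the actual `lib.nu` code.

```python
import os
import shutil
import tempfile
import time


def install_download(src: str, dest_dir: str) -> str:
    """Copy a downloaded binary into dest_dir, then bump its mtime explicitly."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    # Equivalent of `touch`: set atime/mtime to "now" so a later freshness
    # check does not depend on when the copy's own mtime becomes visible.
    os.utime(dest, None)
    return dest


def verify_binaries_fresh(paths, build_started_at: float, download: bool = False) -> list:
    """Return the binaries whose mtime predates build_started_at.

    With download=True the check is skipped: a release artifact's mtime says
    nothing about whether a local build ran.
    """
    if download:
        return []
    return [p for p in paths if os.path.getmtime(p) < build_started_at]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hero_proc_server")
        with open(src, "wb") as f:
            f.write(b"placeholder binary")
        bin_dir = os.path.join(tmp, "bin")
        os.mkdir(bin_dir)
        installed = install_download(src, bin_dir)

        # Pretend the binary is an hour old, as a stale leftover would be.
        old = time.time() - 3600
        os.utime(installed, (old, old))
        build_started_at = time.time()

        print(verify_binaries_fresh([installed], build_started_at))        # stale binary reported
        print(verify_binaries_fresh([installed], build_started_at, True))  # check skipped
```

Because `download` defaults to `False`, a caller that omits it gets the same result as before the change, which is the compatibility property the commit relies on.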

Refs: lhumina_code/home#225

Signed-off-by: mik-tf
fix(service_proc): soft-warn missing FORGE_TOKEN + nohup print quote-fix + download-aware verify
Three related fixes to the nu wrapper around hero_proc_server, all
needed to bring up the supervisor on a fresh VM via `--download`:

1. **FORGE_TOKEN soft-warn (lines 527-531).** The wrapper used to
   `error make` and abort if FORGE_TOKEN was missing from `$env`.
   This contradicts the new bootstrap shape locked by D-06: the
   supervisor must start regardless of whether FORGE_TOKEN is present.
   Forge-touching operations fail at their own call sites with their
   own error messages. The wrapper now `print`s a warning and passes
   the empty string through `with-env` (the binary, on the matching
   hero_proc PR, no longer hard-fails on missing FORGE_TOKEN either).

2. **nohup print quote-fix (line 553).** Latent nu syntax bug: the
   interpolated string `(pid file: ($pid_file))` parsed the outer
   parens as a subexpression `pid file: ...`, which nu read as
   command `pid` with arg `file:`. Only the nohup detach path hits
   this (heroci has no `screen`); the screen path was always fine,
   which is why this never bit on herodemo. Rewrite to drop the
   confusing parens.

3. **Pass `$download` to `svc_verify_binaries_fresh` (line 488).**
   service_proc's start function already accepts `--download`, so
   thread it through to the freshness check (see companion lib.nu
   commit). On the cargo path the check is preserved; on the
   download path it's correctly skipped.

Validated end-to-end on heroci: `service_proc start --download` with
no FORGE_TOKEN/FORGEJO_TOKEN in env brings hero_proc up healthy and
binds rpc.sock + ui.sock cleanly.
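
The difference between the old hard gate and the new soft warning (fix 1) can be sketched as follows. This is Python used as illustration only; the function names are hypothetical and the real wrapper is nu code using `error make`, `print` and `with-env`.

```python
import sys


def resolve_forge_token_strict(env: dict) -> str:
    """Old shape: abort startup when the token is missing."""
    token = env.get("FORGE_TOKEN", "")
    if not token:
        raise RuntimeError("FORGE_TOKEN not set")
    return token


def resolve_forge_token_soft(env: dict) -> str:
    """New shape: warn on stderr, then return the (possibly empty) token."""
    token = env.get("FORGE_TOKEN", "")
    if not token:
        print(
            "warning: FORGE_TOKEN not set; forge-touching operations "
            "will fail until it is set",
            file=sys.stderr,
        )
    return token


def supervisor_env(env: dict) -> dict:
    """Environment handed to the supervisor process (the `with-env` step)."""
    child = dict(env)
    child["FORGE_TOKEN"] = resolve_forge_token_soft(env)
    return child
```

With the soft variant an empty environment still yields a usable child environment, so the supervisor can start and any forge-touching call fails later at its own call site.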

Refs: lhumina_code/home#225

Signed-off-by: mik-tf
feat(services): uniform --download flag in start for 14 services + router
Threads the `--download` flag (and its companion `--version`) through
every `service_X start` whose `install` already supports `--download`
but where `start` previously did not. With this change, a fresh VM
can do `service_X start --download` for any of these 14 services and
get the canonical defensive-restart cycle (drop registration → wait
→ purge → install via Forgejo Releases → register → start) without
needing local source or cargo.

Services updated (alphabetical):
  service_aibroker, service_biz, service_books, service_browser,
  service_db, service_editor, service_foundry, service_indexer,
  service_matrixchat, service_osis, service_proxy, service_router,
  service_slides, service_whiteboard.

Per-service edits are mechanical and identical:
  - Add `--download` and `--version: string = "latest"` flags to the
    `start` function signature.
  - Forward `--download=$download --version $version` into the
    `install` call.
  - Pass `$download` into `svc_verify_binaries_fresh` (default false
    is preserved for callers that omit it).
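
The per-service threading listed above amounts to the following shape. This is a Python model for illustration; `install`, `verify_binaries_fresh` and `start` here are simplified stand-ins for the nu functions, and their return values are invented so the forwarding can be observed.

```python
def install(name: str, download: bool = False, version: str = "latest") -> str:
    """Stand-in for `service_X install`: report which install path was taken."""
    return f"release:{version}" if download else "cargo-build"


def verify_binaries_fresh(name: str, download: bool = False) -> bool:
    """Stand-in for the freshness check; True means the check actually ran."""
    return not download


def start(name: str, download: bool = False, version: str = "latest") -> dict:
    """Stand-in for `service_X start` after this commit."""
    source = install(name, download=download, version=version)  # flags forwarded
    checked = verify_binaries_fresh(name, download=download)    # skipped on download
    return {"service": name, "source": source, "freshness_checked": checked}
```

Calling `start("biz")` keeps the cargo path with the freshness check, while `start("biz", download=True, version="v0.5.0-rc1")` takes the release path and skips it.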

`service_router` additionally gains the same flag (it had none
before this PR despite its `install` accepting `--download`); needed
for heroci bootstrap after `service_proc` and mycelium are up.

Services that do NOT have `--download` in `install` are deliberately
left unchanged (claude, code, codescalers, livekit, mycelium, os).
Mycelium support is the subject of a follow-up session: add release
pipeline to `geomind_code/mycelium_network` + `--download` to
`service_mycelium`.

10 of 11 services that already had `--download` in start (proc, agent,
collab, embedder, logic, office, planner, runner_rhai, shrimp, voice)
are updated to also pass `$download` to `svc_verify_binaries_fresh`,
making the verify-skip uniform.

Validated: nu module aggregate loads cleanly; per-service parse
checks all pass.

Refs: lhumina_code/home#225

Signed-off-by: mik-tf
docs(hero_running): FORGEJO_TOKEN no longer required at boot
All checks were successful
Build and Publish Skills / build-and-publish (pull_request) Successful in 3s
2133cd37ef
The first-time setup snippet implied FORGE_TOKEN was required during
`secrets_edit` (before `service_complete`). With the supervisor
patch in this session (and D-06 forge-token-bootstrap-optional),
the canonical onboarding flow now succeeds even when FORGE_TOKEN is
empty at boot — only forge-touching operations (`secret pull`,
`secret push`, `forge merge`) require it.

Signed-off-by: mik-tf
fix(skills+dispatcher): close --version + dispatcher gap for service_X start --download
All checks were successful
Build and Publish Skills / build-and-publish (pull_request) Successful in 4s
a18f3a2bef
Surfaced during the v0.5.0-pre1 end-to-end test on heroci:

1. **service_proc.nu start was missing --version flag.** All 14 services
   patched in the previous commit got `--download` AND `--version`, but
   service_proc's start function only got `--download` (the --version was
   in install). Without --version on start, `service_proc start
   --download --version v0.5.0-pre1 --reset` errored with "unknown flag
   `version`". Fixed by adding --version to start signature + forwarding
   it into the install call.

2. **Same gap for 8 other services** that already had --download in
   start but were missing --version: agent, collab, logic, office,
   planner, runner_rhai, shrimp, voice. All now consistent with the
   14 services patched in commit 3.

3. **Dispatcher [X start] entries weren't forwarding --download /
   --version.** The dispatcher universally dropped the flags on start
   for every service that supports them. Same shape as the dispatcher
   gap fix from session 62 (download forwarding for install aliases),
   but for start. Patched 14 dispatcher entries: aibroker, biz, books,
   browser, db, editor, foundry, indexer, matrixchat, osis, proxy,
   shrimp, slides, whiteboard. Plus [proc start] and [router start].

Result: `service proc start --download --version v0.5.0-pre1 --reset`
via the bareword dispatcher AND `service_proc start --download
--version v0.5.0-pre1 --reset` via direct module form both work
end-to-end. Validated on heroci against the v0.5.0-pre1 release
artifact with no FORGE_TOKEN in env.
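
The dispatcher gap (fix 3) can be pictured like this. It is a simplified Python model, not the nu dispatcher: `service_start`, `dispatch_before` and `dispatch_after` are hypothetical names, and the argument parsing is reduced to the two flags in question.

```python
def service_start(name: str, download: bool = False, version: str = "latest") -> dict:
    """Stand-in for the direct module form, e.g. `service_proc start ...`."""
    return {"service": name, "download": download, "version": version}


def dispatch_before(args: list) -> dict:
    """Old shape: flags after `X start` never reach the module."""
    name, action = args[0], args[1]
    if action != "start":
        raise ValueError(f"unsupported action {action}")
    return service_start(name)


def dispatch_after(args: list) -> dict:
    """Patched shape: --download and --version are forwarded to the module."""
    name, action, rest = args[0], args[1], args[2:]
    if action != "start":
        raise ValueError(f"unsupported action {action}")
    kwargs = {}
    i = 0
    while i < len(rest):
        if rest[i] == "--download":
            kwargs["download"] = True
        elif rest[i] == "--version":
            i += 1  # the flag's value is the next token
            kwargs["version"] = rest[i]
        else:
            raise ValueError(f"unknown flag {rest[i]}")
        i += 1
    return service_start(name, **kwargs)
```

For `["proc", "start", "--download", "--version", "v0.5.0-pre1"]` the old shape silently starts with defaults, whereas the patched shape produces the same call as the direct module form.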

Refs: lhumina_code/home#225

Signed-off-by: mik-tf
mik-tf merged commit b89f9b9567 into development 2026-05-07 01:53:08 +00:00
Reference
lhumina_code/hero_skills!228