fix: security and resilience fixes for #106–#112 #114

Merged
sameh-farouk merged 7 commits from fix/issues-106-112 into development 2026-04-21 17:16:10 +00:00
Member

Summary

Fixes the 7 bugs filed in #106–#112, discovered during a security review of the multi-user provisioning and service lifecycle code. Two are security-critical (both confirmed actively leaking on the live dev host), the rest are functional bugs that block common workflows.

Closes #106
Closes #107
Closes #108
Closes #109
Closes #110
Closes #111
Closes #112

Commit layout (per-issue for reviewability)

Commit Issues File
fix(service_proc): close argv leak and fix stop error #106, #109 service_proc.nu
fix(loader): populate $env.MYCELIUM_IP in nushell login shells #112 hero_loader.nu
fix(multi_user_add): pass --forgetoken via env, not argv #108 multiuser.nu
fix(multi_user_add): lock down home and secrets perms #107 multiuser.nu
fix(forge): only accept partial local match when basename is exact #110 forge.nu
fix(secrets): warn on duplicate keys in secrets.toml at load time #111 secrets_lib.nu

Squash at your discretion — the per-issue commits are for review traceability.

Security highlights (read these even if you skim everything else)

#106 — hero_proc argv leak. service_proc start no longer interpolates FORGE_TOKEN/WEBROOT into the bash -c command string. Previously those ended up in /proc/<pid>/cmdline, which is world-readable on Linux, so any user on a shared host could read the invoking user's forge token with plain ps aux — no root, no -e flag. Confirmed leaked on the live dev host. Propagation is now via with-env + sudo --preserve-env.

#107 — default world-readable secrets. multi_user_add now chmod 700 on ~ and ~/hero/code/secrets/, and chmod 600 on secrets.toml/secrets.sh. Also tightens ~/hero/cfg/hero_cfg.toml from 644 to 600 in bridge_save_state. On the dev host before this fix, 13 users' secrets.toml were readable cross-user.

#108 — --forgetoken at provision time. The install.sh and secrets_sync invocations no longer go through sudo bash -c "… FORGE_TOKEN=<T> …". The token is passed via env inheritance with sudo --preserve-env. Closes a transient argv leak during multi_user_add.

Non-security fixes

  • #109: service_proc stop no longer errors out on the final socket-cleanup step (rm -f was a nu builtin — cannot be wrapped with | complete; now uses ^rm -f).
  • #110: forge repo resolver no longer silently picks hero_osis for hero_os. Partial matches now require exact basename; substring-only hits fall through to API resolution.
  • #111: secrets source warns when secrets.toml has duplicate keys, which silently cause rotation to appear not-to-work (TOML last-wins).
  • #112: $env.MYCELIUM_IP / $env.MYCELIUM_BRIDGE are now populated in nu login shells from ~/hero/cfg/hero_cfg.toml via hero_loader.nu. Previously only the bash export in .profile was written, which nu doesn't source.

Operational follow-ups (not in this PR)

These are policy/ops actions, not code changes:

  1. Rotate forge tokens on every current host. Previously-running hero_proc processes leaked them for the life of the process. This PR prevents future leaks but cannot retroactively invalidate tokens other users may have already copied.
  2. Users should rotate anything else sensitive in secrets.toml (Anthropic, OpenAI, DB passwords, mail creds) on shared hosts — the file was cross-user-readable by default until this PR.
  3. Retroactive chmod on existing users: this PR only hardens newly-provisioned accounts. An --audit or similar subcommand for multi_user_add could apply the same chmod/chown to existing users; left as a follow-up.

Testing gap

Changes are static-review-grade only. Recommended validation before merging / on a staging host:

  1. #106: fresh service_proc start on a multi-user host; ps aux | grep FORGE_TOKEN returns no matches in the hero_proc chain.
  2. #107: provision a new user; ls -ld /home/<newuser> is drwx------; ls -l /home/<newuser>/hero/code/secrets/secrets.toml is -rw-------.
  3. #108: multi_user_add foo --forgetoken TESTVAL while watching ps auxe | grep TESTVAL in another shell; TESTVAL should never appear in install.sh or secrets_sync argv.
  4. #109: service_proc stop completes without the "complete only works with external commands" error.
  5. #110: on a host with hero_osis cloned but no hero_os, forge pull lhumina_code/hero_os should clone hero_os fresh (via API) instead of pulling into hero_osis.
  6. #111: add a duplicate key manually to secrets.toml, run secrets source, confirm the warning fires listing the duplicated key.
  7. #112: fresh nu login after a bridge-enabled multi_user_add; $env.MYCELIUM_IP populated without the user adding anything to config_user.nu.

Notes

  • All chmod additions are guarded (test -d / test -f) to be safe on partially-provisioned accounts.
  • #106 and #109 share a commit because they touch the same file and are both small. Happy to split on request.
  • Branch: fix/issues-106-112.
## Summary Fixes the 7 bugs filed in #106–#112, discovered during a security review of the multi-user provisioning and service lifecycle code. Two are security-critical (both confirmed actively leaking on the live dev host), the rest are functional bugs that block common workflows. Closes #106 Closes #107 Closes #108 Closes #109 Closes #110 Closes #111 Closes #112 ## Commit layout (per-issue for reviewability) | Commit | Issues | File | |---|---|---| | `fix(service_proc): close argv leak and fix stop error` | #106, #109 | `service_proc.nu` | | `fix(loader): populate $env.MYCELIUM_IP in nushell login shells` | #112 | `hero_loader.nu` | | `fix(multi_user_add): pass --forgetoken via env, not argv` | #108 | `multiuser.nu` | | `fix(multi_user_add): lock down home and secrets perms` | #107 | `multiuser.nu` | | `fix(forge): only accept partial local match when basename is exact` | #110 | `forge.nu` | | `fix(secrets): warn on duplicate keys in secrets.toml at load time` | #111 | `secrets_lib.nu` | Squash at your discretion — the per-issue commits are for review traceability. ## Security highlights (read these even if you skim everything else) **#106 — hero_proc argv leak.** `service_proc start` no longer interpolates `FORGE_TOKEN`/`WEBROOT` into the `bash -c` command string. Previously those ended up in `/proc/<pid>/cmdline`, which is world-readable on Linux, so any user on a shared host could read the invoking user's forge token with plain `ps aux` — no root, no `-e` flag. Confirmed leaked on the live dev host. Propagation is now via `with-env` + `sudo --preserve-env`. **#107 — default world-readable secrets.** `multi_user_add` now `chmod 700` on `~` and `~/hero/code/secrets/`, and `chmod 600` on `secrets.toml`/`secrets.sh`. Also tightens `~/hero/cfg/hero_cfg.toml` from 644 to 600 in `bridge_save_state`. On the dev host before this fix, 13 users' `secrets.toml` were readable cross-user. **#108 — --forgetoken at provision time.** The `install.sh` and `secrets_sync` invocations no longer go through `sudo bash -c "… FORGE_TOKEN=<T> …"`. The token is passed via env inheritance with `sudo --preserve-env`. Closes a transient argv leak during `multi_user_add`. ## Non-security fixes - **#109**: `service_proc stop` no longer errors out on the final socket-cleanup step (`rm -f` was a nu builtin — cannot be wrapped with `| complete`; now uses `^rm -f`). - **#110**: `forge` repo resolver no longer silently picks `hero_osis` for `hero_os`. Partial matches now require exact basename; substring-only hits fall through to API resolution. - **#111**: `secrets source` warns when `secrets.toml` has duplicate keys, which silently cause rotation to appear not-to-work (TOML last-wins). - **#112**: `$env.MYCELIUM_IP` / `$env.MYCELIUM_BRIDGE` are now populated in nu login shells from `~/hero/cfg/hero_cfg.toml` via `hero_loader.nu`. Previously only the bash `export` in `.profile` was written, which nu doesn't source. ## Operational follow-ups (not in this PR) These are policy/ops actions, not code changes: 1. **Rotate forge tokens** on every current host. Previously-running hero_proc processes leaked them for the life of the process. This PR prevents future leaks but cannot retroactively invalidate tokens other users may have already copied. 2. **Users should rotate** anything else sensitive in `secrets.toml` (Anthropic, OpenAI, DB passwords, mail creds) on shared hosts — the file was cross-user-readable by default until this PR. 3. **Retroactive chmod on existing users**: this PR only hardens newly-provisioned accounts. An `--audit` or similar subcommand for `multi_user_add` could apply the same chmod/chown to existing users; left as a follow-up. ## Testing gap Changes are static-review-grade only. Recommended validation before merging / on a staging host: 1. **#106**: fresh `service_proc start` on a multi-user host; `ps aux | grep FORGE_TOKEN` returns no matches in the hero_proc chain. 2. **#107**: provision a new user; `ls -ld /home/<newuser>` is `drwx------`; `ls -l /home/<newuser>/hero/code/secrets/secrets.toml` is `-rw-------`. 3. **#108**: `multi_user_add foo --forgetoken TESTVAL` while watching `ps auxe | grep TESTVAL` in another shell; `TESTVAL` should never appear in install.sh or secrets_sync argv. 4. **#109**: `service_proc stop` completes without the "complete only works with external commands" error. 5. **#110**: on a host with hero_osis cloned but no hero_os, `forge pull lhumina_code/hero_os` should clone hero_os fresh (via API) instead of pulling into hero_osis. 6. **#111**: add a duplicate key manually to `secrets.toml`, run `secrets source`, confirm the warning fires listing the duplicated key. 7. **#112**: fresh nu login after a bridge-enabled `multi_user_add`; `$env.MYCELIUM_IP` populated without the user adding anything to `config_user.nu`. ## Notes - All chmod additions are guarded (`test -d` / `test -f`) to be safe on partially-provisioned accounts. - `#106` and `#109` share a commit because they touch the same file and are both small. Happy to split on request. - Branch: `fix/issues-106-112`.
- Do not interpolate FORGE_TOKEN/WEBROOT into the bash -c argv; pass
  them via env inheritance using with-env + sudo --preserve-env. The
  old string-based approach put the token in /proc/<pid>/cmdline,
  which is world-readable. Closes #106.

- Use external rm (^rm -f) in the stop-path socket cleanup. The nu
  builtin rm cannot be wrapped with `| complete`, which previously
  caused service_proc stop to error out on the last step. Closes #109.
multi_user_add writes `export MYCELIUM_IP="..."` into .profile and
.bashrc, but nushell is the login shell and does not source those
files. Users therefore hit `Cannot find column 'MYCELIUM_IP'` on
the first `service_router start --address $env.MYCELIUM_IP`.

Load the mycelium section from ~/hero/cfg/hero_cfg.toml at the end
of hero_loader.nu (which is already sourced by config.nu) and set
$env.MYCELIUM_IP / $env.MYCELIUM_BRIDGE when present. Silent no-op
on machines without hero_cfg.toml (non-provisioned / non-bridge
accounts). Closes #112.
The install.sh and secrets_sync invocations used `sudo bash -c
"... FORGE_TOKEN=<T> ..."`, which interpolated the token into
the process argv of the bash wrapper and every child. For the
few seconds that subprocess ran, the token was visible in
`ps auxe` / /proc/<pid>/cmdline to any user on the host.

Propagate the token via env inheritance instead: set in nushell
env via `with-env`, then `sudo --preserve-env=FORGE_TOKEN,...`
keeps it across the privilege drop. The outer `sudo bash -c`
wrapper is no longer needed; feed empty stdin via a nu pipe
instead of the `< /dev/null` redirect.

secrets_sync no longer needs --token on the CLI — it reads from
$env.FORGE_TOKEN when unset.

Closes #108.
By default, newly-provisioned user homes inherit 755/644 perms from
the btrfs template snapshot, leaving ~/hero/code/secrets/secrets.toml
world-readable. On a live host with 13 users, every secrets.toml
(forge tokens, LLM API keys, DB passwords, mail creds) was readable
cross-user via `cat /home/<other>/hero/code/secrets/secrets.toml`.

After the final chown, chmod 700 on ~, 700 on ~/hero/code/secrets/,
and 600 on secrets.toml + secrets.sh when present. Root retains
access; no other non-root user can traverse or read.

Also tighten hero_cfg.toml from 644 to 600 in bridge_save_state —
only the user's own tools consume that file.

Closes #107.
forge_resolve_local's partial-match branch used
`str contains -i $repo_part`, which made "hero_os" match "hero_osis"
(because "hero_os" is a substring of "hero_osis"). When hero_osis
was cloned first (via service_osis install), any subsequent
`forge pull lhumina_code/hero_os` or `service_os install` resolved
to hero_osis's directory and failed with:

    error: no bin target named `hero_os` in default-run packages

Keep the existing `str contains -i` filter (still useful for other
call paths) but require the selected match's basename to equal
$repo_part exactly before accepting it. Non-exact substring hits
now fall through to null, letting the API-based `forge_resolve`
look up the canonical name and clone it fresh.

Closes #110.
TOML parsers use last-wins for duplicate keys. When a user rotates a
secret by editing secrets.toml and accidentally adds a second entry
for an already-present key (e.g. a second forge_token below the
[cfg] one), the stale duplicate silently wins on load and the new
value never takes effect. This manifests as "I rotated my token but
forge user still returns 401" — misdiagnosed as a scope/auth issue.

The write path (secrets_write_key) already replaces in place, so
rotation via `secrets_set` is safe. Manual edits are where
duplicates enter. Detect them in `secrets source` and print a
visible warning listing the duplicated keys, so users can fix with
`secrets_edit` before the wrong value sticks.

Closes #111.
The first attempt at #112 only read ~/hero/cfg/hero_cfg.toml, which
is written by the newer bridge_save_state. Users provisioned before
that refactor have state in the legacy /etc/hero-users/<user>.env
(shell-style KEY=VALUE), so $env.MYCELIUM_IP still came up empty for
them on fresh nu shells.

Check the TOML path first, fall back to the legacy env file. Both
paths feed into the same two env vars at the end. No change when
neither file exists (non-bridge / non-provisioned accounts).

Still closes #112.
sameh-farouk force-pushed fix/issues-106-112 from 05325146f2 to b11a766397 2026-04-21 17:11:30 +00:00 Compare
sameh-farouk merged commit 27ae65e216 into development 2026-04-21 17:16:10 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_skills!114
No description provided.