[infra][P1] load_init_sh does not follow source directives — env.sh exports silently lost across all service deploys #191

Open
opened 2026-05-01 22:18:00 +00:00 by mik-tf · 0 comments
Owner

Summary

hero_loader.nu's load_init_sh scrapes export lines from ~/hero/cfg/init.sh directly, but does not follow source directives. Anything exported from ~/hero/cfg/env/env.sh (which is sourced by init.sh, not declared in it) is silently dropped before nu modules see it.

This silently breaks every service that depends on env.sh-defined env vars: API keys, JWT secrets, HERO_CONTEXTS, etc. Each failure manifests as a different downstream symptom, hiding the shared root cause.

Symptom chain (observed during 2026-04-30 herodemo redeploy)

  1. HERO_CONTEXTS empty in hero_osis → unknown context names silently fall back to root → context dropdown shows "Root, Root, Root, Root" instead of distinct names. (Per hero_osis#43 silent-fallback issue, the missing env amplifies the bug.)
  2. GROQ_API_KEY / OPENROUTER_API_KEY empty in hero_aibroker → "Llama 3.3 70B not available on any configured provider" → hero_books AI summary fails.
  3. ONLYOFFICE_JWT_SECRET not even in env.sh → office editor fails with "ONLYOFFICE_JWT_SECRET is not set on the server".

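In the first two cases the env var was present at the shell level (env.sh sets it correctly) but reached the daemon process as 0 chars, because the nu deploy pipeline only reads init.sh's own export lines. In the third case the variable was never defined in env.sh at all, and it would have been dropped by the same path even if it had been.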
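To confirm the "present in the shell, 0 chars in the daemon" pattern on a live host, one option is to read the environment the process was actually started with. The helper below is a minimal sketch, not part of the Hero tooling: `env_len_in_pid` is a hypothetical name, and it assumes a Linux host where `/proc/<pid>/environ` is readable by the current user. It prints only the length of the value, so secrets are not echoed to the terminal.

```bash
#!/usr/bin/env bash
# env_len_in_pid PID NAME (hypothetical helper): print the character count of
# NAME's value in the environment the process was started with.
# Prints 0 when the variable is unset or empty.
# Assumes Linux /proc and permission to read the target's environ file.
env_len_in_pid() {
  local pid="$1" name="$2" value
  # environ is NUL-separated; turn it into lines and keep the first match.
  value="$(tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^$name=//p" | head -n 1)"
  printf '%s\n' "${#value}"
}
```

For example, `env_len_in_pid <pid-of-hero_osis> HERO_CONTEXTS` printing `0` while `echo ${#HERO_CONTEXTS}` in an interactive shell prints a non-zero length would point at the loader rather than at env.sh.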

Root cause

tools/hero_loader.nu load_init_sh (in lhumina_code/hero_skills):

  • Reads ~/hero/cfg/init.sh.
  • Greps for lines starting with export .
  • Sets each KEY=VALUE into $env for nu modules.
  • Does not parse source <path> directives.

env.sh is sourced from init.sh (typical bash convention to keep secrets out of the main config file), so its exports are invisible to the loader. A user who sets GROQ_API_KEY=... in env.sh and runs source ~/hero/cfg/env/env.sh interactively gets the var in their shell — but service_X start --reset (which spawns nu with the loader) does not.
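The gap can be illustrated without the real loader. The sketch below is a stand-in under stated assumptions: `scrape_exports` is a hypothetical helper that imitates the scraping step with `grep`, it is not the actual hero_loader.nu code, and the file contents (`demo-key`, `/tmp/hero`) are made up. It compares what a line scraper sees with what bash sees after really executing the file.

```bash
#!/usr/bin/env bash
# Stand-in for the loader's scraping step (hypothetical, not the real
# hero_loader.nu code): print only the export lines written in the file itself.
scrape_exports() {
  grep -E '^export ' "$1" | sed -E 's/^export //'
}

# Throwaway config pair that mirrors the init.sh -> env.sh layout.
demo_dir="$(mktemp -d)"
printf 'export GROQ_API_KEY="demo-key"\n' > "$demo_dir/env.sh"
printf 'export ROOTDIR="/tmp/hero"\nsource "%s/env.sh"\n' "$demo_dir" > "$demo_dir/init.sh"

# What a line scraper sees: only ROOTDIR, because the source line is skipped.
scraped="$(scrape_exports "$demo_dir/init.sh")"

# What bash sees after executing the file: the key from env.sh is present.
sourced="$(bash -c 'source "$1"; printf "%s" "$GROQ_API_KEY"' _ "$demo_dir/init.sh")"

echo "scraped: $scraped"
echo "sourced GROQ_API_KEY: $sourced"
```

The scraped output contains only the `ROOTDIR` line, while the sourced shell has `GROQ_API_KEY` set. That is the same mismatch described above between an interactive shell and a service started through the loader.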

Reproduction

On any deploy host:

echo 'export FOO_FROM_ENV_SH="hello"' >> ~/hero/cfg/env/env.sh
grep -i 'source.*env\.sh' ~/hero/cfg/init.sh  # confirms env.sh is sourced
nu -c 'use ~/hero/code/hero_skills/tools/hero_loader.nu *; load_init_sh; echo $env.FOO_FROM_ENV_SH?'
# Expected: "hello"
# Actual:   (empty / undefined)

Fix options

A) Fix load_init_sh to follow source directives (recommended)

Make the loader recursively expand source <path> lines so anything sourceable from init.sh is in scope. Bounded recursion (depth 2-3 max) is sufficient.

Pros: zero migration; matches operator intuition ("if init.sh sources env.sh, env.sh's exports are loaded").
Cons: small loader change; needs careful handling of variable expansion ($ROOTDIR, ${HOME}).
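The shape of option A can be sketched as follows. This is an illustration in bash only; the real change would live in the nu loader, and `collect_exports`, `expand_path` and `MAX_DEPTH` are hypothetical names. It assumes directives sit at the start of a line, carry no trailing comment, and that only a leading `~/`, `$HOME/` or `${HOME}/` needs expanding. Resolving `$ROOTDIR` would additionally require tracking exports collected so far, which is left out here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of option A, written in bash for illustration only.
MAX_DEPTH=3   # the issue suggests a bound of 2-3 levels

# expand_path PATH: strip one pair of double quotes and expand a leading
# ~/, $HOME/ or ${HOME}/ prefix. Other variables are left untouched.
expand_path() {
  local p="$1"
  p="${p#\"}"; p="${p%\"}"
  case "$p" in
    '~/'*)       p="$HOME/${p#??}" ;;        # drop the 2-char prefix
    '$HOME/'*)   p="$HOME/${p#??????}" ;;    # drop the 6-char prefix
    '${HOME}/'*) p="$HOME/${p#????????}" ;;  # drop the 8-char prefix
  esac
  printf '%s\n' "$p"
}

# collect_exports FILE [DEPTH]: print KEY=VALUE (value still quoted as written)
# for each export line in FILE and in the files it sources, in file order.
collect_exports() {
  local file="$1" depth="${2:-0}" line target
  [ -r "$file" ] || return 0
  # The depth bound also stops files that source each other in a loop.
  [ "$depth" -le "$MAX_DEPTH" ] || return 0
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "export "*)
        printf '%s\n' "${line#export }" ;;
      "source "*|". "*)
        target="${line#* }"
        collect_exports "$(expand_path "$target")" "$((depth + 1))" ;;
    esac
  done < "$file"
}

# Demo on throwaway files; HOME points at a temp dir inside the subshell only.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/hero/cfg/env"
printf 'export HERO_CONTEXTS="alpha,beta"\n' > "$demo_home/hero/cfg/env/env.sh"
printf 'export ROOTDIR="/opt/hero"\nsource ${HOME}/hero/cfg/env/env.sh\n' > "$demo_home/hero/cfg/init.sh"
collected="$(HOME="$demo_home"; collect_exports "$demo_home/hero/cfg/init.sh")"
printf '%s\n' "$collected"
```

In the demo, the `HERO_CONTEXTS` line from env.sh appears after the `ROOTDIR` line from init.sh, which matches the operator intuition quoted above. A depth counter is used rather than a visited-file set because it is the smaller change and also bounds source cycles.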

B) Move all required exports into init.sh directly

Stop sourcing env.sh; inline the exports into init.sh. Operationally equivalent but loses the secrets-vs-config separation.

Cons: loses git-ignore convention (init.sh is in repo, env.sh typically isn't).

C) Wrapper that sources env.sh before invoking nu

Wrap every service_X entry point in a bash helper:

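#!/bin/bash
# Load env.sh exports into this shell so the nu child process inherits them.
source ~/hero/cfg/env/env.sh
# "$*" joins the arguments into one string; "$@" would split the -c script.
exec nu -c "use ...; service_X $*"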

Cons: every service entry point grows a shim; easy to forget for new services; the --reset path already spawns sub-processes that need the env, so multiple shim points needed.

D) Move runtime secrets into hero_proc secret store

Stop relying on env vars; use proc secret set <KEY> <value> once at deploy time, services read via proc_client::get_secret(). This is the pattern hero_aibroker already uses (and what unblocked the API-key failure on 2026-04-30).

Pros: per-context scoping, audit trail, no env propagation issues, central rotation.
Cons: each service needs a code change to read from secret store instead of env; slow migration.

Inconsistent patterns to clean up

While diagnosing this, the following inconsistencies surfaced:

  • hero_aibroker reads API keys from hero_proc secret store ✅
  • hero_office_server reads ONLYOFFICE_JWT_SECRET from env (broken under load_init_sh)
  • hero_osis reads HERO_CONTEXTS from env (broken under load_init_sh)
  • service_onlyoffice.nu has hardcoded placeholder OO_DEFAULT_SECRET = "hero-demo-jwt-secret-change-in-prod" baked into the docker run command at action-registration time. Even after env.sh is fixed, the action spec retains the placeholder until --reset re-registers it.

Recommendation

  1. Short term (P1 / before next deploy): fix load_init_sh to follow source directives (option A). Smallest blast radius; zero downstream code changes.
  2. Medium term (P2): standardize on hero_proc secret store for all secrets (option D). API keys + JWT secrets + per-context credentials. Amend the deploy ritual: proc secret set X is now part of service_install_all.
  3. Cleanup: remove the OO_DEFAULT_SECRET placeholder from service_onlyoffice.nu; require ONLYOFFICE_JWT_SECRET to be present (fail-closed) instead of falling back to a public placeholder.
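The fail-closed behaviour in item 3 could look roughly like the guard below. This is a sketch, not the actual service_onlyoffice.nu logic: `require_jwt_secret` is a hypothetical helper, and also rejecting a value equal to the old placeholder is an added assumption that goes slightly beyond what the item asks for.

```bash
#!/usr/bin/env bash
# Hypothetical fail-closed guard: refuse to continue when the secret is
# missing, empty, or still equal to the public placeholder.
PLACEHOLDER="hero-demo-jwt-secret-change-in-prod"

require_jwt_secret() {
  local secret="${ONLYOFFICE_JWT_SECRET:-}"
  if [ -z "$secret" ]; then
    echo "ONLYOFFICE_JWT_SECRET is not set; refusing to start" >&2
    return 1
  fi
  if [ "$secret" = "$PLACEHOLDER" ]; then
    echo "ONLYOFFICE_JWT_SECRET is the public placeholder; refusing to start" >&2
    return 1
  fi
  return 0
}
```

A caller would run `require_jwt_secret || exit 1` before building the docker run command, so a missing secret stops the deploy instead of registering an action with a known value.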

Blast radius

This bug has been latent across every Hero deploy that uses env.sh. Past "phantom" failures attributed to "stale build", "stale binary", or "wrong context" almost certainly include some that were really this. Worth a sweep of past Forge issues for "env var not set" / "API key missing" / "context fallback" reports.

Related

  • hero_osis#43: silent context fallback to root (this issue makes #43's symptoms much worse, since HERO_CONTEXTS is most often empty because of this bug)
  • hero_proc secret store API (the pattern aibroker already uses)
  • tools/hero_loader.nu load_init_sh in lhumina_code/hero_skills

Signed-off-by: mik-tf
