[infra][P1] load_init_sh does not follow source directives — env.sh exports silently lost across all service deploys #191
Summary
hero_loader.nu's load_init_sh scrapes export lines from ~/hero/cfg/init.sh directly, but does not follow source directives. Anything exported from ~/hero/cfg/env/env.sh (which is sourced by init.sh, not declared in it) is silently dropped before nu modules see it.
This silently breaks every service that depends on env.sh-defined env vars: API keys, JWT secrets, HERO_CONTEXTS, etc. Each failure manifests as a different downstream symptom, hiding the shared root cause.
Symptom chain (observed during 2026-04-30 herodemo redeploy)
HERO_CONTEXTS empty in hero_osis → unknown context names silently fall back to root → context dropdown shows "Root, Root, Root, Root" instead of distinct names. (Per hero_osis#43 silent-fallback issue, the missing env amplifies the bug.)
GROQ_API_KEY / OPENROUTER_API_KEY empty in hero_aibroker → "Llama 3.3 70B not available on any configured provider" → hero_books AI summary fails.
ONLYOFFICE_JWT_SECRET not even in env.sh → office editor fails with "ONLYOFFICE_JWT_SECRET is not set on the server".
In each case the env var was present at the shell level (env.sh sets it correctly) but reached the daemon process as 0 chars, because the nu deploy pipeline only reads init.sh's own export lines.
Root cause
tools/hero_loader.nu load_init_sh (in lhumina_code/hero_skills):
Reads ~/hero/cfg/init.sh.
Keeps only the lines that start with export.
Parses each KEY=VALUE into $env for nu modules.
Does not follow source <path> directives.
env.sh is sourced from init.sh (a typical bash convention to keep secrets out of the main config file), so its exports are invisible to the loader. A user who sets GROQ_API_KEY=... in env.sh and runs source ~/hero/cfg/env/env.sh interactively gets the var in their shell, but service_X start --reset (which spawns nu with the loader) does not.
Reproduction
On any deploy host:
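The drop can also be modeled off-host. The sketch below is a hypothetical Python stand-in for the behavior described under Root cause, not the actual hero_loader.nu code: scrape_exports keeps only a file's own export lines, and simulate builds a throwaway init.sh / env.sh pair to scrape. The function names, file contents, and the dummy key value are assumptions made for the illustration.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as: export KEY=VALUE  or  export KEY="VALUE"
EXPORT_RE = re.compile(r'^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def scrape_exports(path: Path) -> dict[str, str]:
    """Hypothetical model of the loader: keep this file's own export lines only."""
    env = {}
    for line in path.read_text().splitlines():
        match = EXPORT_RE.match(line)
        if match:
            key, value = match.groups()
            env[key] = value.strip().strip('"\'')
        # Lines such as `source /path/env.sh` fall through and are ignored.
    return env


def simulate() -> dict[str, str]:
    """Build a throwaway init.sh + env.sh pair and scrape init.sh."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp)
        (cfg / "env.sh").write_text('export GROQ_API_KEY="dummy-key"\n')
        (cfg / "init.sh").write_text(
            'export ROOTDIR="/tmp/hero"\n'
            f'source {cfg / "env.sh"}\n'
        )
        return scrape_exports(cfg / "init.sh")


if __name__ == "__main__":
    env = simulate()
    print(sorted(env))              # ['ROOTDIR']
    print("GROQ_API_KEY" in env)    # False: the sourced export was dropped
```

A shell that sources the same init.sh would end up with both variables, which is the gap between operator expectation and what the daemon receives.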
Fix options
A) Fix load_init_sh to follow source directives (recommended)
Make the loader recursively expand source <path> lines so anything sourceable from init.sh is in scope. Bounded recursion (depth 2-3 max) is sufficient.
Pros: zero migration; matches operator intuition ("if init.sh sources env.sh, env.sh's exports are loaded").
Cons: small loader change; needs careful handling of variable expansion ($ROOTDIR, ${HOME}).
B) Move all required exports into init.sh directly
Stop sourcing env.sh; inline the exports into init.sh. Operationally equivalent but loses the secrets-vs-config separation.
Cons: loses git-ignore convention (init.sh is in repo, env.sh typically isn't).
C) Wrapper that sources env.sh before invoking nu
Wrap every service_X entry point in a bash helper:
Cons: every service entry point grows a shim; easy to forget for new services; the --reset path already spawns sub-processes that need the env, so multiple shim points needed.
D) Move runtime secrets into hero_proc secret store
Stop relying on env vars; use proc secret set <KEY> <value> once at deploy time, services read via proc_client::get_secret(). This is the pattern hero_aibroker already uses (and what unblocked the API-key failure on 2026-04-30).
Pros: per-context scoping, audit trail, no env propagation issues, central rotation.
Cons: each service needs a code change to read from secret store instead of env; slow migration.
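As an illustration of option A, here is a minimal Python sketch of a loader that follows source directives with bounded recursion and simple variable expansion. It is not the Nushell implementation: load_exports, expand, and demo are hypothetical names, and the parsing is deliberately simplified (no trailing comments, no paths with spaces, single-quoted values are expanded like double-quoted ones, and relative source paths resolve against the current directory).

```python
import os
import re
import tempfile
from pathlib import Path

EXPORT_RE = re.compile(r'^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')
SOURCE_RE = re.compile(r'^\s*(?:source|\.)\s+(\S+)')
VAR_RE = re.compile(r'\$\{(\w+)\}|\$(\w+)')


def expand(text: str, env: dict[str, str]) -> str:
    """Expand $VAR, ${VAR} and a leading ~ using exports seen so far, then os.environ."""
    def lookup(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return env.get(name, os.environ.get(name, ""))
    return os.path.expanduser(VAR_RE.sub(lookup, text))


def load_exports(path: Path, env=None, depth: int = 0, max_depth: int = 3) -> dict[str, str]:
    """Collect exports from path and from any file it sources, up to max_depth levels."""
    env = {} if env is None else env
    if depth > max_depth or not path.is_file():
        return env  # bounded recursion also stops accidental source cycles
    for line in path.read_text().splitlines():
        exported = EXPORT_RE.match(line)
        if exported:
            key, raw = exported.groups()
            env[key] = expand(raw.strip().strip('"\''), env)
            continue
        sourced = SOURCE_RE.match(line)
        if sourced:
            target = Path(expand(sourced.group(1).strip('"\''), env))
            load_exports(target, env, depth + 1, max_depth)
    return env


def demo() -> dict[str, str]:
    """Throwaway layout: init.sh sets ROOTDIR, then sources env/env.sh through it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "env").mkdir()
        (root / "env" / "env.sh").write_text('export GROQ_API_KEY="dummy-key"\n')
        (root / "init.sh").write_text(
            f'export ROOTDIR="{root}"\n'
            'source "${ROOTDIR}/env/env.sh"\n'
        )
        return load_exports(root / "init.sh")


if __name__ == "__main__":
    print(sorted(demo()))  # ['GROQ_API_KEY', 'ROOTDIR']
```

Expanding variables against the exports collected so far is what lets a path such as ${ROOTDIR}/env/env.sh resolve when ROOTDIR is itself defined earlier in init.sh, which is the case the Cons line of option A warns about.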
Inconsistent patterns to clean up
While diagnosing this, the following inconsistencies surfaced:
ONLYOFFICE_JWT_SECRET is read from env (broken under load_init_sh).
HERO_CONTEXTS is read from env (broken under load_init_sh).
service_onlyoffice.nu has a hardcoded placeholder OO_DEFAULT_SECRET = "hero-demo-jwt-secret-change-in-prod" baked into the docker run command at action-registration time. Even after env.sh is fixed, the action spec retains the placeholder until --reset re-registers it.
Recommendation
Fix load_init_sh to follow source directives (option A). Smallest blast radius; zero downstream code changes.
proc secret set X is now part of service_install_all.
Remove the OO_DEFAULT_SECRET placeholder from service_onlyoffice.nu; require ONLYOFFICE_JWT_SECRET to be present (fail-closed) instead of falling back to a public placeholder.
Blast radius
This bug has been latent across every Hero deploy that uses env.sh. Past "phantom" failures attributed to "stale build", "stale binary", or "wrong context" almost certainly include some that were really this. Worth a sweep of past Forge issues for "env var not set" / "API key missing" / "context fallback" reports.
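The fail-closed behavior proposed in the Recommendation for ONLYOFFICE_JWT_SECRET can be sketched as follows. This is a hypothetical Python illustration, not the service_onlyoffice.nu code: require_env, docker_env_args, and MissingSecretError are made-up names, and passing the secret to the container as JWT_SECRET is an assumption about how the docker run command is built.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or empty."""


def require_env(name: str, env=None) -> str:
    """Return the value of name, or raise instead of falling back to a default."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"{name} is not set; refusing to use a placeholder")
    return value


def docker_env_args(env=None) -> list[str]:
    """Build the -e argument for docker run (JWT_SECRET name is an assumption)."""
    secret = require_env("ONLYOFFICE_JWT_SECRET", env)
    return ["-e", f"JWT_SECRET={secret}"]


if __name__ == "__main__":
    print(docker_env_args({"ONLYOFFICE_JWT_SECRET": "example"}))
    try:
        docker_env_args({})
    except MissingSecretError as err:
        print(f"refused: {err}")
```

Failing at registration time turns an empty env var into an immediate, named error rather than a container that starts with a publicly known secret.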
Related
(HERO_CONTEXTS is most often empty due to this bug)
tools/hero_loader.nu load_init_sh in lhumina_code/hero_skills
Signed-off-by: mik-tf
mik-tf referenced this issue from lhumina_code/hero_demo 2026-05-02 03:28:52 +00:00