my_compute_zos_server service.toml missing [[env]] for TFGRID_NETWORK / TFGRID_NODE_IDS / TFGRID_MNEMONIC — supervised path defaults to mainnet regardless of hero_proc secret values #127

Closed
opened 2026-05-25 15:28:17 +00:00 by mik-tf (Owner) · 2 comments

`crates/my_compute_zos_server/service.toml` only declares `[[env]]` blocks for `PATH_ROOT`, `HERO_SOCKET_DIR`, and `RUST_LOG`. The chain-targeting env vars `TFGRID_NETWORK`, `TFGRID_NODE_IDS`, and `TFGRID_MNEMONIC` are not declared. As a result, when hero_proc supervises the daemon, those values are not injected into the process environment: the executor composes the env from context secrets plus the service spec's env, and keys missing from the spec are never added.
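The following minimal Rust sketch makes the mechanism concrete. It models an executor that composes a child's env from declared `[[env]]` entries plus context secrets. It is not the code at `executor.rs:61-71`: the `EnvDecl` type, the `compose_env` function, and the empty placeholder defaults are hypothetical. The sketch only shows why a secret named `TFGRID_NETWORK` never reaches the process when the key is absent from the spec.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one `[[env]]` entry from service.toml.
struct EnvDecl {
    name: &'static str,
    default: &'static str,
}

/// Hypothetical sketch: only keys declared in the service spec are emitted;
/// a context secret with the same name overrides the declared default.
fn compose_env(
    decls: &[EnvDecl],
    secrets: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    for d in decls {
        let value = secrets
            .get(d.name)
            .cloned()
            .unwrap_or_else(|| d.default.to_string());
        env.insert(d.name.to_string(), value);
    }
    // Secrets whose names are not declared are never copied into `env`.
    env
}

fn main() {
    // The three keys service.toml currently declares (placeholder defaults).
    let decls = [
        EnvDecl { name: "PATH_ROOT", default: "" },
        EnvDecl { name: "HERO_SOCKET_DIR", default: "" },
        EnvDecl { name: "RUST_LOG", default: "" },
    ];
    let mut secrets = BTreeMap::new();
    secrets.insert("TFGRID_NETWORK".to_string(), "qa".to_string());

    let env = compose_env(&decls, &secrets);
    println!("{:?}", env.get("TFGRID_NETWORK")); // prints None
}
```

In this sketch, appending `EnvDecl { name: "TFGRID_NETWORK", default: "" }` to `decls` corresponds to the `[[env]]` proposal: the `qa` secret would then override the empty default.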

Result: running `hero_proc secret set core/TFGRID_NETWORK qa` followed by `hero_proc service restart my_compute_zos_server` does NOT switch the daemon to QA. The daemon reads `std::env::var("TFGRID_NETWORK")` at startup, gets `Err(NotPresent)`, and falls back to the default `main` per `crates/my_compute_zos_server/src/config.rs:42`.
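The fallback can be sketched as follows. The actual contents of `config.rs:42` are not shown in this issue, so `resolve_network` and `network_from_env` are hypothetical names. Treating an empty value as unset is also an assumption, though it would matter for the proposed `default = ""`.

```rust
use std::env::{self, VarError};

/// Hypothetical sketch of an env-based network lookup with a mainnet default.
fn resolve_network(raw: Result<String, VarError>) -> String {
    match raw {
        // Assumption: an empty or whitespace-only value counts as unset.
        Ok(v) if !v.trim().is_empty() => v,
        // Err(NotPresent) (the variable was never injected), a non-UTF-8
        // value, or an empty value all fall back to mainnet.
        _ => "main".to_string(),
    }
}

/// Reads the process env once; under hero_proc without an `[[env]]`
/// declaration, TFGRID_NETWORK is absent and this resolves to "main".
fn network_from_env() -> String {
    resolve_network(env::var("TFGRID_NETWORK"))
}

fn main() {
    println!("network = {}", network_from_env());
}
```

Splitting the pure `resolve_network` from the env read keeps the fallback testable without mutating the process env. This matters because `std::env::set_var` is `unsafe` in the 2024 edition.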

Workaround used in s158: stop the supervised daemon and launch it manually via `nohup` with explicit env (mirroring the `TFGRID_DEBUG=1` workaround from s157d). This loses the supervisor and its restart-on-crash guardrails.

Right fix: add three `[[env]]` blocks to `service.toml` with `default = ""`, mirroring Lessons #17 and #19 (any env var the daemon reads must be declared in `service.toml` so that hero_proc and lab propagate it). The existing hero_proc supervisor context-secrets injection at `crates/hero_proc_server/src/supervisor/executor.rs:61-71` will then route the secret values into the process env at spawn time.

Caught during s158 admin-on-TFGrid setup when pivoting from mainnet to QAnet.

Author · Owner

Correcting the diagnosis on this issue after re-reading the canonical hero_proc architecture.

The fix is NOT to add `[[env]]` blocks to `service.toml`. Per the `hero_proc_secrets_and_meta` skill (canonical), every `_admin` and `_server` process must source all of its configuration exclusively from the hero_proc secret store via `hero_proc_sdk::secret_get`. It must not read configuration from the OS env, from a `.env` file, or from `service.toml` `[[env]]` blocks.

The actual root cause is in `crates/my_compute_zos_server/src/util.rs::load_env()` and `crates/hero_compute_sdk/src/lib.rs::load_env()`. Both read `~/hero/var/.env` into the process env at startup, and `Config::from_env()` then calls `std::env::var("TFGRID_NETWORK")` and the other lookups. This whole chain is off-pattern: `.env` files are not a supported config source for hero-supervised processes.

The proper fix is a `Config::from_hero_proc()` method that connects to the hero_proc UDS and calls `secret_get` for each TFGrid-namespaced key (`TFGRID_MNEMONIC`, `TFGRID_NETWORK`, plus operational settings). The `Config` struct already has a `hero_proc_client()` helper and a `set_secret()` writer; only the symmetric reader is missing. The standalone CLI modes (`sign_auth`, `cancel_contracts`, `delete_all_vms`) should use the same client.
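Below is a minimal sketch of the symmetric reader, assuming a synchronous stand-in for the secret store. The real `Config::from_hero_proc()` is async and calls `hero_proc_sdk::secret_get` over the hero_proc UDS. Neither that call's exact signature nor how the `core/` context prefix is applied is shown in this issue. The `SecretStore` trait, `MemStore`, `from_store`, and the `main` default for an unset network are therefore all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over the hero_proc secret store.
trait SecretStore {
    fn secret_get(&self, key: &str) -> Option<String>;
}

/// Hypothetical in-memory store, standing in for the UDS client in tests.
struct MemStore(HashMap<String, String>);

impl SecretStore for MemStore {
    fn secret_get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

struct Config {
    mnemonic: String,
    network: String,
}

impl Config {
    /// Reader counterpart to a `set_secret()` writer: every value comes from
    /// the secret store, never from std::env or a .env file.
    fn from_store(store: &impl SecretStore) -> Result<Config, String> {
        // The mnemonic is required; an empty value is treated as missing.
        let mnemonic = store
            .secret_get("TFGRID_MNEMONIC")
            .filter(|v| !v.is_empty())
            .ok_or_else(|| "TFGRID_MNEMONIC is not set in the secret store".to_string())?;
        // Assumption: an unset network still defaults to mainnet.
        let network = store
            .secret_get("TFGRID_NETWORK")
            .filter(|v| !v.is_empty())
            .unwrap_or_else(|| "main".to_string());
        Ok(Config { mnemonic, network })
    }
}

fn main() {
    let mut secrets = HashMap::new();
    secrets.insert("TFGRID_MNEMONIC".to_string(), "word1 word2".to_string());
    secrets.insert("TFGRID_NETWORK".to_string(), "qa".to_string());

    let cfg = Config::from_store(&MemStore(secrets)).expect("config");
    assert!(!cfg.mnemonic.is_empty());
    println!("network = {}", cfg.network); // prints network = qa
}
```

Keeping the store behind a trait would let the daemon and the standalone CLI modes share one reader, while tests substitute an in-memory store.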

Plan: ship the `from_hero_proc()` refactor for `my_compute_zos_server` in this session. Then file follow-ups for the four sibling binaries (`my_compute_mos_server`, `my_compute_explorer_server`, `my_compute_explorer_admin`, `my_compute_zos_admin`), all of which currently call `hero_compute_sdk::load_env()`. Once the refactor lands, `util::load_env()` and `hero_compute_sdk::load_env()` become deprecated and can be removed across the workspace in a single sweep.

Reference: `hero_proc_secrets_and_meta` skill, Hero Compute Suite docs.

Author · Owner


Closed by [hero_compute `32a2e2e`](https://forge.ourworld.tf/lhumina_code/hero_compute/commit/32a2e2e). The new async `Config::from_hero_proc()` method reads `TFGRID_MNEMONIC`, `TFGRID_NETWORK`, and the slice settings from the hero_proc secret store via `hero_proc_sdk::secret_get`. It replaces the off-pattern `util::load_env()` + `Config::from_env()` chain that read `~/hero/var/.env` into the process env, per the canonical `hero_proc_secrets_and_meta` skill (already explained in [comment 36979](https://forge.ourworld.tf/lhumina_code/hero_compute/issues/127#issuecomment-36979)). The sibling binaries (`my_compute_mos_server`, `my_compute_explorer_server`, `my_compute_explorer_admin`, `my_compute_zos_admin`) still call `hero_compute_sdk::load_env()` and need the same treatment; this is tracked as a new follow-up below.
Reference: `lhumina_code/hero_compute#127`