Converge deployer on development RPC stack #31

Merged
mik-tf merged 184 commits from development_mik into development 2026-06-24 04:33:07 +00:00

Normalizes deployer OpenRPC methods for the development macro stack, keeps generated params single-input, installs hero_components with the admin app set, and wires deployer secret operations to the restored context-aware hero_proc SDK.

Refs: #30
Refs: lhumina_code/hero_proc#163

Signed-by: mik-tf <mik-tf@noreply.invalid>

Normalizes deployer OpenRPC methods for the development macro stack, keeps generated params single-input, installs hero_components with the admin app set, and wires deployer secret operations to the restored context-aware hero_proc SDK.

Refs: https://forge.ourworld.tf/lhumina_code/hero_os_tfgrid_deployer/issues/30
Refs: https://forge.ourworld.tf/lhumina_code/hero_proc/issues/163

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: scope the per-tester base bundle to the demo app set
All checks were successful
lab publish / publish (push) Successful in 10m5s
895bb38bad
The base bundle a fresh tester installs is now the demo set — cockpit,
planner, slides, whiteboard, agent, voice — on top of the always-on base
(proxy, router, supervisor, data store, orchestrator). Drop hero_memory,
hero_books, and hero_biz, which are not part of this demo; they remain
install-on-demand.

Add hero_db ahead of the apps: slides persists to it, and without it
slides logs "hero_db socket unreachable" and can fail its first boot.

Also fix two install aborts the trim exposed:
- the shared-engine wiring no longer restarts hero_memory_server, which is
  no longer installed (restarting an absent service aborts the install).
- the default-library seed is forced empty when hero_books is not in the
  bundle, instead of falling back to a HERO_BOOKS_DEFAULT_REPOS value that
  may linger in the deployer's environment and re-trigger a hero_books
  restart.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: drop hero_agent from the base bundle
All checks were successful
lab publish / publish (push) Successful in 23m4s
55525543b5
hero_agent is the old assistant and is not the path forward (it does not
ship from the stable branch and its admin UI binary fails to register
because it depends on the bare hero_agent CLI, which cannot run as a
service). Remove it from the per-tester base bundle so a fresh tester does
not show a stopped agent tile; the new agent is added separately once it
ships from the stable branch. The shared-engine wiring no longer restarts
hero_agent_server or sets its routing-mode knob, since neither is installed.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): add the Kimi assistant to the tester base bundle + wire its MCP and key
All checks were successful
lab publish / publish (push) Successful in 23m29s
59e78914a2
Add hero_kimi (built from hero_kimi_rust) to the per-tester stack so every
tester gets the Kimi AI assistant. The component name is the binary prefix
hero_kimi; the install download resolves it to the hero_kimi_rust repo via the
now-activated COMPONENT_REPO map, while the start loop matches hero_kimi_web.

Write two per-tester files into ~/.kimi at install time: config.toml points the
agent at OpenRouter and reads the key from the OPENAI_API_KEY process env (never
on disk), and mcp.json registers the tester's own planner and whiteboard as MCP
servers reached through the router's local MCP gateway, addressed as the tester.

Proven live: install adds kimi to the bundle, both config files land, the router
MCP gateway exposes the planner (71 tools) and whiteboard (88 tools) surfaces and
tool calls create content on both.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): write the Kimi assistant config with the working provider type
All checks were successful
lab publish / publish (push) Successful in 23m40s
054fc12822
The per-tester ~/.kimi/config.toml declared the OpenRouter provider as
type "openai_legacy", but the assistant's agent implements only its built-in
"kimi" chat provider (which itself speaks the OpenAI-compatible
chat/completions API and so drives OpenRouter via base_url). The unsupported
type meant the agent never built an LLM client and every chat failed with
"LLM is not set". Switch the provider type to "kimi" and update the comment to
record that the key is read from the OPENROUTER_API_KEY / OPENAI_API_KEY
process env, with api_key left empty on disk.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): never publish a tester without a login gate; self-heal the web gateway at install
All checks were successful
lab publish / publish (push) Successful in 10m3s
6641a6a5e5
A tester provisioned through the admin add -> provision -> install flow
could end up live with no login gate. At provision time the web gateway
deploy can return an error to the deployer after the gateway contract was
already created on the grid (or time out), so the deployer recorded no
gateway domain. With the domain empty, the per-tester OAuth app step was
skipped, and install pushed empty OAuth credentials to the tester, so its
cockpit was served with no Forge sign-in gate (the admin also showed no
Cockpit URL).

Changes:
- Extract the gateway deploy + persist into a shared ensure_webgateway
  helper, used by both provision and install.
- install_hero_stack now self-heals: if the row has no gateway domain it
  re-runs ensure_webgateway, and if the per-tester OAuth app is missing it
  creates it, then fails closed (refuses the install) unless a full login
  gate is present. A single Install/Reinstall click repairs a tester that
  came up without a Cockpit URL. An explicit DEPLOYER_ALLOW_INSECURE_INSTALL=1
  override remains for intentional debugging installs.
- Admin shows a "Set up gateway & install" action for a ready VM that has
  no gateway domain yet.

The underlying daemon contract gap (deploy returning an error after the
contract is live, and the lack of a gateway lookup) is filed at
lhumina_code/hero_compute#133.
Tracking: lhumina_code/home#253.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(admin): show a loading spinner on VM action buttons
All checks were successful
lab publish / publish (push) Successful in 23m50s
9700e33cee
The Provision / Install / Reinstall / Set up gateway / Destroy / Delete
buttons were plain synchronous form POSTs with no client feedback, so the
page looked frozen during the slow on-chain grid call. Add a delegated
submit handler that disables the clicked submit button and swaps in a
Bootstrap spinner with an active label (Provisioning..., Installing...,
Destroying..., etc.). The page stays on screen with the spinner until the
server returns the re-rendered page.

lhumina_code/home#252

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: enable whiteboard public share on tester install
All checks were successful
lab publish / publish (push) Successful in 23m45s
b39a2dabf1
Set core/HERO_PROXY_PUBLIC_WHITEBOARD_SHARE=1 on a tester at install time
so a freshly provisioned sandbox tester comes up with whiteboard share
links reachable without a login, matching the live tester. The whiteboard
backend still re-scopes each shared call to its token.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): simplify tester onboarding (no SSH key needed; register existing Forge users)
All checks were successful
lab publish / publish (push) Successful in 23m53s
b232314bc2
Sandbox testers use the cockpit web apps only and never open a shell on
their VM, so provisioning no longer reads or requires a tester SSH key.
A provisioned VM gets only the shared installer key (so the install step
can SSH in and run setup) and the workspace admin keys. This removes the
"upload an SSH key first" onboarding step and the provisioning-disabled-
without-a-key failure mode; the admin UI no longer shows the SSH-key
warning or gates the Provision button on it.

Adds an "add existing user" path so the admin can onboard someone who
already has a forge.ourworld.tf account: a new deployer.add_existing_user
RPC (verifies the account on Forge, pulls display name and email from the
profile, writes the deployer row, creates no account and sets no password)
plus a matching admin form. They sign in with their existing credentials
over SSO.

lhumina_code/home#247

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): onboard existing users with mixed-case names + clearer create-user result
All checks were successful
lab publish / publish (push) Successful in 23m29s
78c242e5a3
Registering an existing Forge account whose username has uppercase (or other
characters not valid in a hostname) would have produced an invalid web gateway
name and a half-provisioned VM. The gateway name is now lowercased from the
username, and add_existing_user rejects up front any username that cannot form
a valid sandbox web address (letters and digits only), so we never register a
user we then cannot provision.

Also fixes the "Create user" result when the account already exists: it now
says clearly that the account already exists and no password was changed, and
points to "Add existing user", instead of a "User created" heading with an
empty initial-password field.

lhumina_code/home#247

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): welcome email on ready + unified add-user flow
Some checks failed
lab publish / publish (push) Has been cancelled
88e761e1e4
Port an EmailProvider trait + Resend implementation (ureq 3.x, with a
dev-mode fallback when no key is set) into the deployer and send a
best-effort "your Hero sandbox is ready" email at the install ready
transition, the first point at which the cockpit URL and the login gate
both exist. The recipient is read from the user row; a send failure only
logs and never affects the install result.

Merge the two add-user admin cards into one "Add a user" form with an
Existing-Forge-account vs Create-new toggle, and add an optional email
override to add_existing_user (openrpc.json + regenerated client) so an
existing user whose Forge profile email is private still gets the mail.

Sender defaults to "Hero OS <noreply@hero.lhumina.org>", overridable via
the EMAIL_FROM_ADDRESS / EMAIL_FROM_NAME env without a rebuild; the key is
read from RESEND_API_KEY. The new [[env]] blocks need the service
re-registered (not just restarted) to take effect.

Signed-by: mik-tf <mik-tf@noreply.invalid>
chore(deployer): default email sender to noreply@mail.lhumina.org
All checks were successful
lab publish / publish (push) Successful in 23m39s
8c2869b139
Align the baked EMAIL_FROM_ADDRESS default with the verified Resend
sending subdomain (mail.lhumina.org) so a fresh deploy is correct without
an env override. Still overridable via EMAIL_FROM_ADDRESS.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): single welcome email states username + login, plus username slug normalization
All checks were successful
lab publish / publish (push) Successful in 22m57s
b24c757c63
Welcome email: the one email sent at install->ready now states the login
explicitly and links straight at the gated cockpit app, not the bare domain.
A new account carries its one-time Forge password (generated at create time,
stashed on the user row via a new nullable temp_password column [migration
M8], and wiped after a real send); an existing Forge account is told to use
its existing password (no temp password is invented). Still a single email,
fired once at ready. Dev-mode sends (no API key) skip the wipe so the stashed
password is not lost when nothing was actually delivered.

Username slug: add gateway_slug() — lowercase then keep [a-z0-9] — as the
single canonical transform from a Forge username to the gateway / workload /
DNS label, used in ensure_webgateway and add_existing_user. A username like
mik-tf now onboards as miktf instead of being rejected; add_existing_user
only refuses a name with no letters or digits at all. The Forge username
itself is kept in its canonical case for SSO identity (proxy auth is
case-insensitive); only the web-address label is lowercased.
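
For illustration, the slug transform described above could look roughly like the sketch below. Only the name gateway_slug and the "lowercase then keep [a-z0-9]" rule come from this commit message; the helper slug_or_reject and its Option return type are hypothetical and stand in for the "refuse a name with no letters or digits" check.

```rust
/// Sketch of the transform described above: lowercase the Forge username,
/// then keep only ASCII letters and digits.
fn gateway_slug(username: &str) -> String {
    username
        .chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        .collect()
}

/// Hypothetical helper: returns None when no letter or digit survives,
/// which is the only case the message says is still refused.
fn slug_or_reject(username: &str) -> Option<String> {
    let slug = gateway_slug(username);
    if slug.is_empty() {
        None
    } else {
        Some(slug)
    }
}
```

Under these assumptions gateway_slug("mik-tf") yields "miktf", and a name such as "--" is rejected because nothing usable remains.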

Refresh the create_user next_steps: step 2 no longer says the admin sends the
cockpit URL out of band — it now arrives by email once the sandbox is ready.

Tests: gateway_slug normalization, temp_password set/get/clear round-trip
(also exercises M8), welcome-email new-vs-existing-account branches and the
cockpit app link. 98 server tests green; fmt + clippy clean.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): per-tester build identity + check-for-updates
All checks were successful
lab publish / publish (push) Successful in 23m28s
094bccef8d
Record what build each tester VM is running and surface whether a newer
build is available, productizing the recurring "did the update land?" gap
from the hand-driven fast-path deploys.

- M9 adds vms.installed_releases: at each successful install the deployer
  snapshots the source commit (target_commitish) of every installed
  component's rolling `latest` Forgejo release and stores it as JSON.
  Source commits, not binary md5, so the snapshot is immune to the UPX
  pre-pack md5 wrinkle. When a repo publishes a branch name as
  target_commitish, the staleness diff falls back to the release publish
  time so those repos are still tracked.
- The snapshot is captured at install START and written only on ready, so
  it can never claim a build newer than what actually landed.
- The tracked set is derived from the install manifest
  (A30_STACK_COMPONENTS via a component->repo map mirroring
  setup-binaries.sh), so components added to the bundle later are tracked
  automatically.
- New read-only deployer.check_build_updates RPC diffs the stored snapshot
  against current `latest`, run on demand (an admin button) not on the
  status poll, to bound Forge load.
- Admin user_detail gains a Build column (per-component commits, capture
  date) and a "Check for updates" button; the reinstall action is
  relabeled "Update / Reinstall".
- setup-binaries.sh passes --force so a reinstall actually re-pulls current
  `latest` instead of skipping already-present binaries; first installs
  have no cache so it is a no-op there.
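
A minimal sketch of the staleness diff described in the list above, under stated assumptions: the ReleaseRef type, the function names, and the "40 hex characters means a commit" heuristic are all hypothetical and are not taken from the deployer source. It only shows the commit comparison with the publish-time fallback for repos that publish a branch name.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot entry for one component's rolling release.
#[derive(Clone, Debug, PartialEq)]
struct ReleaseRef {
    target_commitish: String,
    published_at: u64, // unix seconds (assumed representation)
}

/// Assumption: a full 40-character hex string is a commit, anything else
/// (for example "development") is a branch name.
fn looks_like_commit(s: &str) -> bool {
    s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit())
}

/// Compare commits when both sides carry one; otherwise fall back to the
/// release publish time.
fn is_stale(installed: &ReleaseRef, latest: &ReleaseRef) -> bool {
    if looks_like_commit(&installed.target_commitish)
        && looks_like_commit(&latest.target_commitish)
    {
        installed.target_commitish != latest.target_commitish
    } else {
        latest.published_at > installed.published_at
    }
}

/// Names of installed components whose current release differs from the snapshot.
fn components_with_updates(
    installed: &BTreeMap<String, ReleaseRef>,
    latest: &BTreeMap<String, ReleaseRef>,
) -> Vec<String> {
    installed
        .iter()
        .filter(|(name, old)| latest.get(*name).map_or(false, |new| is_stale(old, new)))
        .map(|(name, _)| name.clone())
        .collect()
}
```

A component missing from the current listing is treated as "no update known" in this sketch, so a failed lookup never reports a false update.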

Updating a machine reuses install_hero_stack (self-heals, fails closed,
holds the installing lock) rather than a bespoke partial-update path.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): one-click Add & set up onboarding + node capacity awareness
Some checks failed
lab publish / publish (push) Failing after 35s
be7312121b
Add a single "Add & set up" action to the admin users page that chains
register/create user, provision VM, and install the Hero stack in one
go, with staged progress (adding, provisioning, installing, ready) and a
copy-ready cockpit URL. The per-step buttons stay on the user detail
page as the resume path for a failed or deferred step. The browser
orchestrates the existing calls via three new JSON admin routes
(onboard-create / provision / install) that wrap the existing SDK
methods; sequencing stays client-side. A new account's one-time password
is surfaced inside the flow; the welcome email still fires only when the
install reaches ready.

Add node capacity awareness so a full or offline node is handled before
anything is created:
- New read-only deployer.node_capacity RPC reads ComputeService
  list_nodes (total + online status) and list_slices (per-slice
  free/used) and reports free/total slots plus "room for N more testers"
  (free slots divided by the fixed demo-bundle slice count).
- The users form shows a live "room for N more testers" readout, and the
  same capacity feeds a hard preflight inside provision_vm that refuses
  up front (no contract created) when the node is full or offline. The
  preflight fails open on unknown capacity, so it only ever turns a
  guaranteed failure into a fast, clean rejection, never a new blocker.
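
The capacity readout and the fail-open preflight can be sketched as follows. The NodeCapacity enum, the function names, and the error strings are hypothetical; only the rules (free slots divided by the fixed slice count, refuse when full or offline, allow when capacity is unknown) come from the description above.

```rust
/// Hypothetical view of what the capacity lookup returned.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeCapacity {
    /// The lookup failed or returned nothing usable.
    Unknown,
    Offline,
    Online { free_slots: u32 },
}

/// "Room for N more testers": free slots divided by the slices one tester needs.
fn room_for_testers(free_slots: u32, slices_per_tester: u32) -> u32 {
    if slices_per_tester == 0 {
        0
    } else {
        free_slots / slices_per_tester
    }
}

/// Preflight that refuses only when failure is certain and fails open
/// when the capacity is unknown.
fn preflight(cap: NodeCapacity, slices_per_tester: u32) -> Result<(), String> {
    match cap {
        NodeCapacity::Unknown => Ok(()),
        NodeCapacity::Offline => Err("node is offline".to_string()),
        NodeCapacity::Online { free_slots } => {
            if room_for_testers(free_slots, slices_per_tester) == 0 {
                Err("node is full".to_string())
            } else {
                Ok(())
            }
        }
    }
}
```

Returning Ok for the Unknown case is what keeps the check from becoming a new blocker: an unreachable capacity endpoint leaves provisioning exactly as it behaved before the preflight existed.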

Proven end to end on the QA node: the one-click chain created,
provisioned and installed a throwaway tester to ready (login gate 302,
welcome email sent, build snapshot recorded), the readout tracked 4, 3,
4 testers as it was added and removed, and teardown was clean.

Closes lhumina_code/home#255

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): set email sender and default Kimi key from the admin dashboard
Some checks failed
lab publish / publish (push) Failing after 44s
5dfc626a85
Add a Service setup panel to the deployer admin dashboard so an operator can
configure the welcome-email sender and the default tester assistant key from
the browser instead of by hand on the server.

- email.rs: read the Resend sender config (api key, from address, from name)
  live from the secret store at send time instead of the start-time env
  snapshot, so a dashboard change applies on the next email with no deployer
  restart. Replaces from_env with from_parts plus send_welcome_with.
- web.rs: new deployer.get_service_config (presence booleans plus the
  non-secret from fields only, never the keys) and deployer.set_service_config
  (writes only the provided non-empty values), plus an admin_secret_value
  helper that reads the local hero_proc store.
- ssh.rs: the tester install now writes the Kimi subscription config
  (api.kimi.com/coding/v1, model kimi-for-coding) so the assistant has web
  search and fetch, and seeds the operator default key into the tester
  core/KIMI_API_KEY slot then restarts hero_kimi_web so the assistant works on
  first login. A tester can still override the key in the cockpit settings.
- admin: Service setup card with write-only key fields and set/not-set badges,
  wired to the new RPCs over the generated SDK client.
- openrpc.json: the two new methods and their output schemas.

Keys are write-only in the UI and never returned or logged. Seeded keys are
expected to be spend-capped and rotated.

Refs lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): per-tester assistant key seeding and welcome email controls
All checks were successful
lab publish / publish (push) Successful in 23m55s
b76da1e9d6
Generalize the single default Kimi key into a provider registry (Groq,
OpenRouter, SambaNova, Kimi). The admin dashboard stores a default key per
provider; when adding a tester the operator seeds all configured keys or a
chosen subset, or none for a bring-your-own-key tester. Kimi keeps its
dedicated path (assistant config plus hero_kimi_web restart); the rest are
plain core secret slots a consumer reads from its process env, so seeding a
key whose consumer is not installed yet is harmless and future-proof.

Add welcome-email controls: a per-tester send toggle and an instance-wide
email master switch, both defaulting to on so existing machines are
unchanged with no migration. The operator can customize the welcome email
subject, opening paragraph, and sign-off (the login link and sign-in line
are always system-rendered, so a customization cannot break the email), and
send a test copy to a chosen address before any tester receives one.

Wire changes live in openrpc.json (the SDK client is regenerated by the
openrpc_client! macro): seed_providers and send_welcome_email params on
install_hero_stack, the provider and email fields on get/set_service_config,
and a new send_test_welcome method.

lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer-admin): navbar plus dedicated Service setup page, honest per-tester key checkboxes
All checks were successful
lab publish / publish (push) Successful in 23m49s
7e263fbf61
The Service setup card was buried on the home page with no navbar link, so
configuring keys and email wording was undiscoverable. Add a top nav with
Overview, Users, and Service setup (with active-page highlighting), move the
configuration to its own /settings page split into Email sender, Default
assistant keys, and Welcome email sections, and turn the home page into a
launcher with a Service setup card.

On the add-tester form, the per-provider seed checkboxes now read the live
configuration and disable any provider with no key set, with an inline link
to Service setup, so the selection reflects what will actually happen.

lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer-admin): add an admin Manual page
Some checks failed
lab publish / publish (push) Has been cancelled
1968f50616
Add a Manual tab to the deployer admin dashboard, mirroring the tester
cockpit's manual. A new /manual page renders a Markdown admin guide via the
shared markdown viewer component, served as text from /manual.md. The guide
covers service setup (keys, email sender and on/off, welcome email wording and
the test send), adding a tester (existing vs new account, the two buttons, and
the per-tester setup options), managing testers, and a short FAQ. Adds a Manual
link to the navbar and a Manual card on the overview.

lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
style(deployer-admin): rename Service setup nav to Settings, drop nav icons
Some checks failed
lab publish / publish (push) Has been cancelled
4bcd127977
Match the rest of the navbar (Overview and Users have no icons) by removing the
gear and book icons from the Settings and Manual links, and rename the "Service
setup" label to "Settings" everywhere user-facing (navbar, overview card, page
heading, and the manual) so it matches the /settings route.

lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer-admin): align the Send test button with its input on the Settings preview
Some checks failed
lab publish / publish (push) Failing after 41s
fc9d968bad
The preview row used align-items-end, which dropped the button to the help-text
line instead of lining it up with the email input. Put the input and button on
one row with the label above and help text below.

lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): manage admin SSH keys from the dashboard
Some checks failed
lab publish / publish (push) Has been cancelled
a5a43240ac
Admin SSH keys (the operators' public keys injected into every tester for
shell access) can now be viewed, added, and removed from the admin Settings
page instead of only as a server-side secret set by hand.

- New deployer.set_admin_ssh_keys RPC writes the full list to the
  core/ADMIN_SSH_PUBKEYS secret (empty clears it); each entry is validated
  as an SSH public key. get_service_config returns the list (public keys)
  plus a master-installer-key presence flag.
- Provisioning reads ADMIN_SSH_PUBKEYS live from the secret store (env
  fallback) so a dashboard edit applies to the next tester with no deployer
  restart, matching the email-config live-read pattern.
- Settings page gains an "Admin SSH keys" editor; openrpc.json + SDK
  regenerated; unit tests for the pubkey validation.

Applies to newly provisioned testers; recreate a tester to refresh its keys
(sandbox tier). Part of lhumina_code/home#256

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): per-tester cockpit allowlist + opt-in tester SSH key
All checks were successful
lab publish / publish (push) Successful in 23m19s
baa04fc4ba
Admins can now manage, from the user-detail page, which Forge accounts may
sign in to a tester's cockpit, and optionally grant a technical tester shell
access to their VM.

- M10 adds users.extra_allowed_users + users.tester_ssh_pubkey, keyed to the
  user so both survive a VM recreate (plain ALTER, not a vms rebuild).
- Cockpit allowlist: the effective list is force-unioned server-side from the
  admin set (ADMIN_FORGE_USERS) + the tester's own username + the operator
  extras, so editing the extras can never lock the team or the tester out of
  the cockpit (that slot is the sole gate for the whole cockpit). Folded into
  the install-time allowlist; applies on the next install/reinstall.
- Tester SSH key: opt-in, off by default; injected at the next provision when
  set. Sandbox/testing tier, so recreate the VM to apply. A tester shell can
  read the still-shared provider keys, so it is operator-set per tester.
- New RPCs get_tester_access / set_tester_allowlist / set_tester_ssh_keys
  (openrpc.json SSOT + regenerated SDK + smoke tests); user-detail "Access &
  keys" panel; unit tests for the force-union guard + username validation.
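
The force-union guard from the list above can be illustrated with a short sketch. The function name and signature are hypothetical; the rule it shows (admin set plus the tester's own username plus the operator extras, computed server-side) is the one this commit describes.

```rust
use std::collections::BTreeSet;

/// Union of the admin set, the tester's own username, and the operator
/// extras. Because the admins and the tester are always added here,
/// no edit to `extras` can remove them from the result.
fn effective_allowlist(admins: &[&str], tester: &str, extras: &[&str]) -> Vec<String> {
    let mut set = BTreeSet::new();
    for name in admins
        .iter()
        .chain(std::iter::once(&tester))
        .chain(extras.iter())
    {
        let name = name.trim();
        if !name.is_empty() {
            set.insert(name.to_string());
        }
    }
    set.into_iter().collect()
}
```

This sketch deduplicates by exact string. Whether the real list also folds case is not stated here, so that detail is left out.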

Completes lhumina_code/home#256 part 2.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(admin): embed the assistant widget and voice bar in the navbar
All checks were successful
lab publish / publish (push) Successful in 24m28s
dc811dfe53
Add the agent assistant widget (Kimi default) and the voice bar to the admin
dashboard navbar so an operator can use the assistant by chat or by voice.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(admin): hide the provision node field (fixed to the default slice)
All checks were successful
lab publish / publish (push) Successful in 23m17s
c9dead1ec8
The provision Node field is really a slice SID, and any value other than the
default silently fails to provision. Hide it and submit the default so an
operator cannot enter a non-existent slice. Explicit node selection returns
when the deployer supports more than one dedicated node.

Signed-by: mik-tf <mik-tf@noreply.invalid>
A fresh tester install was failing in three independent ways that compounded:

1. Mycelium route wait. A freshly-provisioned VM reports running on-chain well
   before its overlay route converges (minutes, sometimes longer). The bare scp
   died in ~3s with "No route to host" and the whole install failed. install_hero_stack
   now waits for SSH reachability before transferring, spending up to half the
   budget so a slow-but-eventual route still installs on the first attempt, with
   a clear error if it never comes up. The install timeout default rises to 1800s
   to cover the wait plus the ~10-15 min install (also the stale-lock window).

2. OAuth gate seed. hero_proxy seeds the forge OAuth provider once at boot by
   reading the client_id/secret from hero_proc. On a fresh tester that read is prone
   to races and version skew and can silently return "not set" even when the secret is present,
   so the provider is never created and every page returns 500. The deployer already
   holds the per-tester client_id/secret (it minted the OAuth app), so it now pushes
   the provider straight into hero_proxy via oauth.set_provider after the gateway is
   healthy. Success is confirmed by the response, not the HTTP status, and the install
   fails closed if it never succeeds. Skipped when there is no OAuth app (empty
   client_id), so the gate stays inert rather than seeding a broken record.

3. Cold-boot service-start races. hero_proc can still be registering its RPC
   surface when the first lab service --start round runs, so a component (e.g.
   hero_orchestrator) could fail with "method not found: action.set". setup-binaries.sh
   now waits for hero_proc to answer before starting services and retries any
   component that failed the first round. The deployer's hero_proxy restart also
   falls back to re-registering the service if a race left it unregistered, instead
   of aborting the whole install with "service not found".
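
The route wait in item 1 can be sketched as a bounded polling loop. The function name, the probe closure, and the error text are hypothetical; the sketch only shows the "spend at most half of the install budget waiting for reachability" rule.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `probe` until it reports the VM reachable, spending at most half of
/// `total_budget` so the rest stays available for the install itself.
/// Returns how long the wait took, or a clear error if the route never came up.
fn wait_until_reachable<F: FnMut() -> bool>(
    mut probe: F,
    total_budget: Duration,
    poll: Duration,
) -> Result<Duration, String> {
    let wait_budget = total_budget / 2;
    let start = Instant::now();
    loop {
        if probe() {
            return Ok(start.elapsed());
        }
        // Stop before sleeping past the wait budget.
        if start.elapsed() + poll > wait_budget {
            return Err(format!("VM not reachable over SSH after {:?}", wait_budget));
        }
        sleep(poll);
    }
}
```

In a real caller the probe might be a TCP connect to the VM's SSH port (for example via std::net::TcpStream::connect_timeout). That choice is an assumption here and is not how the deployer is stated to test reachability.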

See lhumina_code/home#265

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: derive the gateway fqdn when the daemon returns ready without one
Some checks failed
lab publish / publish (push) Failing after 5m48s
e6fd3ad7de
deploy_webgateway can reach on-chain ready while the grid SDK read-back returns
an empty fqdn (an intermittent miss). The deployer treated that as a hard error,
which skipped the per-tester OAuth app and blocked install, stalling onboarding
for that tester until manual intervention.

A name gateway's fqdn is deterministic: <gateway_name>.<zone>, where the zone is
the gateway node's domain suffix shared by every tester. When the daemon returns
ready without an fqdn, derive it (zone from TFGRID_GATEWAY_ZONE or inferred from
an existing tester's fqdn), persist it, and continue, so install + the login gate
proceed normally. install_hero_stack's repair path reuses this, so a single
Install recovers a tester that came up without a Cockpit URL.
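
A minimal sketch of the derivation described above, assuming hypothetical function names and a plain Option-based signature. The explicit zone argument stands in for the TFGRID_GATEWAY_ZONE value, and the example domains in the note below are placeholders.

```rust
/// Everything after the first label of an existing tester fqdn is the zone.
fn zone_from_fqdn(fqdn: &str) -> Option<&str> {
    let (_, zone) = fqdn.split_once('.')?;
    if zone.is_empty() {
        None
    } else {
        Some(zone)
    }
}

/// Derive <gateway_name>.<zone>, preferring the configured zone and falling
/// back to the zone inferred from an existing tester's fqdn.
fn derive_fqdn(
    gateway_name: &str,
    configured_zone: Option<&str>,
    existing_fqdn: Option<&str>,
) -> Option<String> {
    let zone = configured_zone
        .filter(|z| !z.is_empty())
        .or_else(|| existing_fqdn.and_then(zone_from_fqdn))?;
    Some(format!("{}.{}", gateway_name, zone))
}
```

With a placeholder zone of gw.example.test, the name alice003x derives to alice003x.gw.example.test. When neither source yields a zone the sketch returns None, so the caller can still surface the original error.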

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: decouple gateway name from username + SSD-aware capacity
All checks were successful
lab publish / publish (push) Successful in 10m12s
596553cb53
Two onboarding-reliability fixes for the single-node sandbox.

Gateway name: the web-gateway / DNS label is no longer the bare Forge
username, so re-provisioning a user never collides on a leftover
on-chain name contract. A fresh VM gets a name unique to that VM (the
username slug plus the VM short id, e.g. alice003x), and the operator
can pin a custom web-address label on the Add / Provision form
(lowercased and stripped to [a-z0-9]). A new vms.gateway_name column
holds it; existing testers are backfilled from their current fqdn label
so a reinstall keeps their address. ensure_webgateway resolves the name
once and persists it so the provision and install-repair paths agree.
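
The name resolution order described above might be sketched like this. The function names, the argument order, and the fallback when a custom label sanitizes to an empty string are assumptions; the persisted-name-first rule, the [a-z0-9] stripping, and the "slug plus VM short id" default come from the paragraph.

```rust
/// Lowercase and strip to [a-z0-9], as for the custom web-address label.
fn sanitize_label(raw: &str) -> String {
    raw.chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        .collect()
}

/// Resolve the gateway name once: a persisted name wins (so a reinstall
/// keeps its address), then an operator-pinned label, then the username
/// slug plus the VM short id.
fn resolve_gateway_name(
    persisted: Option<&str>,
    custom_label: Option<&str>,
    username: &str,
    vm_short_id: &str,
) -> String {
    if let Some(name) = persisted.filter(|n| !n.is_empty()) {
        return name.to_string();
    }
    if let Some(label) = custom_label.map(sanitize_label).filter(|l| !l.is_empty()) {
        return label;
    }
    format!("{}{}", sanitize_label(username), sanitize_label(vm_short_id))
}
```

Because the persisted value is checked first, the provision path and the install-repair path return the same name once it has been stored.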

Capacity: node_capacity now reports how many slices actually fit right
now, from the compute daemon's live free vCPU / RAM / SSD with the
deploy headroom (the binding constraint is usually SSD), instead of a
raw free-slot count that over-reported on a disk-bound node. It falls
back to the catalog free-slice count when talking to a daemon that
predates the new ComputeService.node_capacity, so the readout never
regresses across a version skew. The admin "room for N more testers"
banner and the pre-provision check use the honest figure.
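
As a rough illustration of the "how many slices actually fit" figure, the sketch below takes the minimum over vCPU, RAM, and SSD after subtracting headroom, and falls back to the catalog count when no live figures are available. The Resources type, the units, and the function names are hypothetical.

```rust
/// Hypothetical resource triple; units are illustrative.
#[derive(Debug, Clone, Copy)]
struct Resources {
    vcpu: u64,
    ram_mb: u64,
    ssd_gb: u64,
}

/// Slices that fit along one dimension after reserving headroom.
fn fit(free: u64, headroom: u64, per_slice: u64) -> u64 {
    if per_slice == 0 {
        // A dimension a slice does not consume never constrains the result.
        return u64::MAX;
    }
    free.saturating_sub(headroom) / per_slice
}

/// The binding constraint is whichever resource runs out first.
fn slices_that_fit(free: Resources, headroom: Resources, per_slice: Resources) -> u64 {
    fit(free.vcpu, headroom.vcpu, per_slice.vcpu)
        .min(fit(free.ram_mb, headroom.ram_mb, per_slice.ram_mb))
        .min(fit(free.ssd_gb, headroom.ssd_gb, per_slice.ssd_gb))
}

/// Use live figures when the daemon provides them, else the catalog count.
fn honest_capacity(
    live_free: Option<Resources>,
    catalog_free_slices: u64,
    headroom: Resources,
    per_slice: Resources,
) -> u64 {
    match live_free {
        Some(free) => slices_that_fit(free, headroom, per_slice),
        None => catalog_free_slices,
    }
}
```

With 16 vCPU, 65536 MB RAM, and 200 GB SSD free, a 20 GB SSD headroom, and slices of 2 vCPU / 4096 MB / 50 GB, the SSD dimension limits the result to 3 even though vCPU alone would allow 8.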

#22
#21

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(install): seed a catch-all deny route so unrouted hosts are refused
All checks were successful
lab publish / publish (push) Successful in 23m58s
5c9a2565f7
The tester install already registers the public hostname as an
oauth-gated route on hero_proxy. Add a sibling "*" deny route so any
request that does not match the public hostname (for example one reaching
the VM by its raw mycelium or backend address) is refused with 404 rather
than served the cockpit unauthenticated. SSO is deliberately not used for
that path: the login redirect is bound to the public hostname and could
never complete off it, so a flat deny is the honest answer. The public
hostname keeps its own exact-match oauth gate.
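
The intended precedence between the exact-match route and the catch-all can be sketched as below. This is an illustration of the matching rule only, not hero_proxy's implementation; the Route and RouteAction types and the example hostname are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RouteAction {
    /// Serve behind the OAuth login gate.
    OauthGate,
    /// Refuse the request (the commit describes a 404).
    Deny,
}

struct Route {
    host: &'static str,
    action: RouteAction,
}

/// An exact host match wins; otherwise the "*" catch-all applies, if present.
fn resolve(routes: &[Route], request_host: &str) -> Option<RouteAction> {
    routes
        .iter()
        .find(|r| r.host.eq_ignore_ascii_case(request_host))
        .or_else(|| routes.iter().find(|r| r.host == "*"))
        .map(|r| r.action)
}
```

A request for the public hostname resolves to the OAuth gate, while a request arriving by a raw address falls through to the deny route. Without the "*" entry the lookup returns None, which models the older behavior where no route refuses such a request.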

Requires the matching hero_proxy change (catch-all "*" / deny support);
the route is inert against an older proxy.

lhumina_code/home#271

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): multi-node and multi-chain provisioning across a fleet of compute daemons
Some checks failed
lab publish / publish (push) Has been cancelled
85aad0ebd0
Replace the single HERO_COMPUTE_NODE_ADDR with a fleet of compute daemons, one
per TFGrid network, and aggregate their dedicated nodes. The deployer can now
see and place tester VMs across more than one node, and across more than one
chain; the runtime stays unified by mycelium regardless of which chain rented a
node. A single-chain deployment (no HERO_COMPUTE_DAEMONS set) is unchanged.

- Compute fleet built from HERO_COMPUTE_DAEMONS (JSON) or the single-daemon env.
- Each VM records its owning daemon (schema M12, additive column) so provision,
  delete and gateway repair route back to the right chain.
- New deployer.list_nodes RPC aggregating every node across every chain with a
  live SSD-aware capacity snapshot and a fleet summary.
- Node selection: an explicit (node, chain) pin or "auto", which picks the
  most-free fitting node server-side at provision time so it cannot go stale.
- Admin dashboard: a Nodes page, an Overview capacity strip and Nodes card, and
  a labelled node dropdown on the Add form (auto by default).
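
Node selection as described in the list above could be sketched like this. The NodeView and Placement types, the numeric node id, and the field names are hypothetical; the sketch shows an explicit (node, chain) pin versus an "auto" choice of the most-free node that still fits a tester.

```rust
/// Hypothetical aggregated view of one node across the fleet.
#[derive(Debug, Clone, PartialEq)]
struct NodeView {
    node_id: u32,
    chain: String,
    online: bool,
    /// How many more tester slices fit right now.
    fits: u32,
}

enum Placement<'a> {
    Auto,
    Pin { node_id: u32, chain: &'a str },
}

/// Resolve the placement at provision time from a fresh capacity snapshot,
/// so an "auto" choice cannot go stale between form render and submit.
fn pick_node<'n>(nodes: &'n [NodeView], placement: &Placement<'_>) -> Option<&'n NodeView> {
    match placement {
        Placement::Pin { node_id, chain } => nodes
            .iter()
            .find(|n| n.node_id == *node_id && n.chain == *chain),
        Placement::Auto => nodes
            .iter()
            .filter(|n| n.online && n.fits > 0)
            .max_by_key(|n| n.fits),
    }
}
```

Offline and full nodes are filtered out before the maximum is taken, so "auto" returns None rather than a node that is certain to fail.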

Proven live: deployed to the admin machine with no regression (M12 migrated
11->12, existing VMs intact); the aggregated view exposed a second QA node that
the single-node UI hid; a tester provisioned onto that second node and was then
torn down, with provision and delete both routing by the recorded chain.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): per-daemon hero_router path so co-located daemons share one router port
Some checks failed
lab publish / publish (push) Failing after 4m31s
2014b02556
The compute fleet distinguished daemons only by addr while using a fixed
/my_compute_zos/rpc/rpc path, which assumed each daemon sat behind its own
router port. hero_router is a single TCP entry point that routes to services by
path, so a second daemon co-located on one admin VM is registered under a
distinct socket name (e.g. my_compute_zos_main) and reached on the same port at
/my_compute_zos_main/rpc/rpc. Add an optional rpc_path to each
HERO_COMPUTE_DAEMONS entry (and ComputeAdapter), defaulting to the canonical
path so single-daemon and existing multi-daemon configs are unchanged.
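
A minimal sketch of the optional `rpc_path` with its canonical default, assuming a hypothetical `DaemonEntry` struct and an `http://` scheme (neither is taken from the deployer source; only the two paths come from the message above):

```rust
/// Canonical path used when an entry does not set `rpc_path`.
const DEFAULT_RPC_PATH: &str = "/my_compute_zos/rpc/rpc";

/// Hypothetical shape of one HERO_COMPUTE_DAEMONS entry.
struct DaemonEntry {
    addr: String,
    rpc_path: Option<String>,
}

impl DaemonEntry {
    /// Build the RPC URL; entries without `rpc_path` keep the old behaviour.
    fn rpc_url(&self) -> String {
        let path = self.rpc_path.as_deref().unwrap_or(DEFAULT_RPC_PATH);
        // The scheme is an assumption made for this sketch.
        format!("http://{}{}", self.addr, path)
    }
}
```

Two daemons behind the same router port then differ only in path, for example `/my_compute_zos/rpc/rpc` and `/my_compute_zos_main/rpc/rpc`.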

Signed-by: mik-tf <mik-tf@noreply.invalid>
ci: canonical lab-publish workflow (build main/development/integration)
Some checks failed
lab publish / publish (push) Has been cancelled
3928f5ef9e
Publishes musl-x86_64 binaries to per-branch releases (latest,
latest-dev, latest-integration) and installs lab from the matching
hero_skills branch (clone + build via --branch). Triggers only on push
to these three branches.
ci: canonical lab-publish workflow (build main/development/integration)
Some checks failed
lab publish / publish (push) Has been cancelled
4cc3493141
Publishes musl-x86_64 binaries to per-branch releases (latest,
latest-dev, latest-integration) and installs lab from the matching
hero_skills branch (clone + build via --branch). Triggers only on push
to these three branches.
ci: trigger lab-publish run
Some checks are pending
lab publish / publish (push) Waiting to run
a8dc79b998
ci: trigger lab-publish run
Some checks failed
lab publish / publish (push) Failing after 2m47s
25fc0d454c
Merge remote-tracking branch 'origin/main' into main_home_277_admin_access
Some checks are pending
lab publish / publish (push) Waiting to run
bb1c149620
feat(deployer): apply access settings to existing testers
Some checks are pending
lab publish / publish (push) Waiting to run
87c7fc140b
feat(admin): show the TFGrid network per VM on the Users page
Some checks are pending
lab publish / publish (push) Waiting to run
2e5e5e5e76
The deployer now provisions across more than one TFGrid network from one admin
VM, but the Users page VM table showed only a node short id, which is ambiguous
(the same short id exists on more than one network). Thread the owning fleet
daemon (daemon_label) through list_vms (VmRow schema + response) into the admin
and add a Network column to the VM table, so each tester clearly shows which
network and dedicated node it lives on. Display only; the value is already
stored per VM from the multi-node work. A row with an empty value (created
before the multi-node work) shows `default`, meaning the first-configured daemon.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Merge remote-tracking branch 'origin/main' into main_home_277_admin_access
All checks were successful
lab publish / publish (push) Successful in 8m58s
8c32e45e35
feat(deployer): manage dedicated nodes from admin
All checks were successful
lab publish / publish (push) Successful in 7m1s
d468dc4cfb
fix(admin): polish dedicated node search UX
All checks were successful
lab publish / publish (push) Successful in 7m9s
8a9b51e59e
feat(admin): manage compute wallet mnemonics
All checks were successful
lab publish / publish (push) Successful in 7m6s
efb9141350
feat(deployer): add testnet compute fleet support
All checks were successful
lab publish / publish (push) Successful in 7m2s
b6db118dcb
fix(deployer): present networks in canonical order
All checks were successful
lab publish / publish (push) Successful in 7m10s
9f8490f605
fix(deployer): add onboarding retry controls
All checks were successful
lab publish / publish (push) Successful in 7m11s
68b2571d1d
feat(deployer): add node detail drawer
Some checks failed
lab publish / publish (push) Has been cancelled
6ae9db320e
feat(deployer): show active testers per node
All checks were successful
lab publish / publish (push) Successful in 9m2s
90d0141cc7
fix(deployer): default node search to rentable
Some checks failed
lab publish / publish (push) Has been cancelled
f6d42e8e4b
fix(deployer): remove users locally only
All checks were successful
lab publish / publish (push) Successful in 10m13s
5e3633afaa
feat(deployer): copy mycelium addresses
All checks were successful
lab publish / publish (push) Successful in 7m7s
b3ee76bfa5
fix(deployer): cap demo testers at one slice
Some checks failed
lab publish / publish (push) Has been cancelled
78b7e590e1
Set the demo tester profile default to one TFGrid slice, which is 4 GB RAM. Keep the OpenRPC description and SDK compile smoke aligned so the admin capacity math and provision default speak the same profile size.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): require two-slice testers with disk floor
Some checks failed
lab publish / publish (push) Has been cancelled
9fabc02c44
Use the current hero_compute slice model honestly for demo testers: two slices gives 2 vCPU and 8 GB RAM today. Add a 50 GB SSD floor to node suitability so the Nodes search and auto-placement do not treat low-disk-per-slice nodes as viable for the expanded tester bundle.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): lower tester disk floor
All checks were successful
lab publish / publish (push) Successful in 10m46s
05d8accd3f
feat(deployer): show node links and include foundry
All checks were successful
lab publish / publish (push) Successful in 7m4s
a282a3de4a
ci(deployer): publish main to latest
All checks were successful
lab publish / publish (push) Successful in 23m47s
0d454e6316
feat(deployer): include indexer in tester bundle
Some checks failed
lab publish / publish (push) Has been cancelled
e9c75d659b
fix(deployer): simplify users card label
All checks were successful
lab publish / publish (push) Successful in 26m38s
1768e90728
feat(deployer): add release-channel tester updates
All checks were successful
lab publish / publish (push) Successful in 28m43s
0478e581c3
Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): restart services after tester updates
Some checks failed
lab publish / publish (push) Has been cancelled
ee0d443001
Reinstalling a tester VM already forced binary downloads, but the follow-up
service pass used idempotent starts. Existing processes therefore kept running
old binaries after an update. Restart hero_proc_server and reset each installed
service so update/reinstall actually applies the downloaded release channel.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Pin the tester base bundle so hero_tfgrid_deployer is never installed into tester VMs. The deployer belongs on the admin/control VM only.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Replaces lab-publish.yaml with a single lab-release workflow that pulls the
prebaked lab-builder image and publishes per-branch releases (main=stable,
development/integration=pre-release). No per-run toolchain/lab install.
feat(deployer): add admin control surface
All checks were successful
lab release / release (push) Successful in 7m41s
f53ac75e56
Add a Control tab to the TFGrid deployer admin UI for admin-VM-only shared services and provider dashboards.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): keep control page focused on shared providers
All checks were successful
lab release / release (push) Successful in 6m24s
06d80ca3c8
Remove redundant deployer and misleading voice entries from the admin VM Control page so it only exposes dedicated shared provider admin surfaces.

Signed-by: mik-tf <mik-tf@noreply.invalid>
ci: trigger lab-release (latest-main)
All checks were successful
lab release / release (push) Successful in 19m41s
474f7242d6
ci: trigger lab-release (latest-main)
Some checks failed
lab release / release (push) Has been cancelled
0dea3a52fb
fix(deployer): avoid proxy restart during admin allowlist save
All checks were successful
lab release / release (push) Successful in 5m0s
ce57ce8141
The admin allowlist save path writes ADMIN_FORGE_USERS and previously restarted hero_proxy_server before returning. That killed the in-flight dashboard response because the request itself is proxied through hero_proxy, so the browser saw a Bad Gateway body instead of JSON.

Keep the compatibility response field, but let hero_proxy pick up the new allowlist through its existing short TTL cache. Update the settings UI and OpenRPC description to match the no-restart behavior.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): skip uninstalled VMs during access sync
All checks were successful
lab release / release (push) Successful in 7m21s
388e50c060
Applying admin access settings expects the tester VM to have a Hero stack so it can write hero_proc secrets and restart the tester proxy. Running that path against a provisioned but uninstalled VM reports a false failure even though the VM is not a valid sync target yet.

Skip VMs whose install_state is not ready so the settings page reports actionable failures only.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): use branch-suffixed release channels
All checks were successful
lab release / release (push) Successful in 19m18s
02defc0937
Align tester install and update flows with the current lab-release tag contract: main uses latest-main, development uses latest-development, and integration uses latest-integration.
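
The tag contract above amounts to a small branch-to-tag mapping. A sketch, with the function name `release_tag` being hypothetical:

```rust
/// Map a source branch to its lab-release tag; unknown branches have no channel.
fn release_tag(branch: &str) -> Option<&'static str> {
    match branch {
        "main" => Some("latest-main"),
        "development" => Some("latest-development"),
        "integration" => Some("latest-integration"),
        _ => None,
    }
}
```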

Signed-by: mik-tf <mik-tf@noreply.invalid>
ci: multi-arch lab-release (linux-musl x86_64 + arm64)
All checks were successful
lab release / release (push) Successful in 14m25s
28aa3f3142
ci: canonical lab-release (cargo check + multi-arch + hero.releaser)
Some checks failed
lab release / release (push) Has been cancelled
b36ed7aa29
ci: canonical lab-release (cargo check + multi-arch + hero.releaser)
All checks were successful
lab release / release (push) Successful in 34m22s
d353f649e4
Auto provisioning can now be scoped by daemon/network, so a QA tester stays on a QA-capable node even when another chain has a duplicate node SID with more free capacity. The admin user forms submit the selected daemon for both explicit nodes and Auto, and the regression is covered by server tests.

Signed-by: mik-tf <mik-tf@noreply.invalid>
merge remote main before deployer auto-scope push
Some checks failed
lab release / release (push) Has been cancelled
eb08e9b120
Integrate the canonical lab-release workflow updates that landed on main while the deployer auto-selection fix was being prepared.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): mark managed nodes in finder
All checks were successful
lab release / release (push) Successful in 8m10s
2a898e7738
The Nodes finder now detects TFGrid nodes already registered in the deployer and shows a Registered action that opens the managed-node detail drawer instead of offering a duplicate Register action.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): require rent before node registration
All checks were successful
lab release / release (push) Successful in 12m29s
ad23d0384f
Dedicated TFGrid nodes can no longer be added to the deployer catalog unless Grid Proxy confirms the selected daemon's wallet twin owns the rent. The Nodes finder now shows Rent + register as the only action for rentable unrented nodes and keeps actions visibly busy while rent/register/unregister calls run.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): clarify node unregister behavior
All checks were successful
lab release / release (push) Successful in 12m20s
6057596e80
The Nodes page now labels the trash action as deployer-catalog unregister only and asks for confirmation before running it. The server-side unregister guard also treats legacy empty-daemon VM rows as QA when a QA daemon exists, so mainnet node removal is not blocked by old QA tester rows sharing the same short SID.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): add node unrent action
All checks were successful
lab release / release (push) Successful in 8m9s
d14d8cf20c
The deployer now exposes a guarded cancel-rent RPC that infers the active rent contract from Grid Proxy, verifies daemon ownership when available, and refuses while the node is still registered. The Nodes finder shows Unrent for rented unmanaged nodes, keeping catalog unregister and on-chain rent cancellation as separate operator actions.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): add node adopt retire lifecycle
All checks were successful
lab release / release (push) Successful in 12m10s
a926403461
Add a managed-node retire RPC that refuses while tester VMs still use the node, unregisters it from the compute catalog, and cancels the TFGrid rent contract when owned by the deployer wallet. Update the Nodes UI terminology to Adopt node for rent/register and Retire node for unregister/cancel-rent, keeping Unregister only as an explicit advanced path.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): use app modal for node lifecycle actions
All checks were successful
lab release / release (push) Successful in 8m15s
033f2dd948
Replace browser-native node lifecycle confirmations with the deployer UI modal and unwrap nested compute RPC errors so node retire/unregister failures show actionable text.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): hide rpc prefix in admin json errors
All checks were successful
lab release / release (push) Successful in 12m12s
a534dfe0c6
Strip generated JSON-RPC transport prefixes from admin JSON error bodies so node lifecycle banners show the actionable message returned by the deployer server.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): show compute-local node blockers
All checks were successful
lab release / release (push) Successful in 8m13s
c033a106e6
Add a node_compute_vms RPC so the Nodes drawer can show VMs that exist in
the compute daemon but are not assigned to a deployer user row. Retire and
unregister now refuse on those compute-local blockers before attempting node
catalog removal, and the drawer disables lifecycle actions with an explicit
warning table.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): manage compute-local node blockers
All checks were successful
lab release / release (push) Successful in 8m13s
dcbb60c0d2
Add guarded cleanup for unassigned compute-daemon VMs from the Nodes drawer, refresh the operator manual, and align top-level deployer page guidance.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): route manual sections
All checks were successful
lab release / release (push) Successful in 29m44s
0dec02b3ec
Show one manual section at a time through hash routes so sidebar navigation changes the docs panel instead of scrolling a long page.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat: include aibroker in tester bundle
All checks were successful
lab release / release (push) Successful in 8m26s
aa23cb9f97
Add hero_aibroker to the deployer-managed tester stack and installer component map so fresh sandbox installs pull the current latest-main broker binaries alongside Books and Memory.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Merge origin/main into integration
All checks were successful
lab release / release (push) Successful in 11m38s
e6460feeca
Bring current main changes into integration before release convergence.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Default sandbox installs to integration channel
Some checks failed
lab release / release (push) Has been cancelled
d9fa9aca24
Make the deployer admin UI, install script, RPC defaults, and operator docs use latest-integration as the sandbox install default while keeping latest-main selectable for promoted demos and rollback.

Signed-by: mik-tf <mik-tf@noreply.invalid>
ci: canonical-only lab-release (+cargo test); remove other workflows
All checks were successful
lab release / release (push) Successful in 18m39s
b2e90b568a
feat(deployer): show linked tester build details
Some checks failed
lab release / release (push) Has been cancelled
1a9028e485
Record the selected release tag in new tester VM build snapshots and keep old snapshots compatible through serde defaults. Make the user VM Build badge open a details modal with per-component Forge repo, release, and commit links.

Local gates:
- cargo +1.96 check -p hero_tfgrid_deployer_admin -p hero_tfgrid_deployer_server
- cargo +1.96 test -p hero_tfgrid_deployer_server releases::tests --lib
- git diff --check

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): clarify whole-stack update channel
Some checks failed
lab release / release (push) Has been cancelled
a4b31654f4
The deployer update controls reinstall tester VMs from one selected stack channel.
Make the selector and action tooltips explicit so this is not confused with Cockpit per-service channel choices.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): preserve service channels on tester updates
All checks were successful
lab release / release (push) Successful in 45m13s
b2410fd34e
Ready tester VM batch updates now delegate to the tester Cockpit and update each service bundle from its recorded channel.
First installs and retries keep using the selected whole-stack channel, and new installs copy the stack build snapshot to the tester for Cockpit build backfill.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer.list_nodes now returns ok and error per configured daemon plus
a daemons_unreachable count, so an unreachable chain daemon is
distinguishable from an empty fleet. Previously a downed daemon's nodes
silently vanished from the response and the dashboard showed
"no dedicated nodes configured" during an outage.

Signed-by: mik-tf <mik-tf@noreply.invalid>
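
The per-daemon health fields described in the list_nodes change above can be sketched as follows. Only `ok`, `error` and `daemons_unreachable` are named in the commit message; the struct names, `label`, `node_count` and `summarize` are hypothetical.

```rust
/// Hypothetical per-daemon health entry in the list_nodes response.
#[derive(Debug)]
struct DaemonHealth {
    label: String,
    ok: bool,
    error: Option<String>,
    node_count: usize,
}

#[derive(Debug)]
struct FleetHealth {
    daemons: Vec<DaemonHealth>,
    daemons_unreachable: usize,
}

/// Fold one query result per configured daemon into a health summary.
/// An unreachable daemon is kept in the list with its error instead of
/// being dropped, so an outage is distinguishable from an empty fleet.
fn summarize(results: Vec<(String, Result<usize, String>)>) -> FleetHealth {
    let daemons: Vec<DaemonHealth> = results
        .into_iter()
        .map(|(label, res)| match res {
            Ok(n) => DaemonHealth { label, ok: true, error: None, node_count: n },
            Err(e) => DaemonHealth { label, ok: false, error: Some(e), node_count: 0 },
        })
        .collect();
    let daemons_unreachable = daemons.iter().filter(|d| !d.ok).count();
    FleetHealth { daemons, daemons_unreachable }
}
```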
feat(admin): daemon health rendering, logs tab, services tile, channel copy
Some checks failed
lab release / release (push) Has been cancelled
ff3f35bce4
Nodes page and the overview fleet strip now show which chain daemon is
unreachable with its last error and point at the admin services page,
instead of rendering an outage as an empty fleet. New Logs page embeds
the shared logs viewer against a read-only relay to the supervisor's
logs.filter (the relay renames the component's src_prefix field to the
supervisor's src and refuses non-read methods). Control gains an
"Admin VM services" tile linking the admin Cockpit services page. The
Users page channel selector is labeled as the default for new installs
and the subtitle says updates preserve each service's recorded channel.

Signed-by: mik-tf <mik-tf@noreply.invalid>
chore(sdk): regenerate client snapshot for daemon health fields
Some checks failed
lab release / release (push) Has been cancelled
aa4912623f
Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(admin): bound the Control page supervisor lookup
Some checks failed
lab release / release (push) Has been cancelled
edd3c5b871
The page awaited hero_proc service.status_all with no timeout, so a
wedged supervisor hung the browser tab forever. The lookup now times
out after 8s and the page renders the fallback card list with a
warning banner instead.
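
The bounded lookup can be illustrated with the standard library alone. This is a sketch under assumptions: the admin code may well use an async timeout instead, and `call_with_timeout` is a hypothetical helper, not a deployer function.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `f` on a worker thread and wait at most `timeout` for its result.
/// Returns None on timeout so the caller can render a fallback.
fn call_with_timeout<T, F>(timeout: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}
```

With an 8 second budget, a `None` result would map to the fallback card list plus the warning banner. Note that the worker thread is not cancelled on timeout; it is only abandoned.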

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(admin): route the services links through the router path
Some checks failed
lab release / release (push) Has been cancelled
e32e276945
The admin domain proxies everything to hero_router, which only routes
service-prefixed paths, so the bare /services link returned 404 for
authenticated users. Point the Control tile and the Nodes daemon
warning at /hero_cockpit/web/services.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(admin): navbar link to this machine's own Cockpit
All checks were successful
lab release / release (push) Successful in 14m54s
6791cf4697
The deployer admin is the fleet surface; the machine surface is this
VM's Cockpit. Make the machine surface a first-class navbar entry
instead of a tile buried on Control.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(admin): name the Control card Hero Cockpit and land on its main page
Some checks failed
lab release / release (push) Has been cancelled
72a6864acd
The Control card and the This-machine navbar link open this admin VM's own
Hero Cockpit, not a tester's; the card now says so explicitly and both
links land on the Cockpit main page instead of jumping straight to the
Services tab. The Nodes-page daemon-restart alert keeps its deep link to
the Services tab because it points at a specific restart action.

lhumina_code/home#282

Signed-by: mik-tf <mik-tf@noreply.invalid>
ui(admin): Control page tiles match the cockpit Apps page look
Some checks failed
lab release / release (push) Has been cancelled
4cf4a512ea
Same visual language as the tester-facing Apps page: gradient hero
banner, lift/glow tiles with an icon art banner, title row with the
status badge, and full-size Open buttons, so the two machine surfaces
read as one product.

Signed-by: mik-tf <mik-tf@noreply.invalid>
ui(admin): drop This-machine navbar link; Control heading matches sibling pages
All checks were successful
lab release / release (push) Successful in 30m5s
bb4d537fad
Control is the navbar home for reaching this machine's Hero Cockpit, so
the extra navbar link is redundant. The Control heading returns to the
standard admin layout (title, lead, refresh button) used by the Nodes
page; only the tiles keep the cockpit Apps look.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(placement): sandbox VMs deploy only to nodes our wallet rents
Some checks failed
lab release / release (push) Has been cancelled
27f4cabb8a
Provisioning could place a tester VM on any registered node, including
shared (unrented) ones, where capacity can be consumed by any grid user
at any time and where paid chains bill per deployment instead of inside
the rent. The node registration guard also waved shared nodes through:
it only protected against registering a node rented by someone else.

One placement policy now covers both paths. Registration and provision
both verify the live rent status from Grid Proxy and refuse any node
that is not rented by this deployer's twin, failing closed when the
node cannot be verified at all. The provision-time check matters even
with registration guarded, because nodes can enter a daemon's catalog
without passing through the deployer.

The policy is deliberately a gate, not a wall: setting the
deployer/TFGRID_ALLOW_SHARED_NODES secret to true opens shared-node
placement fleet-wide with no restart or rebuild (the value is read live
per decision). A node rented by another twin and an unrented dedicated
node stay blocked even with the gate open, since the chain refuses our
deployments there regardless.
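
The policy as this commit message describes it can be sketched as a single decision function. All names here (`RentStatus`, `Placement`, `check_placement`) are hypothetical, and the `allow_shared` flag stands in for the live read of the TFGRID_ALLOW_SHARED_NODES secret.

```rust
/// Hypothetical view of what Grid Proxy reports for a node.
#[derive(Debug, PartialEq)]
enum RentStatus {
    RentedBy(u32),     // twin id of the renter
    UnrentedDedicated, // dedicated node nobody rents
    Shared,            // shared (unrented) node
    Unverified,        // Grid Proxy could not confirm anything
}

#[derive(Debug, PartialEq)]
enum Placement {
    Allow,
    Deny(&'static str),
}

/// One policy for both registration and provision; fails closed.
fn check_placement(status: &RentStatus, our_twin: u32, allow_shared: bool) -> Placement {
    match status {
        RentStatus::RentedBy(twin) if *twin == our_twin => Placement::Allow,
        // Blocked even with the gate open.
        RentStatus::RentedBy(_) => Placement::Deny("node is rented by another twin"),
        RentStatus::UnrentedDedicated => Placement::Deny("dedicated node is not rented by us"),
        // The gate only affects shared nodes.
        RentStatus::Shared if allow_shared => Placement::Allow,
        RentStatus::Shared => Placement::Deny("shared-node gate is closed"),
        RentStatus::Unverified => Placement::Deny("rent status could not be verified"),
    }
}
```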

#24

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(placement): chain-identity rails — daemon network and gateway zone must match
Some checks failed
lab release / release (push) Has been cancelled
4a935cbd0c
Cross-chain mixing (a VM on one network with its public gateway on
another) is structurally prevented by the per-daemon provision flow, but
only as long as each daemon really is on the chain its label claims. A
stale daemon build proved that assumption can silently fail: it ignored
its config context and answered for QA while labelled main. Mycelium
spans all chains, so a wrong-chain gateway would even have worked,
publishing a tester on another network's domain.

Three rails, all fail-closed, none with an override (there is no
legitimate reason to mix networks):

The provision path now verifies the daemon's chain identity first: the
gateway zones a daemon reports come from its own grid view, so they are
a property of the chain its credentials point at; a mismatch with the
configured network refuses the provision and names the zone. The fleet
listing runs the same probe and excludes a mismatched daemon's nodes
from placement, surfacing the mismatch in daemon health (a probe
transport failure only logs; the provision-time check still protects).

The gateway mint refuses to persist-and-publish an fqdn whose DNS zone
belongs to a different network than the owning daemon. QA, testnet,
devnet and mainnet zones are distinct; mainnet is the bare grid.tf form.

The derived-fqdn recovery path (daemon returned ready without an fqdn)
now only accepts a zone belonging to the requesting daemon's network;
it previously inferred the zone from any tester's fqdn, which would
cross chains as soon as testers span networks.
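
A rough sketch of the zone check behind these rails. The concrete suffixes (`.qa.grid.tf`, `.test.grid.tf`, `.dev.grid.tf`) are assumptions based on the usual TFGrid gateway domains; the commit message only states that the zones are distinct and that mainnet is the bare grid.tf form. `Network`, `network_of_fqdn` and `zone_matches` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Network {
    Main,
    Test,
    Dev,
    Qa,
}

/// Assumed zone suffixes for the non-mainnet networks.
const PREFIXED_ZONES: [(&str, Network); 3] = [
    (".qa.grid.tf", Network::Qa),
    (".test.grid.tf", Network::Test),
    (".dev.grid.tf", Network::Dev),
];

/// Derive the network from an fqdn. The prefixed zones are checked first
/// because every one of them also ends in the bare mainnet form.
fn network_of_fqdn(fqdn: &str) -> Option<Network> {
    let host = fqdn.trim_end_matches('.').to_ascii_lowercase();
    for &(suffix, net) in PREFIXED_ZONES.iter() {
        if host.ends_with(suffix) {
            return Some(net);
        }
    }
    if host.ends_with(".grid.tf") {
        Some(Network::Main)
    } else {
        None
    }
}

/// Fail closed: an fqdn with an unknown zone never matches.
fn zone_matches(fqdn: &str, daemon_network: Network) -> bool {
    network_of_fqdn(fqdn) == Some(daemon_network)
}
```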

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(provision): normalize the VM name to the daemon's charset
All checks were successful
lab release / release (push) Successful in 33m30s
91583079ba
The compute daemon only accepts VM names made of lowercase letters,
digits, and hyphens, but the provision default derived the name from
the Forge username verbatim, so any mixed-case username (or one with
dots or underscores) failed its first provision with an invalid-name
error from the daemon. The gateway label already had this treatment
via gateway_slug; the VM name now gets the same: lowercased, common
separators mapped to hyphens, anything else dropped, and an explicit
name that ends up empty is rejected instead of silently renamed.
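
The normalization described above can be sketched as follows. `normalize_vm_name` is a hypothetical name, and treating a space as a "common separator" alongside dots and underscores is an assumption of this sketch.

```rust
/// Reduce a name to the daemon's charset: lowercase letters, digits, hyphens.
fn normalize_vm_name(raw: &str) -> Result<String, String> {
    let mut out = String::new();
    for c in raw.chars() {
        match c {
            'a'..='z' | '0'..='9' | '-' => out.push(c),
            'A'..='Z' => out.push(c.to_ascii_lowercase()),
            // Common separators become hyphens (space is an assumption).
            '.' | '_' | ' ' => out.push('-'),
            // Anything else is dropped.
            _ => {}
        }
    }
    if out.is_empty() {
        // Reject instead of silently substituting another name.
        Err(format!("VM name {:?} is empty after normalization", raw))
    } else {
        Ok(out)
    }
}
```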

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(placement): shared node support, liveness filter, gateway-name availability
All checks were successful
lab release / release (push) Successful in 18m32s
411d1b184a
Auto-placement prefers dedicated (rented) nodes and overflows to shared
(unrented) nodes only where the per-network shared gate is open. Candidate
selection and node adoption filter on Grid Proxy liveness (healthy + recent
heartbeat) instead of the rentable/status flags a dead node keeps advertising.
Shared placement is gated by TFGRID_ALLOW_SHARED_NODES with an extra mainnet
opt-in. Provisions onto one node are serialized across the capacity preflight
and the deploy that consumes the slices. A custom web address already registered
on-chain is rejected before the VM is deployed; a blank address gets a per-VM
auto name with a free-variant suffix. Nodes admin page reframed with a
Dedicated/Shared badge, a stale-node liveness flag, type-labelled capacity, and
split Rent & adopt vs Use shared actions.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(nodes-ui): explicit Dedicated/Shared badge, drop confusing "not reserved"
All checks were successful
lab release / release (push) Successful in 33m41s
1a9157d887
Add an explicit Dedicated (blue) / Shared (amber) badge next to each node SID in
the managed nodes table, and replace the "(not reserved)" capacity wording with a
plain slice count plus a "shared capacity, best-effort: other grid users can take
these slices" note on shared rows.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(nodes-ui): shared nodes show "Remove", not "Retire" (no rent to cancel)
Some checks failed
lab release / release (push) Has been cancelled
33ce873787
Shared nodes are never rented, so "Retire" (which cancels the rent and stops
billing) was misleading. Shared nodes now show a single "Remove" action with a
confirm that states nothing is cancelled or billed and the node stays public;
dedicated nodes keep Retire plus Unregister-only.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(nodes-ui): plain-English node removal verbs (drop Retire/Unregister jargon)
Some checks failed
lab release / release (push) Has been cancelled
833012a172
Shared node: "Remove" (nothing rented/billed). Dedicated node: "Remove & end
rental" (cancels the rental, stops billing) and "Remove, keep rental" (keeps
paying). Confirm dialogs and flash messages reworded to match.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(nodes-ux): consistent Reserve/Unreserve + Shared vocabulary across the UI
Some checks failed
lab release / release (push) Has been cancelled
5eeaa34210
Adopt with an explicit choice (Reserve as dedicated / Add as shared) where both
apply; removal mirrors it (Remove and Unreserve / Remove, keep reserved / Remove
for shared). Add-user and Provision node dropdowns label each option Dedicated or
Shared; Auto is "dedicated first, then shared" with the preview matching the
server's dedicated-first placement.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(nodes-ux): Find-Nodes badge reflects node type, not rent status
Some checks failed
lab release / release (push) Has been cancelled
93ffb39de0
Discovery badge now keys on the node's dedicated flag (Dedicated vs Shared) so an
unrented dedicated-type node reads Dedicated, not Shared. The managed-table badge
still keys on whether we rent it. Whether a dedicated node can also be used shared
is left to verify empirically; wording avoids over-asserting it.

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(nodes-ux): correct dedicated-node tooltip — shared deploy IS possible
Some checks failed
lab release / release (push) Has been cancelled
835e781948
A Dedicated+Rentable node can host a shared (per-VM) deployment without being
reserved (verified on the ThreeFold dashboard). The badge now reads
"Dedicated-capable", with a tooltip noting that the node can be reserved or used
shared per VM, and that a later renter could evict a shared workload. Opening
shared placement on dedicated-capable nodes in the deployer guard is a follow-up
product decision.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(placement): allow shared deploy on dedicated-capable nodes; offer both adopt options
All checks were successful
lab release / release (push) Successful in 44m59s
1c6361dc74
A single VM deploys shared (per-resource, no rent) on a Dedicated+Rentable node
(verified on the TF dashboard), so shared placement is not limited to
non-dedicated nodes. The placement guard now allows shared on any not-rented node
when the shared gate is open; Find Nodes offers "Add as shared" on every
not-rented node plus "Reserve as dedicated" when rentable. Eviction caveat shown
in the tooltip, not blocked. Guard tests updated.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(gateway): route tester public-URL gateway to a decoupled gateway daemon
All checks were successful
lab release / release (push) Successful in 42m35s
0d99352e89
A tester VM's public-URL gateway-name-proxy is now minted on a designated
gateway daemon (the reliable QA chain by default), independent of the chain
the tester's compute VM runs on. A gateway-name-proxy is a reverse proxy to a
mycelium backend, and mycelium is one overlay across chains, so a QA gateway
can front a VM whose compute runs on another chain. This unblocks onboarding
while the foundation mainnet name-gateways are not converging to ready.

- new TFGRID_GATEWAY_DAEMON_LABEL knob (default "qa"); resolve_gateway_daemon
  falls back to the VM's own compute daemon when no such daemon is configured,
  so single-chain deployments stay byte-identical.
- ensure_webgateway now keys the gateway node sid, deploy adapter, on-chain
  name-contract availability check, fqdn-zone derivation, and the zone guard
  off the gateway daemon's network. The zone guard is re-pointed (still
  validates the fqdn belongs to the gateway daemon's chain), not removed.
- delete reaps the gateway name contract on the chain it was minted on,
  reverse-derived from the stored fqdn zone, so a cross-chain tester never
  orphans its gateway name contract; delete_vm stays on the compute daemon.
- 3 new unit tests for daemon routing + reverse-derive; no schema change.
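
The fallback in the first bullet can be sketched like this. `resolve_gateway_daemon` is named in the commit message, but its signature and the `Daemon` struct here are assumptions for illustration.

```rust
/// Hypothetical fleet entry.
#[derive(Debug, Clone, PartialEq)]
struct Daemon {
    label: String,
    network: String,
}

/// Pick the daemon that mints the gateway: the one carrying the configured
/// label if the fleet has it, otherwise the VM's own compute daemon.
fn resolve_gateway_daemon<'a>(
    fleet: &'a [Daemon],
    gateway_label: &str,
    compute: &'a Daemon,
) -> &'a Daemon {
    fleet
        .iter()
        .find(|d| d.label == gateway_label)
        .unwrap_or(compute)
}
```

In a single-chain fleet with no daemon labelled "qa", the function returns the compute daemon, which is why such deployments behave as before.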

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): composite VM identity (daemon_label + vm_sid) so two networks can share a vm id
All checks were successful
lab release / release (push) Successful in 33m28s
275cdbcf11
Each compute daemon (one per TFGrid network) mints VM short ids from its
own sequence, so two networks can hand out the same vm_sid (for example a
test VM and a main VM both numbered 005o). The vms table keyed uniqueness
on vm_sid alone, so the second provision insert was rejected even though
the VM already deployed on chain, leaving a running VM the deployer never
recorded.

Make stored VM identity composite on (daemon_label, vm_sid):

- M13 recreates vms with UNIQUE(daemon_label, vm_sid) instead of
  UNIQUE(vm_sid). Safe on existing data (vm_sid is currently globally
  unique, so the composite holds for every existing row).
- insert_vm writes daemon_label in the same INSERT (not a follow-up
  UPDATE), so the composite key is complete the instant the row exists;
  a separate update would let both networks' rows land under ('', vm_sid)
  and still collide before the label was ever set.
- every vms mutator (state, webgateway, gateway_name, install_state,
  oauth, tenant_token, installed_releases, delete) keys on the composite,
  so a write can never touch the wrong network's row.
- find_vm(vm_sid, optional daemon_label) resolves a VM exactly when the
  label is supplied; otherwise it looks up by vm_sid alone and errors if
  more than one network has that id (never a silent wrong-row read or
  delete).
- delete_vm / install_hero_stack / update_vm_services /
  check_build_updates gain an optional daemon_label param (openrpc + sdk)
  so a caller can disambiguate a shared vm_sid.
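
The composite key and the lookup rule above can be tried out with an in-memory SQLite table. This is a sketch under assumptions: the real vms table has more columns, and the function signatures here are simplified for illustration.

```python
import sqlite3
from typing import Optional

vms_db = sqlite3.connect(":memory:")
vms_db.execute(
    """
    CREATE TABLE vms (
        daemon_label TEXT NOT NULL DEFAULT '',
        vm_sid       TEXT NOT NULL,
        state        TEXT NOT NULL DEFAULT 'provisioned',
        UNIQUE (daemon_label, vm_sid)
    )
    """
)


def insert_vm(daemon_label: str, vm_sid: str) -> None:
    # The label is written in the same INSERT, so the key is complete at once.
    vms_db.execute(
        "INSERT INTO vms (daemon_label, vm_sid) VALUES (?, ?)", (daemon_label, vm_sid)
    )


def find_vm(vm_sid: str, daemon_label: Optional[str] = None) -> tuple:
    """Exact lookup when the label is given; otherwise by vm_sid alone,
    raising if more than one network holds that id."""
    if daemon_label is not None:
        rows = vms_db.execute(
            "SELECT daemon_label, vm_sid FROM vms WHERE daemon_label = ? AND vm_sid = ?",
            (daemon_label, vm_sid),
        ).fetchall()
    else:
        rows = vms_db.execute(
            "SELECT daemon_label, vm_sid FROM vms WHERE vm_sid = ?", (vm_sid,)
        ).fetchall()
    if not rows:
        raise LookupError(f"no VM with sid {vm_sid!r}")
    if len(rows) > 1:
        raise LookupError(f"vm_sid {vm_sid!r} is ambiguous; pass daemon_label")
    return rows[0]


insert_vm("test", "005o")
insert_vm("main", "005o")  # same sid, different network: accepted
```

Calling `find_vm("005o")` without a label raises here, because both networks hold that id; passing the label resolves the row exactly.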

Tests: same vm_sid under two daemons both insert; exact and ambiguous
lookup; mutators target only the owning row. 172 server + 13 sdk green.

#26

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(deployer): install never leaves hero-home dirs root-owned
Some checks failed
lab release / release (push) Has been cancelled
7b98bd78e3
The install SSH payload runs as root, but every Hero service runs as the
driver user. The stack-snapshot block hand-wrote a plain root
`mkdir -p /home/driver/hero/var/cockpit` and chowned only the file
inside it, leaving the directory itself root-owned. The cockpit (driver)
then could not create its own per-action upgrade/install log files there,
so every dashboard service upgrade failed with "create log file failed:
Permission denied (os error 13)". Regressed in b2410fd; before that the
cockpit created the dir itself as driver.

Fix the whole class, not just the one dir:
- Route hero-home dir creation through a `driver_owned_dir` helper
  (`install -d -o driver -g driver`), used by the cockpit-state and .kimi
  blocks. `install -d` is idempotent and re-asserts ownership, so a dir an
  earlier install left root-owned self-heals on the next install.
- Add a tail ownership backstop: re-assert driver ownership over the hero
  state tree (`chown -R driver:driver /home/driver/hero/var`) as the last
  filesystem step, so a future slip can never ship a root-owned dir.
- Tests: pin the driver-owned cockpit dir, assert the backstop, and add a
  class-wide guard that fails if any new write-site reintroduces a root
  `mkdir` under /home/driver.
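
A class-wide guard of the kind described in the last bullet could look roughly like this. The helper name `driver_owned_dir` comes from the commit; the regular expression and `root_mkdir_lines` are hypothetical, and the real test may inspect the payload differently.

```python
import re
from typing import List

# Matches a plain `mkdir` whose arguments include a path under /home/driver.
ROOT_MKDIR = re.compile(r"\bmkdir\b[^\n;|&]*\s/home/driver(?:/|\s|$)")


def driver_owned_dir(path: str) -> str:
    """Render the shell line that creates `path` owned by driver."""
    return f"install -d -o driver -g driver {path}"


def root_mkdir_lines(payload: str) -> List[str]:
    """Return every payload line that still creates a hero-home dir with mkdir."""
    return [line for line in payload.splitlines() if ROOT_MKDIR.search(line)]
```

A test would then assert that `root_mkdir_lines(payload)` is empty for the rendered install payload, so any new write-site using `mkdir` under /home/driver fails the build.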

Existing testers installed by an affected deployer self-heal on their
next install; a live tester can be fixed without reinstall by
`chown -R driver:driver /home/driver/hero/var/cockpit`.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(install): drop hero_biz, hero_slides, hero_whiteboard from the tester bundle
Some checks failed
lab release / release (push) Has been cancelled
112835af0a
Remove the three apps from the tester install set so new tester VMs no
longer install or enable them. Adds a test asserting they stay out.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Revert "feat(install): drop hero_biz, hero_slides, hero_whiteboard from the tester bundle"
All checks were successful
lab release / release (push) Successful in 31m15s
c802d30be1
This reverts commit 112835af0a.
ci(release): clear stale release assets before lab upload
All checks were successful
lab release / release (push) Successful in 42m42s
39513ae6d9
lab build --upload skips release assets that already exist by name, so the
rolling latest-* release froze at its first-uploaded binaries (assets dated
2026-06-12 while the release record kept refreshing on every push). A
downloaded binary was therefore stale (pre-M13), and it panicked opening the
migrated deployer DB, which broke dashboard-driven self-upgrade of the
deployer.

Delete the tag's existing assets via the Forge API before the upload so each
push republishes fresh binaries. See #29.

Signed-by: mik-tf <mik-tf@noreply.invalid>
ci: drop per-repo delete-before-upload workaround from lab-release
All checks were successful
lab release / release (push) Successful in 36m19s
42abc3599a
The shared lab-builder image now carries the fixed lab that re-uploads a
release asset whenever its md5 changes, so the manual asset-delete step
this repo carried is redundant. This restores the canonical publish step
shared by the other hero repos.
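
The difference between skipping by name and re-uploading on an md5 change can be illustrated with a small sketch. How lab actually compares assets is not shown in this log, so the function name and inputs below are assumptions.

```python
import hashlib
from typing import Dict, List


def assets_to_upload(local: Dict[str, bytes], published_md5: Dict[str, str]) -> List[str]:
    """Return asset names that are new or whose md5 differs from the published copy."""
    changed = []
    for name, data in sorted(local.items()):
        digest = hashlib.md5(data).hexdigest()
        if published_md5.get(name) != digest:
            changed.append(name)
    return changed
```

A name-only check would return nothing for a rebuilt binary that keeps its file name, which is the freeze the earlier workaround addressed.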

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): configurable gateway daemon + ordered gateway-node fallback
All checks were successful
lab release / release (push) Successful in 16m3s
4de3387caa
Mint each tester's public gateway-name-proxy on a configurable compute
daemon and an ordered list of gateway-capable nodes, most-preferred
first. The first node that mints a ready gateway on that daemon's
network wins; a failed attempt rolls back its own on-chain workload
before the next node is tried.

Two new optional, secret-backed env vars (defaults empty, so an
unconfigured or single-chain/QA deployment is byte-identical):

  TFGRID_GATEWAY_DAEMON_LABEL  which daemon mints gateways (e.g. main)
  TFGRID_GATEWAY_NODE_SIDS     ordered node ids, e.g. 8,1,13,50
                               (prefer gent02, fall back gent01/03/04)

This lets a mainnet sandbox front testers on the mainnet gent nodes
(gent02 verified converging) with automatic fallback, instead of the
single hardcoded gateway node, without a rebuild to retune.
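
The ordered fallback with per-attempt rollback might be sketched like this. The `mint` and `rollback` callables are placeholders for the real on-chain operations, and `parse_node_sids` is a hypothetical helper for the TFGRID_GATEWAY_NODE_SIDS value.

```python
from typing import Callable, List, Optional


def parse_node_sids(raw: str) -> List[str]:
    """Split a comma-separated TFGRID_GATEWAY_NODE_SIDS value, keeping order."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def mint_gateway(
    node_sids: List[str],
    mint: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Optional[str]:
    """Try nodes most-preferred first; return the first node that mints a
    ready gateway, rolling back each failed attempt before moving on."""
    for sid in node_sids:
        if mint(sid):
            return sid
        rollback(sid)
    return None
```

An empty value parses to an empty list, so an unconfigured deployment never enters the loop.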

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): always-on login-gate floor + one-click web gateway repair
All checks were successful
lab release / release (push) Successful in 45m55s
68977980e6
A tester whose web gateway domain was never recorded ended up with zero
hero_proxy domain routes, so hero_proxy fell through to ungated path-prefix
forwarding and served the cockpit with no Forge sign-in. Close that hole and
make it repairable from the admin.

- Always seed the catch-all "*"/deny route in the install payload, regardless
  of whether the tester has a public fqdn, so an unrouted host is refused with
  a 404 rather than served open. The per-tester OAuth route is still added
  only when a real fqdn + OAuth app exist.
- Push the deny floor first (before the OAuth provider seed and the fqdn
  route), so a slow or failing OAuth step can never leave a window where
  hero_proxy is up but has no deny route. Factored the gate seeding into one
  shared builder so install and repair push byte-identical routes.
- New deployer.retry_vm_webgateway RPC: recover the gateway fqdn (re-derive it
  from the already-minted gateway, or mint a fresh one), create the per-tester
  OAuth app if missing, then re-push the gate routes onto the live tester over
  SSH. Idempotent (re-adds are hero_proxy no-ops; the route domain is unique).
- Admin one-click "Repair gateway & login" / "Repair login gate" buttons on
  the user detail page.
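
The shared gate builder can be pictured as one function that always emits the deny floor first. This is a sketch only: the route tuple shape and the policy names are assumptions, not hero_proxy's actual route format.

```python
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (domain, policy); both fields are illustrative


def build_gate_routes(fqdn: Optional[str], has_oauth_app: bool) -> List[Route]:
    """Return routes in push order: the catch-all deny floor always comes
    first; the per-tester OAuth route only when a fqdn and OAuth app exist."""
    routes: List[Route] = [("*", "deny")]
    if fqdn and has_oauth_app:
        routes.append((fqdn, "oauth"))
    return routes
```

Because install and repair would both call the same builder, the two paths cannot drift apart in what they push.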

lhumina_code/home#253

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(admin): name the operating model (admin instance / member instance / organization)
Some checks failed
lab release / release (push) Has been cancelled
ff8cf1d622
The deployer admin manual's Operating Model now uses the Hero platform naming
convention: the deployer runs on the admin instance (the organization's control
plane) and manages member instances. One organization is one admin instance
plus its member instances.

lhumina_code/home#285

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(admin): make the Model page a real Hero platform explainer
All checks were successful
lab release / release (push) Successful in 43m6s
8b9b93f2e2
Expand the manual's Model section into a full "What is the Hero platform"
explainer: a visual organization diagram (admin instance + member instances
behind the Forge login gate), a what-runs-where service list with one line per
service, how members sign in (Forge SSO + optional 2FA, gated-or-404), and how
each organization gets its own instantiation of a Hero platform (Acme example).
Fix the manual header subtitle to the same naming.

lhumina_code/home#285

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(admin): add a Services section + per-service pages to the manual
Some checks failed
lab release / release (push) Has been cancelled
b0766a803e
The operator manual now has a Services section listing every service across the
Hero platform, grouped by where it runs (every instance, member instance which
is the Hero stack, admin instance), each card opening a full per-service page
rendered from the platform single-source docs (vendored into the crate). The
Architecture page now names the Hero stack and lists hero_os, and the manual's
em dashes are cleaned. Mirrors the cockpit member manual; the renderer can be
promoted into hero_admin_lib later to de-duplicate.

lhumina_code/home#285

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(admin): place the services submenu directly under Services
Some checks failed
lab release / release (push) Has been cancelled
c34a9d6b73
The deployer manual has nine top-level sections, so the services submenu
appeared at the bottom of the list. Split the sidebar so the submenu sits
inline directly under the Services item: on the SPA it reveals only when
Services is active (leading with an Overview link to the card grid), and on the
per-service pages it stays open with the current service highlighted.

lhumina_code/home#285

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(admin): capitalize Hero Platform throughout the admin manual
All checks were successful
lab release / release (push) Successful in 26m59s
d1c89d631e
Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): app catalog with a strong default for member-instance install
Some checks failed
lab release / release (push) Has been cancelled
0b8b319ae8
Turn the flat, always-full install set into an operator-facing app catalog
(home#286). The canonical install list now splits into an always-on base
(proxy, router, supervisor, data store, member cockpit) and toggleable catalog
apps, each with a strong-default on/off. The operator picks apps on the Add and
set up form; the selection threads to install via a new optional enabled_apps
array on deployer.install_hero_stack.

Server:
- base/app catalog model (BASE_COMPONENTS, DEFAULT_OFF_APPS) +
  resolve_enabled_components(): base is always forced on, unknown/base entries
  ignored, result in canonical install order.
- render_cockpit_services_toml() now takes the resolved set; the books
  default-repos seed, the build-identity snapshot, the stack_components count,
  and the check_build_updates target snapshot all read the selection so a
  trimmed member records and checks exactly its own apps.
- absent enabled_apps preserves the member's currently installed apps (the plain
  reinstall / update-to-latest path) instead of resetting to the default.
- new read-only deployer.app_catalog RPC returns the toggleable apps (component,
  label, default_on) plus the base list.
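
The resolution rules in the Server list can be sketched as below. The component names are illustrative placeholders rather than the deployer's real identifiers, and the signature of `resolve_enabled_components` is an assumption.

```python
from typing import Iterable, List, Optional

# Illustrative names only; the real lists live in the deployer server.
BASE_COMPONENTS = ["proxy", "router", "supervisor", "data_store", "cockpit"]
CATALOG_APPS = ["books", "slides", "whiteboard", "biz"]
CANONICAL_ORDER = BASE_COMPONENTS + CATALOG_APPS


def resolve_enabled_components(
    enabled_apps: Optional[Iterable[str]], currently_installed: Iterable[str]
) -> List[str]:
    """Resolve the install set: base always on, unknown names ignored,
    result in canonical order. An absent selection keeps the installed apps."""
    requested = set(currently_installed if enabled_apps is None else enabled_apps)
    chosen = {app for app in requested if app in CATALOG_APPS}
    return [c for c in CANONICAL_ORDER if c in BASE_COMPONENTS or c in chosen]
```

Passing `None` models the plain reinstall path, while an explicit empty list trims the member down to the base.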

Admin: /app_catalog.json route + the Add and set up form renders the catalog as
checkboxes (strong default pre-checked, select-all / reset-to-default) and posts
the selection in the install body.

186 server tests (9 new on the catalog model, manifest render, and RPC shape);
fmt + clippy -D warnings + musl release build clean.

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): saved setups (named app selection plus release channel)
Some checks failed
lab release / release (push) Has been cancelled
a11af0005c
Persist a chosen app selection plus a release channel as a named,
reusable setup the operator builds once on the add form and reapplies
to later member instances, so a consistent organization is repeatable.
Builds directly on the app catalog: a setup is a saved enabled_apps
selection plus a channel, both already first-class on the add form.

Server: new standalone setups table (schema M14, additive CREATE TABLE,
no recreate of users/vms) with name UNIQUE; upsert-by-name CRUD;
save_setup / list_setups / delete_setup RPCs. save_setup normalizes the
chosen apps against the catalog (drops base and unknown names, canonical
install order) so a stored setup is always a valid, base-free app set.
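
Upsert-by-name on a table with a UNIQUE name column can be demonstrated with SQLite's ON CONFLICT clause. This is a sketch assuming a reduced three-column table; the real setups schema and the storage format of the app list are not shown in this log.

```python
import json
import sqlite3

setups_db = sqlite3.connect(":memory:")
setups_db.execute(
    "CREATE TABLE setups ("
    "name TEXT NOT NULL UNIQUE, enabled_apps TEXT NOT NULL, channel TEXT NOT NULL)"
)


def save_setup(name, enabled_apps, channel):
    """Insert a setup, or update the existing row that has the same name."""
    setups_db.execute(
        """
        INSERT INTO setups (name, enabled_apps, channel) VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            enabled_apps = excluded.enabled_apps,
            channel = excluded.channel
        """,
        (name, json.dumps(enabled_apps), channel),
    )


def list_setups():
    """Return every saved setup as (name, apps, channel), ordered by name."""
    rows = setups_db.execute("SELECT name, enabled_apps, channel FROM setups ORDER BY name")
    return [(name, json.loads(apps), channel) for name, apps, channel in rows]
```

Saving twice under the same name leaves one row carrying the latest apps and channel.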

Admin: a setup picker on the Add and set up form (apply checks the
matching app boxes and sets the release channel) plus Save as setup and
Delete, fronted by /setups.json, /setups/save.json, /setups/delete.json.

190 server tests (db CRUD round-trip, upsert-by-name, normalize drops
base/unknown), SDK type smoke test, fmt and clippy -D warnings and musl
release build clean. Proven live on the testing organization admin
instance: save/list/upsert/delete round-trip, normalize dropped base and
unknown, M14 ran on the live DB.

lhumina_code/home#287

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(admin): move the release-channel picker into Setup options, per setup
Some checks failed
lab release / release (push) Has been cancelled
3d7854b3be
The release channel was a single selector at the top of the Users page,
next to Update all testers but labelled "Default for new installs". That
adjacency was misleading (it does not drive the fleet update) and the
saved-setups block had no channel control of its own even though a setup
stores a channel.

Move the picker down into the Add a user form's Setup options, right by
the saved-setups picker and the app checkboxes, so a setup is built and
read in one place: applying a setup checks its apps and sets this channel, and
Save as setup captures the apps and channel shown there. Default stays
latest-integration; main and development remain selectable.

Update all testers no longer reads this picker: a ready member instance
refreshes on its own recorded release channel, and a non-ready install
falls back to the form's fixed latest-integration. Admin UI only, no
server or storage change. fmt, clippy -D warnings, build, and the 190
server tests stay green; proven live on the testing organization admin
instance (picker now in Setup options, top bar is the button alone).

lhumina_code/home#287

Signed-by: mik-tf <mik-tf@noreply.invalid>
fix(admin): order release channels main, integration, development
Some checks failed
lab release / release (push) Has been cancelled
55904d8ab1
Present the channel options in the natural progression main then
integration then development, instead of integration first. Integration
stays the pre-selected default. Applied to both the Users add form picker
and the per-user VM page so the dropdown reads the same everywhere.

lhumina_code/home#287

Signed-by: mik-tf <mik-tf@noreply.invalid>
feat(deployer): create a group of member instances from a saved setup and a name list
All checks were successful
lab release / release (push) Successful in 33m47s
4c387d07d4
Add a "Create a group" admin form that stands up many member instances at
once: pick a saved setup (apps plus release channel), name the organization,
and paste a list of people (existing Forge accounts or new ones). The form
loops the existing per-member onboard RPCs (create or add-existing, then
provision, then install), provisioning and kicking install for each member,
then polling them all to ready, with a per-member progress row. Re-running the
same list retries failed members and skips ones already up (create is
idempotent but provision is not, so the loop guards on an existing good VM).
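
The re-runnable loop can be sketched as follows. The state name "ready", the callables, and the progress strings are assumptions for illustration; the real form drives the per-member RPCs from the admin UI.

```python
from typing import Callable, Dict, Iterable, Optional


def deploy_group(
    usernames: Iterable[str],
    existing_vm_state: Callable[[str], Optional[str]],
    onboard: Callable[[str], None],
) -> Dict[str, str]:
    """Onboard each member once. A member that already has a good VM is
    skipped, because provisioning again would create a second VM."""
    progress: Dict[str, str] = {}
    for user in usernames:
        if existing_vm_state(user) == "ready":
            progress[user] = "skipped"
            continue
        try:
            onboard(user)  # create or add-existing, provision, install
            progress[user] = "started"
        except Exception as err:  # one failure must not stop the rest
            progress[user] = f"failed: {err}"
    return progress
```

Running the same list again retries the members marked failed and leaves the ready ones untouched.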

Tag each member at provision with its organization, the setup it was built
from, and its release channel, so the organization can later be managed and
refreshed as one unit without a schema change:

- schema M15: three additive ALTER ADD COLUMN on vms (org, setup,
  release_channel; constant '' defaults, no recreate), with map_vm_row and
  every explicit vms SELECT list extended in lockstep.
- provision_vm takes optional org/setup/release_channel and records them in the
  same RPC right after insert (all known up front for a group), so a member is
  never left untagged even if its install later fails.
- install_hero_stack confirms the channel a member actually installed on, so the
  recorded channel reflects the running build for every member.

lhumina_code/home#288

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: organization object + Launcher (home#291 Part 1)
All checks were successful
lab release / release (push) Successful in 15m7s
c9f58d921a
Consolidate the deployer admin around composable building blocks and add a
real, savable organization entity, per the Launcher arc.

- Schema M16 (additive, no recreate): enrich `setups` to the full install
  bundle with `seed_providers` + `send_welcome_email`; add `organizations`
  and `organization_members` tables (a named org, its setup, and a granular
  member roster where each member is independently existing or new with its
  own email).
- Server RPCs: deployer.save_organization / list_organizations /
  get_organization / delete_organization; save_setup / list_setups carry the
  two new setup fields. openrpc.json is the SSOT; the SDK regenerates.
- Admin: the users page becomes the Launcher (the old /users route redirects).
  The inline Setup options move into a dedicated Setups card; the group-create
  card becomes the Organizations card with a granular mixed-member roster
  (add rows or paste a list), Save and Save-and-deploy, and a saved-orgs list.
  Deploy reuses the per-member onboard loop and tags each member at provision.

198 server + 17 SDK tests; cargo fmt + clippy -D warnings + musl release clean.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: Launcher information architecture (home#291 stage 2)
All checks were successful
lab release / release (push) Successful in 16m43s
b4b96b69e2
Reshape the deployer console into the create/operate split from the
Launcher plan, before the deeper building blocks land.

- Top nav gains Organizations. Launcher is now a sidebar shell with three
  surfaces (Organizations to compose and launch, Setups, Infrastructure as
  a "coming next" stub); the member roster moves off it.
- New Organizations registry page: every member instance grouped by its
  organization, with untagged members folded into a default Testers
  organization, a context switcher, cockpit links, and an "Add a member"
  action that opens the Launcher pre-filled with that organization.
- list_vms now reports each VM's org and setup tag (additive to the VmRow
  result), so the registry can group by organization.

198 server + 17 SDK tests; cargo fmt + clippy -D warnings + musl release
clean across server, SDK, and admin.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: refine the Launcher into one hub (home#291 stage 2 refinement)
All checks were successful
lab release / release (push) Successful in 14m22s
a7ad898eaf
Fold the operator console into a single Launcher hub per the refined plan.

- One hub: drop the separate Organizations top-nav. The Launcher sidebar is
  in build-dependency order (Infrastructure, Setups, Organizations), and the
  Organizations surface both composes new organizations and lists existing
  ones, every member grouped by its organization.
- Operate in place: per-organization "Update all instances" and a global
  "Update all organizations" (re-install each member on its recorded channel,
  apps preserved); the misplaced global "Update all testers" is gone from the
  header.
- Add a member in context: "Add a member" on an organization reveals the
  onboard form inline and tags the new member with that organization, no page
  bounce.
- Setups now lists every saved setup with its contents (apps, channel,
  assistant keys, welcome email), so you can see what a setup is.
- list_vms reports release_channel so the per-org update knows each channel.

198 server + 17 SDK tests; cargo fmt + clippy -D warnings + musl release clean.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: the unified Deploy flow (home#291 stage 3)
Some checks failed
lab release / release (push) Has been cancelled
13946be143
Collapse the two deploy forms into one, restoring a clean add-person to
deploy path and the every-member-is-in-an-organization invariant.

- One Deploy form: choose who (member rows, 1 to N), which organization
  (an existing one or a new one, since every member lives in one), what
  (a setup picker), and where (auto for now, infrastructure groups next),
  then Deploy. A standalone person is just a one-member organization; ad
  hoc deploys land in the default organization.
- The per-org "Add a member" and "New organization" both open this one
  form, scoped to the right organization, with no page or form hop.
- Every saved setup carries a "Use in a deployment" action that opens the
  Deploy form pre-selected, so a setup is never a dead end.
- Deploying to an existing org tags the new members without replacing its
  saved roster; a new org is saved then deployed.

198 server + 17 SDK tests; cargo fmt + clippy -D warnings + musl release clean.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: fix stale Setups copy referencing the removed single-add flow
Some checks failed
lab release / release (push) Has been cancelled
716819c806
The Setups section still told the operator to use a setup "for a single
member (Add a member)" and said its apps "apply to Add & set up only",
both of which named buttons that the unified Deploy flow removed. Point the
copy at the real path instead: save a setup, then Use in a deployment (or
pick it in Organizations, Deploy).

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: stack presets + member display name (home#291 stage 3 cont.)
Some checks failed
lab release / release (push) Has been cancelled
bb1cc155ec
A first cut of the locked layered model: differentiate stack and setup,
and restore the member display name.

- Stack presets (Default / Base / Full) are first-class. The Deploy "what"
  picker offers them ahead of the saved setups, so a deploy always has a
  default and never needs a setup built first. The Setups builder labels its
  app shortcuts as the Default / Base / Full stack presets.
- A STACK is which apps; a SETUP is a stack plus configs (channel, assistant
  keys, welcome email). The picker resolves either to the install config; the
  clean stack/setup name is what is stored on the organization and tagged on
  each member instance.
- Member rows carry an optional display name again (username + display name +
  email), threaded to account creation.

cargo fmt + clippy -D warnings + musl release clean (admin).

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: stack labels Core/Default/Full + fix Use-in-a-deployment
Some checks failed
lab release / release (push) Has been cancelled
4bbc6cc175
- Stack presets read Core / Default / Full (clean labels, that order), in
  both the Deploy picker and the Setups builder shortcuts.
- "Use in a deployment" did nothing because it revealed the Deploy form in
  the hidden Organizations section; it now switches to that section first.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: edit a saved setup in place
All checks were successful
lab release / release (push) Successful in 15m25s
4d916de59c
Selecting a saved setup now pre-fills its name and loads its stack and
configs into the builder, so adjusting it and pressing Save updates that setup instead
of forcing a re-type or a duplicate. The picker button reads Edit.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: per-organization settings, own email + assistant keys (home#291 stage 2)
Some checks failed
lab release / release (push) Has been cancelled
3dfa5b81cb
Each organization can now carry its own Resend key, email sender
(from-address and from-name), and assistant keys (Kimi, Groq, OpenRouter,
SambaNova), used when deploying that organization's members and falling back
to the General defaults when blank, so one organization can send from its own
address with its own keys while another inherits the shared defaults.

- Storage: per-organization overrides live alongside the General defaults in
  the deployer's hero_proc secret store, namespaced by the organization's
  stable row id (ORG_<id>_<SLOT>), so renaming an organization keeps its
  settings and two organizations never collide on a slot. No schema migration.
- RPCs: deployer.get_organization_settings (presence booleans only for the
  keys, raw sender fields with empty meaning inherit) and
  deployer.set_organization_settings (a field present and non-empty sets the
  override, present and empty clears it back to inherit, absent leaves it).
- Resolution at install: the member's organization tag resolves to its id, and
  the assistant-key seeding plus the welcome-email sender resolve
  org-override-else-General, so the setting actually takes effect on deploy.
- Admin: a Settings panel on each saved organization card (an in-app modal)
  loads and writes the above; the General Settings page is unchanged.
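
The slot namespacing, the set/clear/leave rule, and the org-override-else-General resolution can be sketched with a plain dict standing in for the secret store. The slot name used in the example and the function names are hypothetical.

```python
from typing import Dict, Optional


def org_slot(org_id: int, slot: str) -> str:
    """Namespace a secret slot by the organization's stable row id."""
    return f"ORG_{org_id}_{slot}"


def apply_setting(store: Dict[str, str], org_id: int, slot: str, value: Optional[str]) -> None:
    """None (field absent) leaves the override, '' clears it, anything else sets it."""
    key = org_slot(org_id, slot)
    if value is None:
        return
    if value == "":
        store.pop(key, None)
    else:
        store[key] = value


def resolve_setting(store: Dict[str, str], org_id: Optional[int], slot: str) -> Optional[str]:
    """Use the organization override when present, else the General default."""
    if org_id is not None and org_slot(org_id, slot) in store:
        return store[org_slot(org_id, slot)]
    return store.get(slot)
```

Because the key uses the row id rather than the name, renaming an organization does not move or lose its overrides.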

Per-network grid wallets are intentionally not included: a wallet is bound to a
compute daemon, not an organization, so per-organization wallets are a
deploy-path change rather than admin UI, tracked at
lhumina_code/home#294.

Tests: org slot namespacing and the set/clear/leave field rule (server); the
regenerated settings types compile (SDK). Full suites green (200 server, 18
SDK); fmt, clippy -D warnings, and the musl release build clean. Deployed and
confirmed live on the testing admin instance.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: fix the Stack preset Core link (id collision) and trim its note
Some checks failed
lab release / release (push) Has been cancelled
395e3d9758
The Core preset link and the note below it both used id onboard-apps-base, so
getElementById returned the link and the catalog JS overwrote the word "Core"
with the long "Core services (...) are always installed..." sentence. The
preset line rendered as that whole sentence instead of "Core". Drop the
duplicate id, make the note a short static line without the service-list
parenthetical, and remove the now-unused JS that set it. The preset line now
reads Core / Default / Full as intended.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: assign members to an organization, enforce one org per member (home#291)
Some checks failed
lab release / release (push) Has been cancelled
4f64c63fb7
Existing members deployed before the organization concept stay untagged and fold
into a default bucket; this adds a way to graduate them into a real organization,
and guarantees a member belongs to exactly one.

- M17: a UNIQUE index on organization_members(username) makes it impossible for a
  member to appear in two organization rosters. Additive (the table is empty until
  an organization is composed).
- New deployer.assign_members_to_organization(org, usernames): move semantics. For
  each member it removes them from any other organization, adds them to this one's
  roster, and sets the organization tag on all their instances. The target
  organization is created if it does not exist. No redeploy; the organization's own
  keys and email apply on a member's next install.
- save_organization now also moves a composed member out of any other organization
  before adding them, so both write paths uphold the invariant.
- Admin: the default bucket of untagged members is renamed "Unassigned" (a view,
  not an organization, so it never collides with a real name) and gains per-member
  checkboxes plus a "Move selected into an organization" action.
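
The move semantics and the UNIQUE index can be tried out against an in-memory SQLite table. This is a reduced sketch: the real organization_members table references organizations by id and carries more columns, and the index name here is made up.

```python
import sqlite3

org_db = sqlite3.connect(":memory:")
org_db.executescript(
    """
    CREATE TABLE organization_members (org TEXT NOT NULL, username TEXT NOT NULL);
    CREATE UNIQUE INDEX one_org_per_member ON organization_members(username);
    """
)


def assign_members(org, usernames):
    """Move each member into `org`: remove any other membership, then add."""
    with org_db:  # one transaction, so the delete and insert commit together
        for user in usernames:
            org_db.execute("DELETE FROM organization_members WHERE username = ?", (user,))
            org_db.execute(
                "INSERT INTO organization_members (org, username) VALUES (?, ?)", (org, user)
            )


def members_of(org):
    """Return the roster of one organization, sorted by username."""
    rows = org_db.execute(
        "SELECT username FROM organization_members WHERE org = ? ORDER BY username", (org,)
    )
    return [row[0] for row in rows]
```

A blind second insert for the same username fails with an integrity error, so a write path that forgets to move first cannot silently break the invariant.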

Tests: the one-org invariant (move keeps a member in exactly one organization, the
UNIQUE index rejects a blind second insert) and the narrow org-tag update that
preserves setup and channel; the regenerated assign types compile. Full suites
green (202 server, 19 SDK); fmt, clippy -D warnings, musl release clean. Deployed
to the testing admin instance, where the existing tester fleet was moved into a
real "Hero Testers" organization and confirmed.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: fix org Settings modal (script order) + Cockpit URL copy button
All checks were successful
lab release / release (push) Successful in 23m38s
b42de4b6db
Two launcher fixes from live use:

- The per-organization Settings button did nothing: the modal script runs in the
  content block, which the layout renders before bootstrap.bundle.min.js, so
  `bootstrap` was undefined at parse time and the script returned early, never
  attaching the click handler. Attach the handler unconditionally and resolve the
  Bootstrap Modal lazily at click time, by which point the bundle has loaded.

- The organization table's Cockpit column showed only a bare "cockpit" link. Show
  the full Cockpit URL as a clickable link plus a copy-to-clipboard button, matching
  the member detail page, and add the shared [data-copy] copy handler.

Template-only; admin rebuilt and confirmed live on the testing admin instance.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: fold the Nodes page into the Launcher and add node groups
All checks were successful
lab release / release (push) Successful in 13m38s
19ae6bc362
Move the whole Nodes page into the Launcher's Infrastructure section as
three sub-tabs (Managed Nodes, Add Nodes, Group Nodes) and drop the
top-nav Nodes entry; /nodes now redirects to /launcher#infrastructure
(query preserved so a node deep-link still opens its detail drawer). The
moved view initializes lazily the first time Infrastructure is shown, so
the Launcher no longer fetches the fleet on every page view.

Group Nodes (new): a named placement pool scoped to one network, by farm
ids and/or specific node sids. New node_groups table (additive migration),
deployer.save_node_group / list_node_groups / delete_node_group RPCs, and
a Group Nodes sub-tab to create, list, and delete groups.

Deploy: the previously-disabled Infrastructure picker now lists the groups
(plus Auto). Picking a group scopes placement client-side to that group's
network and farms/nodes (freest node first, dedicated before shared), and
the whole organization is checked to fit the group's free capacity before
any account is created; it refuses with a clear message otherwise. Auto is
unchanged (the server picks the most free node).
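
The placement order and the capacity preflight for a group might be sketched like this. The `Node` type and the notion of free slots are assumptions, and the sketch reads "freest node first, dedicated before shared" as dedicated nodes first, then most free capacity within each kind; the admin UI may break ties differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    sid: str
    dedicated: bool
    free_slots: int  # how many more member VMs the node can take (illustrative unit)


def placement_order(nodes: List[Node]) -> List[Node]:
    """Dedicated nodes before shared ones, freest first within each kind."""
    return sorted(nodes, key=lambda n: (not n.dedicated, -n.free_slots))


def fits(nodes: List[Node], member_count: int) -> bool:
    """Preflight: the whole organization must fit before anything is created."""
    return sum(n.free_slots for n in nodes) >= member_count
```

Running `fits` before the first account is created is what lets the form refuse up front instead of failing partway through a deploy.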

Server node RPCs are unchanged; only the admin UI and the new group RPCs
are added. 205 server + 20 SDK tests; fmt and clippy clean.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: build a node group by picking from Managed Nodes, not typing ids
Some checks failed
lab release / release (push) Has been cancelled
2981ff227e
The Group Nodes builder now lists your Managed Nodes for the chosen
network, grouped by farm, instead of free-text farm/node id inputs. Tick a
whole farm (every managed node in it, including ones added later) and/or
individual nodes; ticking a farm disables its node checkboxes so the two
never double-count. This makes a group always a slice of the nodes you
actually manage: an organization can be pinned to its own nodes, or to a
specific farm it has a contract on.

Storage and placement are unchanged (farm_ids + node_sids; a node is in
the group if its farm or its sid matches), so deploy still only ever
places on managed nodes.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: let a node group span networks (network-qualified entries)
Some checks failed
lab release / release (push) Has been cancelled
43cc85227c
Farm ids and node sids are not unique across TFChains (mainnet farm 1 is
not QAnet farm 1), so a group's single network column forced every entry
onto one network. Drop it (migration M19) and store each entry
network-qualified ("net:id", e.g. "main:5"), so one group can mix nodes
and farms from different networks. The Group Nodes builder drops the
network select and lists your managed nodes across all networks,
sectioned network then farm; ticking a farm or node records its network
with it. Placement and the capacity preflight match per-entry network.
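
The network-qualified matching can be sketched as follows. The `Entry` type and function names are hypothetical, and the short network name "qa" used in the usage note is an assumption (the source only shows "main:5"); only the "net:id" format and the "farm or sid matches" rule come from the text above.

```rust
/// A network-qualified entry such as "main:5" (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub net: String,
    pub id: String,
}

/// Parse "net:id"; returns None when either half is missing.
pub fn parse_entry(raw: &str) -> Option<Entry> {
    let (net, id) = raw.trim().split_once(':')?;
    if net.is_empty() || id.is_empty() {
        return None;
    }
    Some(Entry {
        net: net.to_string(),
        id: id.to_string(),
    })
}

/// A node is in the group if its farm or its own sid is listed
/// for the same network.
pub fn in_group(
    farms: &[Entry],
    nodes: &[Entry],
    net: &str,
    farm_id: &str,
    node_sid: &str,
) -> bool {
    farms.iter().any(|e| e.net == net && e.id == farm_id)
        || nodes.iter().any(|e| e.net == net && e.id == node_sid)
}
```

With a group holding farm "main:1", a node on farm 1 of mainnet matches, while a node on farm 1 of another network (say "qa") does not, which is the collision the single network column could not express.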

This is the natural model: a group is just "these nodes, wherever they
live" — an organization can run on its own nodes across networks, or be
pinned to one farm it has a contract on.

206 server + 20 SDK tests; fmt and clippy clean.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: in-app confirmations and Update selected in the Launcher
Some checks failed
lab release / release (push) Has been cancelled
a6e4dc69ce
Replace the remaining browser confirm() dialogs in the Launcher with one
reusable in-app modal: a shared launcher-confirm-modal plus a window.heroConfirm
helper now backs deleting a setup, forgetting a saved organization, and the
update-all-instances / update-all-organizations actions, matching the modal
pattern the per-organization settings and node actions already use.

Add row-selection batch update: the per-member selection checkboxes now render
on every organization card, and a new "Update selected" button beside "Update
all instances" re-installs only the ticked members on their recorded channels.

UI only; no schema, RPC, or settings change. Closes the cross-cutting polish of
lhumina_code/home#291.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: edit a node group after creating it
Some checks failed
lab release / release (push) Has been cancelled
825c359965
A node group could be created but never changed, so a group could not grow
as an organization took on more members. Each saved group now has an Edit
button that loads its name and current farm/node selection back into the
builder; saving the same name updates the group in place (the save already
upserts by name). Entries that are no longer among the managed nodes are not
re-ticked, so an update keeps only nodes still managed.

UI only; reuses the existing save_node_group upsert. Part of
lhumina_code/home#291.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: renaming a node group while editing moves it, not clones it
Some checks failed
lab release / release (push) Has been cancelled
8c04c21d14
Editing a group and changing its name left both the old and new names, because
save upserts by name and only created the new one. Track the name loaded for
editing and, when the saved name differs, delete the old group after the save so
a rename moves the group instead of cloning it.
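
The rename-as-move behaviour can be sketched against an in-memory map. This is an illustration only: the real change lives in the Launcher UI and calls the save and delete RPCs, and the function name and signature here are hypothetical.

```rust
use std::collections::HashMap;

/// Save a group under `new_name` (upsert by name). If the group was loaded
/// for editing under a different name, delete the old entry after the save
/// so the rename moves the group instead of cloning it.
pub fn save_group(
    groups: &mut HashMap<String, Vec<String>>,
    loaded_name: Option<&str>,
    new_name: &str,
    entries: Vec<String>,
) {
    groups.insert(new_name.to_string(), entries);
    if let Some(old) = loaded_name {
        if old != new_name {
            groups.remove(old);
        }
    }
}
```

Deleting only after the save succeeds means a failed save leaves the original group in place.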

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: "Add members" (plural) and a display-name-aware bulk paste
Some checks failed
lab release / release (push) Has been cancelled
8fffe4a59b
The per-organization action is now "Add members" since the Deploy form already
takes many at once. The "Add several at once" paste box accepts an optional
display name per line (username, email, Display Name) and splits on comma,
semicolon, or tab so a spreadsheet column pastes cleanly, instead of splitting
on spaces (which broke display names).
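
A sketch of the per-line parse, assuming the field order given above (username, email, Display Name). The `PastedMember` type is hypothetical, and treating the email as optional is an assumption; the source only states that the display name is optional.

```rust
/// One parsed row of the paste box (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct PastedMember {
    pub username: String,
    pub email: Option<String>,
    pub display_name: Option<String>,
}

/// Split on comma, semicolon, or tab; never on spaces, so a display name
/// such as "Jane Doe" stays in one field.
pub fn parse_line(line: &str) -> Option<PastedMember> {
    let mut fields = line
        .split(|c: char| c == ',' || c == ';' || c == '\t')
        .map(str::trim);
    let username = fields.next().filter(|s| !s.is_empty())?.to_string();
    let email = fields.next().filter(|s| !s.is_empty()).map(String::from);
    let display_name = fields.next().filter(|s| !s.is_empty()).map(String::from);
    Some(PastedMember {
        username,
        email,
        display_name,
    })
}
```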

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: an existing member takes only a username, not a display name
Some checks failed
lab release / release (push) Has been cancelled
8b848debcc
For a member added as an existing Forge account, the display name comes from the
Forge profile and a typed one is ignored server-side, so the field is now disabled
(and cleared) when the account kind is Existing, and re-enabled for New. Email
stays an optional override for existing members (the Forge email can be private or
empty), with the placeholder relabelled to say so.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: removing a dedicated node always unreserves; New organization toggles
All checks were successful
lab release / release (push) Successful in 38m23s
e93288baed
Drop the "Remove, keep reserved" trash button from a dedicated node (table row and
detail drawer): keeping a node reserved while removing it from the deployer means
it silently keeps billing, which is misleading. A dedicated node now has one
action, "Remove and Unreserve"; shared nodes keep their single "Remove". The
unregister handler is shared-only now.

The top "New organization" button now toggles the Deploy form: a second click
folds it away instead of leaving it taking up the page.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: edit an organization in place + propagate its setup's apps
Some checks failed
lab release / release (push) Has been cancelled
b08c39a7b5
Add deployer.update_organization to rename a saved organization and/or
switch its setup without disturbing its roster. A rename cascades in one
transaction to every member instance's org tag (which keys the
per-organization secret resolution) and to the organization-owned setup,
so a renamed organization keeps its email identity and editable bundle;
a rename that collides with another organization is rejected.

Each organization now owns a uniquely-named setup (created from the
picked stack when it is saved), so editing it affects only that
organization. Update all instances now sends the organization's setup's
current apps to each member, so adding a service to the setup and
updating the organization installs it on everyone. Guard delete_setup
against removing a setup an organization still uses.

lhumina_code/home#295

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: rewrite the Manual around the Launcher and organizations
Some checks failed
lab release / release (push) Has been cancelled
c0084f7b55
Replace the sandbox-era Users and Nodes tabs with Organizations, Setups,
and Infrastructure, mirroring the Launcher, and refresh Overview,
Architecture, Updates, Settings, Control, and Troubleshooting to the
organization model: deploy a list of members on a setup behind their own
logins, manage them as one unit (update all, edit, per-organization keys
and email), build reusable setups, and group the grid nodes members run
on. The Manual now reads as the finished product an operator drives.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: richer Manual Overview and Updates
Some checks failed
lab release / release (push) Has been cancelled
5ef8e7c936
Overview now shows the whole picture from both sides: what the operator
does (the Launcher journey) and what each member gets (their own private,
login-gated Hero), with organizations tying them together. Updates lays
out every granularity: a member updating one service or all their
services from their Cockpit, and the operator updating one member,
selected members, a whole organization, or every organization at once,
each applying the organization's setup.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: precise member wording in the Manual Overview
Some checks failed
lab release / release (push) Has been cancelled
f5aa84e645
A member lands on their own member instance (their machine, running their
Hero stack), where the Cockpit is the console and Hero OS is the shell
that presents their apps as one view. Replaces the vague "their own Hero",
matching the Architecture and platform-overview vocabulary.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: Manual member view — Cockpit is the console, hero_os is an app
Some checks failed
lab release / release (push) Has been cancelled
343e3d9bee
Correct the member overview: a member lands on their member instance and
their console is the Cockpit (hero_cockpit); hero_os is one app among the
others (the dock-and-islands desktop), not the shell that presents
everything.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: Manual — Cockpit and Hero OS as two distinct surfaces
All checks were successful
lab release / release (push) Successful in 32m41s
61722b6563
A member instance runs a Hero stack (a compilation of services). The
Cockpit is the control surface for running and managing those services;
Hero OS is the more integrated environment, the apps brought together as
one desktop of islands you can sign in to and work in. Reframe the
Overview and the hero_os service page away from calling hero_os "the
shell", so the two surfaces are described distinctly.

lhumina_code/home#291

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: setup ownership column (setups.org_id) + Setups page CRUD
Some checks failed
lab release / release (push) Has been cancelled
fcca243f88
Make setup ownership a queryable column instead of an implicit name
match, and turn the Setups page into create / edit-in-place / duplicate
actions (home#295).

- M20 adds setups.org_id (0 = reusable template, >0 = owning org), with a
  name-join backfill so setups that were owned by naming convention
  before this column are stamped correctly and never read as templates.
- save_organization stamps the owned setup server-side (the client never
  passes org_id); upsert_setup preserves org_id on conflict so editing an
  owned setup's apps never reclassifies it; forgetting an org reaps its
  owned setup.
- Setups page: New / Edit / Duplicate / Delete per row; the per-row
  "Use in a deployment" button is removed (deploy from Organizations).
- org_id filters owned setups out of the Deploy picker, restricts the
  Edit-organization setup switch to templates plus the org's own setup,
  and the Settings modal shows the org's setup with an edit shortcut.
- openrpc.json + SDK regenerated; 212 server + 20 SDK tests (3 new).
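
How the org_id column drives the two pickers can be sketched as below. The `Setup` struct and function names are hypothetical stand-ins for the stored rows and the UI filters; only the 0 = template, >0 = owning org convention comes from the list above.

```rust
/// Hypothetical in-memory view of a setups row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Setup {
    pub name: String,
    /// 0 = reusable template, >0 = id of the owning organization.
    pub org_id: i64,
}

impl Setup {
    pub fn is_template(&self) -> bool {
        self.org_id == 0
    }
}

/// Deploy picker: owned setups are filtered out, templates remain.
pub fn deploy_picker(setups: &[Setup]) -> Vec<&Setup> {
    setups.iter().filter(|s| s.is_template()).collect()
}

/// Edit-organization setup switch: templates plus the organization's own setup.
pub fn edit_options(setups: &[Setup], org_id: i64) -> Vec<&Setup> {
    setups
        .iter()
        .filter(|s| s.is_template() || s.org_id == org_id)
        .collect()
}
```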

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: Manual renders the final organization-editing vision
All checks were successful
lab release / release (push) Successful in 38m33s
9c56d6adb4
Update the Manual to the locked model (home#296): an organization is
edited in one place, remembers where it runs, and sets its own settings
at create.

- Deploy flow: the chosen infrastructure group is remembered on the
  organization (reused on updates and added members), and you can set the
  organization's own settings (bring your own key) at create.
- Manage: merge the separate Edit and Settings steps into one
  "Edit organization" surface (name, setup, infrastructure, own settings).
- Vocabulary: a setup is a stack plus its non-secret settings; rename the
  per-organization "configs" wording to "Settings"; settings (keys and
  sender) live in Settings as a General default or per-organization
  override and never inside a setup.

Vision-only doc change: the Manual describes the complete product, and
home#296 brings the implementation in line with it. Live on the testing
organization admin instance.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: one Edit-organization place — persisted node group + settings at create
All checks were successful
lab release / release (push) Successful in 18m29s
319451dc86
home#296: an organization now remembers where it runs and is edited in one
screen.

- M21: organizations.node_group (additive, empty = Auto). Threaded through
  OrganizationRow, save_organization, update_organization, get/list, openrpc.json
  (Organization + save/update params and outputs) and the regenerated SDK.
- The chosen Infrastructure node group is saved onto the organization at deploy,
  shown and changeable in Edit, and reused for updates and added members instead
  of being re-picked each time.
- delete_node_group is guarded: a node group an organization runs on cannot be
  deleted (or renamed via delete) out from under it, so a stored placement name
  never silently falls back to Auto. Mirrors the owned-setup delete guard.
- The separate Edit (name + setup) and Settings (keys + email) dialogs merge into
  one Edit-organization screen covering name, setup, infrastructure, and the
  organization's own settings.
- Per-organization settings can be set at create (bring your own keys/sender) in
  an optional section on the Deploy form; blank inherits the General default.
- delete_organization reaps the organization's own core/ORG_<id>_* settings
  secrets so a forgotten organization leaves no orphaned key.
- Vocabulary: the per-organization "configs" are now "Settings"; a setup is the
  stack plus non-secret settings, and keys/email live on the organization, never
  in a setup.
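
The delete guard from the list above can be sketched like this. The in-memory shapes (a Vec of group names and an organization-to-group map) and the error text are assumptions; the actual guard runs against the deployer's database.

```rust
use std::collections::HashMap;

/// Refuse to delete a node group that any organization still runs on.
/// An organization with an empty node_group is on Auto and never blocks.
pub fn delete_node_group(
    groups: &mut Vec<String>,
    org_to_group: &HashMap<String, String>,
    name: &str,
) -> Result<(), String> {
    if let Some((org, _)) = org_to_group.iter().find(|(_, g)| g.as_str() == name) {
        return Err(format!(
            "node group '{}' is in use by organization '{}'",
            name, org
        ));
    }
    groups.retain(|g| g != name);
    Ok(())
}
```

Repointing the organization to Auto (an empty node_group) is what frees the group for deletion.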

215 server tests (3 new: node_group round-trip, delete-node-group guard, M21
additive) + 20 SDK smoke; fmt/clippy/musl-release clean. Live-verified on the
testing organization admin instance: M21 applied, the 7-member org untouched,
the node group persisted, the delete guard refused an in-use group, and
repointing the organization to Auto freed it.

lhumina_code/home#296

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: auto-load an organization's remembered node group into the Deploy picker
All checks were successful
lab release / release (push) Successful in 40m30s
199e602061
home#296 follow-on: when you add members to an existing organization, the
Infrastructure picker now defaults to the node group the organization runs on, so
added members reuse its placement without re-picking. launcherDeployTo() fetches
the organization and pre-selects its stored node group when the group still
exists (otherwise it stays on Auto; a transient fetch error leaves the picker
untouched). Changing the group only ever updates the stored placement; existing
member instances are never moved.

lhumina_code/home#296

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: split setups into stacks and settings profiles
Some checks failed
lab release / release (push) Has been cancelled
d98cb37abc
Adds Stack and Settings profile storage/API/UI so organizations reference what runs separately from how it is configured. Profile secrets live in shared profile slots and install resolution now prefers org override, then Settings profile, then General defaults.
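
The resolution order can be sketched as a simple fallback chain. The function name is hypothetical, and treating a blank value as unset is an assumption carried over from the earlier "blank inherits the General default" behaviour.

```rust
/// Treat an empty string as "not set".
fn non_empty(v: Option<&str>) -> Option<&str> {
    v.filter(|s| !s.is_empty())
}

/// Resolve one setting: organization override first, then the Settings
/// profile, then the General default.
pub fn resolve_setting<'a>(
    org_override: Option<&'a str>,
    profile: Option<&'a str>,
    general: Option<&'a str>,
) -> Option<&'a str> {
    non_empty(org_override)
        .or_else(|| non_empty(profile))
        .or_else(|| non_empty(general))
}
```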

Deployed and verified on the live testing admin instance 0069.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: allow profile keys to select providers
Some checks failed
lab release / release (push) Has been cancelled
5d86558317
Entering a provider key in a Settings profile now enables and checks that provider for the profile. Missing General keys are shown as hints instead of disabling profile provider choices.

Deployed and verified on the live testing admin instance 0069.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: uncheck auto-selected provider when key is cleared
All checks were successful
lab release / release (push) Successful in 16m19s
0545fec977
Provider key inputs now auto-select their matching Settings profile provider only while the key field remains populated. Clearing a key field reverses that auto-selection without overriding manual checkbox choices.

Deployed and verified on the live testing admin instance 0069.

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs: align the Manual with the Stack and Settings profile model
All checks were successful
lab release / release (push) Successful in 41m13s
b97d927d3d
The operator-facing Manual still described a single combined setup
object. The deployer now exposes two reusable building blocks: a Stack
(apps plus release channel) and a Settings profile (provider keys,
email sender, welcome behavior). An organization references one Stack
and an optional Settings profile. Values resolve in order: organization
override, then Settings profile, then General default. A referenced Stack
or profile cannot be deleted while an organization still uses it.

Rewrites the Manual "Setups" section into "Stacks & Settings", fixes
the overview, organizations, updates, and settings copy, renames the
manual route and the per-service page back-link to match, and updates
the vendored hero_tfgrid_deployer service page.

lhumina_code/home#297

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: every deployment lands in a real, named organization
All checks were successful
lab release / release (push) Successful in 30m44s
bb397957b7
The Launcher compose deploy already required an organization, but it seeded a
fake default option ("Testers organization") that the client mapped to an empty
vms.org tag, dropping those members into the Unassigned view; and the deploy
picker scraped every org card, so the Unassigned recovery bucket was itself
offered as a deploy target.

Make a deployment self-consistent at the one seam that writes vms.org. When an
organization name is present, handle_provision_vm now ensures a real
organizations row exists for that name and the member is in its roster, before
writing the tag. Without this, a name written to vms.org with no backing row
would strand a member as neither a real organization (no settings, no roster)
nor Unassigned (that view keys on an empty tag). A new idempotent
ensure_member_in_org leaves an existing same-org roster row untouched (so the
compose flow, which saves the organization first, is a no-op) and otherwise
moves the member in, upholding the one-member-one-organization invariant.
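
A sketch of the idempotent behaviour described above, using an in-memory registry in place of the organizations and roster tables. The `Registry` type and the boolean "changed" return value are assumptions for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the organizations table and the roster.
#[derive(Default)]
pub struct Registry {
    pub organizations: BTreeSet<String>,
    /// member -> organization; one entry per member keeps the
    /// one-member-one-organization invariant by construction.
    pub roster: HashMap<String, String>,
}

impl Registry {
    /// Ensure the organization exists and the member is in its roster.
    /// Returns true when anything changed, false for a no-op.
    pub fn ensure_member_in_org(&mut self, member: &str, org: &str) -> bool {
        let created = self.organizations.insert(org.to_string());
        if self.roster.get(member).map(String::as_str) == Some(org) {
            return created; // existing same-org roster row is left untouched
        }
        // Not in this org yet: move the member in, replacing any older row.
        self.roster.insert(member.to_string(), org.to_string());
        true
    }
}
```

Calling it twice with the same arguments is a no-op the second time, which matches the compose flow where the organization is saved before the VM is provisioned.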

Client: drop the fake default organization option and the name-to-empty
normalization, offer only saved organizations in the deploy picker (the
Unassigned bucket and any orphaned-tag group are recovery views, never deploy
targets), and prefill the "+ New organization" name with a renameable
suggestion derived from the first member, so a solo deploy still gets a real,
settings-capable organization from the start.

lhumina_code/home#298

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: add a Launcher Overview landing page with live counts
All checks were successful
lab release / release (push) Successful in 31m52s
4c64a27b4e
Open the Launcher on a new Overview section, now the default landing route:
six tiles counting organizations, members (with a ready/installing/failed
rollup), fleet capacity, Stacks, Settings profiles, and node groups; a
"Needs attention" block (failed installs to retry, instances with a newer
build available checked on demand, and any Stack or Settings profile no
organization references) that collapses to one "all good" line; and an
"Organizations at a glance" list, one row per organization with its member
count and health.

A new read-only /overview.json aggregate serves the counts and signals by
reusing the Organizations member rollup. Fleet capacity loads lazily from
/nodes.json (a live grid query); the newer-build check is on demand so it
never hammers the build host on a page view. The top navigation page becomes
navigation only, with the fleet capacity strip moved onto the Overview. The
Manual now describes the Overview.

lhumina_code/home#299

Signed-by: mik-tf <mik-tf@noreply.invalid>
Seed the admin console URL onto each member instance at provision
Some checks failed
lab release / release (push) Has been cancelled
7c78868bbe
A member instance now receives its admin instance's own console URL
(core/ADMIN_CONSOLE_URL) plus the admin allowlist mirrored into the core
context (core/ADMIN_FORGE_USERS, previously seeded only in the deployer
context), so the member cockpit can render an admin-only link back to the
control plane. The URL is taken from an explicit ADMIN_CONSOLE_URL env or
derived from the admin OAuth callback host; it is empty when neither is
available, which leaves the member-side link hidden rather than wrong.
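
The URL selection can be sketched as below. The function names are hypothetical, and deriving "scheme://host" from the callback URL is an assumption about what "derived from the admin OAuth callback host" means; the empty-string result when neither source is usable follows the text above.

```rust
/// Pick the admin console URL: explicit value first, else derive it from
/// the OAuth callback URL, else return an empty string (link stays hidden).
pub fn admin_console_url(explicit: Option<&str>, oauth_callback: Option<&str>) -> String {
    if let Some(url) = explicit.map(str::trim).filter(|s| !s.is_empty()) {
        return url.to_string();
    }
    oauth_callback.and_then(origin_of).unwrap_or_default()
}

/// Reduce "https://host/path" to "https://host"; None if it is not URL-shaped.
fn origin_of(callback: &str) -> Option<String> {
    let (scheme, rest) = callback.trim().split_once("://")?;
    let host = rest.split('/').next().filter(|h| !h.is_empty())?;
    Some(format!("{}://{}", scheme, host))
}
```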

lhumina_code/home#300

Signed-by: mik-tf <mik-tf@noreply.invalid>
Gate the admin-console link on an admin-only list, seed it on update
All checks were successful
lab release / release (push) Successful in 38m18s
e49b6da141
The member cockpit now reads core/ADMIN_CONSOLE_USERS (the workspace admins
only) instead of the proxy login allowlist. The login allowlist also contains
the member's own username, so gating on it would have shown the admin-console
link to the member themselves. The install path and the Update all instances
path both seed core/ADMIN_CONSOLE_USERS plus the console URL, so the link
works on the existing fleet after a routine update, not only on fresh
installs. Seeds on the update path are non-fatal so they never abort the
binary update, which is the critical path.

lhumina_code/home#300

Signed-by: mik-tf <mik-tf@noreply.invalid>
Seed the tester Kimi config to drop the shell and file-read tools
All checks were successful
lab release / release (push) Successful in 34m39s
f1eddc4a0a
A sandbox tester VM is pre-fed shared provider API keys for the assistant. The
Kimi agent ships a shell tool that runs as the service user, so a browser-only
tester could read those shared keys out of the agent's environment or files. The
tester's seeded ~/.kimi/config.toml now sets exclude_tools to drop the shell and
the arbitrary-path file readers, which the agent honors, closing that read path
without affecting a developer's own coding agent.

lhumina_code/home#249

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: install the Hero OS desktop bundle on provision and update
Some checks failed
lab release / release (push) Has been cancelled
0d065b1e4e
The hero_os_admin service serves the Dioxus desktop from
~/hero/share/hero_os/public and refuses to start without it, but that
desktop is a WASM bundle (built with dx), not a musl service binary, so
the lab build/download loop never installs it. A fresh member therefore
booted hero_os_admin asset-less and the desktop only existed where it
had been copied by hand.

Add a step after the lab build loop that, when hero_os is in the enabled
set, fetches the published hero_os_app-web-dist.tar.gz for the active
release channel and unpacks it into ~/hero/share/hero_os/public. Runs on
a fresh provision and on "Update all instances", so members self-deliver
and self-update the desktop.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: refresh the Hero OS desktop bundle on the update path too
All checks were successful
lab release / release (push) Successful in 21m11s
b0e1914526
The full install delivers the desktop bundle via setup-binaries.sh, but
"Update all instances" on an already-ready member runs update_vm_services,
which refreshes the service binaries through cockpit.upgrade_service and
never runs setup-binaries.sh. So a normal fleet update advanced the hero_os
binaries while leaving the WASM desktop bundle stale (or absent on a member
that never had it), the exact drift this pipeline is meant to remove.

Append a shared bundle-refresh block to the binary-update payload: when
hero_os is in the member's install set, fetch the published bundle for the
member's release channel, unpack it into ~/hero/share/hero_os/public, and
restart hero_os_admin. It runs after the binary update and is non-fatal, so
a missing asset can never block the critical binary path. Both delivery
paths (fresh install and update) now self-deliver the current desktop.

Signed-by: mik-tf <mik-tf@noreply.invalid>
docs(manual): vendor hero_office shared-engine service page
All checks were successful
lab release / release (push) Successful in 32m39s
5cdca0bdbf
Sync the deployer Manual's hero_office page with the platform docs: the
OnlyOffice editor is a shared engine on the admin instance (not a per-member
container), reaching back into each member to fetch and save documents that
never leave that member.

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: wire members to the shared OnlyOffice editor on install and update
All checks were successful
lab release / release (push) Successful in 34m24s
195e598311
When the admin instance runs a shared OnlyOffice Document Server (hub mode with
ONLYOFFICE_JWT_SECRET set) and a member's install set includes hero_office, seed
that member's OnlyOffice slots so the engine can fetch and save through the
member without bypassing its login gate: ONLYOFFICE_JWT_SECRET,
CONNECTOR_EXTERNAL_URL (the member's own hero_proxy at mycelium:9997),
OO_UPSTREAM_BASE (the admin engine at mycelium:80), and the
HERO_PROXY_PUBLIC_HERO_OFFICE carve-out flag.

The install path seeds the slots and restarts hero_office; the update path
("Update all instances") self-gates on hero_office being present, then seeds and
re-registers hero_office plus hero_proxy with --reset so the updated binaries'
new env blocks take effect. Read the shared secret from the deployer's own
ONLYOFFICE_JWT_SECRET env (a new service.toml env block; adding it needs a
deployer re-register).

Tracked at lhumina_code/home#304

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: point the OnlyOffice connector at the member FQDN, not mycelium
All checks were successful
lab release / release (push) Successful in 14m56s
4e636e1abb
The shared OnlyOffice Document Server fetches each member's document and
posts the save callback by calling the connector URL we seed into the
member (core/CONNECTOR_EXTERNAL_URL). We were seeding the member's
mycelium address as http://[<ipv6>]:9997, but the Document Server's URL
parser rejects bracketed IPv6 literals (ERR_INVALID_URL), so it could
never download the document or post the callback: editing failed with
"Download failed" the moment a real edit ran. A curl-based check passed
because curl handles bracketed IPv6, which hid the gap.

Seed the connector as the member's https FQDN instead, at both the
install and the update wiring sites. The engine then reaches the same two
carve-out paths (/files + /callback) through the member's public gateway,
still JWT-gated and still behind the login floor for everything else. The
seed falls back to the mycelium form only if the FQDN is unset.
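
The seeding rule can be sketched as a single function. The function name and parameters are hypothetical; the two URL shapes (https FQDN, and the bracketed mycelium address on port 9997 as the fallback) come from the text above.

```rust
/// Build the value seeded into core/CONNECTOR_EXTERNAL_URL: prefer the
/// member's https FQDN, fall back to the bracketed mycelium address only
/// when no FQDN is set.
pub fn connector_external_url(fqdn: Option<&str>, mycelium_ipv6: &str) -> String {
    match fqdn.map(str::trim).filter(|f| !f.is_empty()) {
        Some(host) => format!("https://{}", host),
        None => format!("http://[{}]:9997", mycelium_ipv6),
    }
}
```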

Proven on a live member: minting a valid JWT and fetching the document
over https://<member-fqdn>/hero_office/ui/files/... returns 200 with the
real document bytes (and 404 for a forged token).

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: seed each member its own Hero OS context at install and update
All checks were successful
lab release / release (push) Successful in 14m24s
c558c5bab5
Add a per-member Hero OS context seed to both the install SSH payload and
the "Update all instances" payload: set the member's core/HERO_OS_MEMBER_CONTEXT
slot (its forge username) and re-register hero_os_admin with --reset so the
new [[env]] enters the stored service def and takes effect. The desktop then
redirects every browser entry into the member's own workspace, so a member
lands in a context named for themselves and a support admin signing in lands
in the member's context too, instead of a build-time demo default.

Self-gated on hero_os being installed and best-effort (|| true) so it never
fails the install or update. Adds tester_context to TesterServiceUpdateParams
(populated from the member's forge username at the update-all call site) and
unit tests asserting both payloads seed the slot and re-register before the
restart.

lhumina_code/home#305

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: install hero_browser per member so Slides can export PDF/PPTX
All checks were successful
lab release / release (push) Successful in 31m32s
5d9e17555c
hero_slides renders its PDF and PPTX exports by driving a local headless
Chrome (hero_browser) over 127.0.0.1:8884. Add hero_browser to the member
install catalog (opt-in, off by default since it brings in a full Chrome),
pull it in automatically whenever Slides is enabled so an export never
silently fails, and install Google Chrome in the member setup script when
hero_browser is in the install set. Fresh provision and "Update all
instances" share the same resolver and install path.
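
A sketch of the shared resolver rule, assuming the install set is a plain list of service names. The function names are hypothetical and do not reflect the deployer's actual resolver; only the "Slides pulls in hero_browser, hero_browser pulls in Chrome" dependency comes from the text above.

```rust
fn has(set: &[String], name: &str) -> bool {
    set.iter().any(|s| s.as_str() == name)
}

/// Deduplicate the requested apps and pull in hero_browser whenever
/// hero_slides is enabled, so an export never silently fails.
pub fn resolve_install_set(requested: &[&str]) -> Vec<String> {
    let mut set: Vec<String> = Vec::new();
    for &app in requested {
        if !has(&set, app) {
            set.push(app.to_string());
        }
    }
    if has(&set, "hero_slides") && !has(&set, "hero_browser") {
        set.push("hero_browser".to_string());
    }
    set
}

/// The setup script installs Chrome only when hero_browser is in the set.
pub fn needs_chrome(set: &[String]) -> bool {
    has(set, "hero_browser")
}
```

Keeping one resolver for both fresh provision and update is what prevents the two paths from disagreeing about whether Chrome is needed.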

lhumina_code/home#279

Signed-by: mik-tf <mik-tf@noreply.invalid>
deployer: default a member's Office to their own workspace
All checks were successful
lab release / release (push) Successful in 46m21s
0ae0a8be8c
hero_office stores documents under the active Hero context and falls
back to DEFAULT_CONTEXT when none is given. Seed that default to the
member's own context (its forge username) at install and update so a
member's Office lands in their own workspace instead of the build-time
"demo" default. DEFAULT_CONTEXT has a non-empty default in the office
service.toml, which shadows a plain secret set, so it is passed on the
lab re-register to take effect. Self-gated on hero_office being
installed and best-effort so it never fails an install.

Refs lhumina_code/home#306

Signed-by: mik-tf <mik-tf@noreply.invalid>
Make Browser, Slides, Planner, and Whiteboard default-on member apps
All checks were successful
lab release / release (push) Successful in 33m33s
2531140400
Remove hero_browser, hero_slides, hero_planner, and hero_whiteboard from
DEFAULT_OFF_APPS so a default member instance installs them out of the box.
DEFAULT_OFF_APPS is the single source feeding the provision default,
resolve_enabled_components, and the app_catalog default_on flag (and thus the
Launcher "Default" preset and the New-Stack checkbox state), so the provision
default and the UI preset never drift apart. hero_browser brings a headless
Chrome (the accepted cost of having Slides export work everywhere); the setup
script already installs Chrome when hero_browser is in the set, and the cockpit
already surfaces these apps on the Admin and Services pages. Office, Collab,
Biz, Code, Orchestrator, and the parked AI Broker stay opt-in.

lhumina_code/home#279

Signed-by: mik-tf <mik-tf@noreply.invalid>
Convergence step 1 (mechanical): base = integration's functional state (D-45
gateway routing etc.); point all hero_* git deps at the development branch and
rename herolib_derive -> herolib_macros to match the development libs. Source
build/API drift fixes follow in the next step.

Signed-by: mik-tf <mik-tf@noreply.invalid>
Normalize deployer OpenRPC methods for the development macro stack, keep method params single-input, install hero_components with the admin app set, and wire deployer secret operations to the restored context-aware hero_proc SDK. Pin hero_proc_sdk to the merged development revision containing the context secrets API.

Refs: #30
Refs: lhumina_code/hero_proc#163

Signed-by: mik-tf <mik-tf@noreply.invalid>
Resolve the development merge by adopting the branch's ignored Cargo.lock policy while preserving the deployer SDK socket override needed for local admin tooling.

Refs: #30

Signed-by: mik-tf <mik-tf@noreply.invalid>
mik-tf merged commit b533e06972 into development 2026-06-24 04:33:07 +00:00
Reference
lhumina_code/hero_os_tfgrid_deployer!31