Cockpit shows all apps with one-click bundle install; deployer pre-installs the right tester set; runbook matches #241

Open
opened 2026-05-30 15:29:06 +00:00 by mik-tf · 1 comment
Owner

A tester who opens the cockpit should see the full app set, use the ones that are pre-installed, and install the rest with one click. None of this should be a manual operator step: the deployer must install the right set on a fresh tester, and the runbook must document the admin-side hub and the tester-side set accurately. This issue tracks making that true end to end.

Today there are gaps. On a live tester we found: only the Hero Books server binary was running (no web, no admin), so the librarian had no UI; the cockpit catalog lists each app by its admin binary only, so the server and web variants never show; hero_memory (the librarian's backing store) is not in the cockpit catalog; and the deployment runbook still lists services that the current architecture removed. The validation we did to confirm the pieces work was done by hand on one tester. The point of this issue is to move all of it into the deployer and the runbook so it is automatic.

The model we agreed on (cockpit UX)

A service ships as a bundle of binaries (server, web, admin). The cockpit treats the unit of install as the app and the unit of display as the binary:

  • When none of an app's binaries are installed: show one collapsed row, for example "Hero Books", with an Install button. Clicking it installs and starts all of the app's binaries together (server, web, admin). There is no way to install half an app.
  • When any of the app's binaries are running: show the individual binary rows as they come from the process supervisor today, each with its own state, URL, and start/stop/restart/logs actions.

This is simple and avoids the footgun of installing a web frontend without its server.
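
The display rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cockpit's actual code: the `App` type, the `cockpit_rows` function, the row dictionaries, and binary names such as `hero_books_server` are all hypothetical, and the process supervisor is reduced to a plain mapping of binary name to state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """One catalog entry: an app and the binaries that ship with it (hypothetical shape)."""
    name: str
    binaries: tuple[str, ...]


def cockpit_rows(catalog: list[App], supervisor: dict[str, str]) -> list[dict]:
    """Build display rows: one bundle row per uninstalled app,
    one row per known binary for every other app."""
    rows = []
    for app in catalog:
        known = [b for b in app.binaries if b in supervisor]
        if not known:
            # No binary is known to the supervisor: one collapsed Install row.
            rows.append({"kind": "bundle", "label": app.name, "action": "install"})
        else:
            # At least one binary is known: show each one with its own state.
            for binary in known:
                rows.append({"kind": "binary", "label": binary, "state": supervisor[binary]})
    return rows
```

The unit of install (the app) and the unit of display (the binary) stay separate here: the bundle row carries only the app name, while binary rows carry only what the supervisor reports.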

Three coordinated pieces

1. Cockpit catalog (hero_cockpit)

  • Restructure the catalog from one admin-binary-per-app to one entry per app that lists its binary set (the server/web/admin binaries that the process supervisor manages). Binary sets are read from each repo's service manifests: for example, Hero Books is server + web + admin, Hero Agent is server + admin, and some apps are admin only.
  • list_catalog: an app is installed if any of its binaries is known to the supervisor. Uninstalled apps emit one bundle row; installed apps emit nothing extra because their binaries already appear from the supervisor's running set.
  • install_service: after the build step, start each binary in the app's set (today it starts a single binary). Validate the requested app name against the catalog.
  • Add hero_memory to the catalog.
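
The install_service change described above might look roughly like this. It is a sketch only: the function signature, the `build` and `start` callbacks, and the binary names are assumptions made for illustration, not the real hero_cockpit API.

```python
from typing import Callable


def install_service(
    app_name: str,
    catalog: dict[str, tuple[str, ...]],
    build: Callable[[str], None],
    start: Callable[[str], None],
) -> list[str]:
    """Validate the app name, build once, then start every binary in the app's set."""
    if app_name not in catalog:
        # Reject names that are not in the catalog before doing any work.
        raise ValueError(f"unknown app: {app_name!r}")
    build(app_name)  # one build step for the whole app
    started = []
    for binary in catalog[app_name]:
        start(binary)  # previously only a single binary was started
        started.append(binary)
    return started
```

Passing `build` and `start` in as callbacks keeps the sketch independent of how the real build step and supervisor are invoked.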

2. Deployer tester service set (hero_os_tfgrid_deployer)

This is the piece that removes the manual step. A fresh tester must come up pre-installed with the agreed set, matching what we proved by hand on the live tester:

  • Hero Books as a full app: server + web + admin, registered with the supervisor (not server only).
  • hero_memory: the per-tester backing store the librarian and assistant read and write.
  • The Hero Voice widget on the tester (the voice bar in the cockpit), talking to the shared voice engine on the admin VM.
  • Already shipped earlier and part of this same flow: the four default public libraries seeded at first start, and the library directory fix so seeding lands in a writable path.

The pre-installed set stays as the current default (the apps that auto-install today); the extended set remains install-on-click via the cockpit. We can tune membership later, but the mechanism must be the deployer, not a manual step.

3. Deployment runbook (home docs)

The admin-VM deployment runbook needs to match the current architecture and document the non-manual flow:

  • The admin VM hosts the shared engines: the embedding provider and the voice provider. Document standing these up.
  • The tester VM hosts the per-tester data and UI: cockpit, books (all three binaries), memory, agent, the voice widget, plus the rest of the demo apps.
  • Fix the stale "demo-app scope" section, which still lists an embedder and an AI broker as always-running on the tester. The embedding work now runs as a shared provider on the admin VM, and the AI broker is currently off the active path. The runbook should describe the current split: shared stateless engines on the admin VM, per-tester data and UI on the tester VM, talking to the engines over the overlay network with per-tester auth.

Definition of done

  • A freshly provisioned tester, with no manual steps, shows in its cockpit: the pre-installed apps running with working URLs (including Hero Books web), and the remaining catalog apps as one-click Install bundles.
  • Clicking Install on an uninstalled app builds and starts all of its binaries and the app then shows its individual binary rows.
  • The runbook describes this exact flow and the admin-vs-tester split with no stale services.
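
The first two conditions could be checked mechanically along these lines. This is a hedged sketch: the row shape, the `fresh_tester_problems` name, and the binary names are hypothetical, and it checks only reported state, not whether each URL actually responds.

```python
def fresh_tester_problems(
    catalog: dict[str, tuple[str, ...]],
    preinstalled: set[str],
    rows: list[dict],
) -> list[str]:
    """Return a list of problems; an empty list means the cockpit rows match the target state."""
    running = {
        r["label"] for r in rows
        if r["kind"] == "binary" and r.get("state") == "running"
    }
    bundles = [r["label"] for r in rows if r["kind"] == "bundle"]
    problems = []
    for app, binaries in catalog.items():
        if app in preinstalled:
            # Every binary of a pre-installed app must be running.
            for binary in binaries:
                if binary not in running:
                    problems.append(f"{app}: {binary} is not running")
        elif bundles.count(app) != 1:
            # Every other catalog app must appear as exactly one Install bundle.
            problems.append(
                f"{app}: expected one Install bundle row, found {bundles.count(app)}"
            )
    return problems
```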

Notes

  • Deploy method while the publish pipeline is being stabilised: build locally and copy to the VM, then register and start. This is a deploy shortcut; the changes still land on the default branch so the pipeline and other machines stay in sync.
  • Related and separate: the libraries' generated question-and-answer data is currently not published back to the library repos, so each tester re-pays the language-model cost. That is tracked in hero_books and is a different fix from this issue.
Author
Owner

Progress on the three pieces.

Piece 1, cockpit catalog: done and merged. The catalog is now modelled as apps with binary bundles. Each app lists its full set of server/web/admin binaries, and an app counts as installed when any of its binaries is running. Uninstalled apps render as one collapsed Install row that builds the repo once and starts every binary in the set; installed apps show their individual binary rows. Hero Memory was added. A test pins each app's binary set to the audited repo manifests so a variant can never be silently dropped (this caught a missing Hero Foundry web binary, now fixed). Verified live on the running tester: Hero Books shows its server, web, and admin rows, and the uninstalled apps show one-click bundle Install rows.
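
For illustration, a pinning check of that kind might compare the two sources like this. It is a sketch, not the merged test: the `binary_set_drift` helper, the dictionary shapes, and the binary names are hypothetical.

```python
def binary_set_drift(
    catalog: dict[str, tuple[str, ...]],
    manifests: dict[str, tuple[str, ...]],
) -> dict[str, dict[str, list[str]]]:
    """Report, per app, binaries that differ between the catalog and the audited manifests."""
    drift = {}
    for app in sorted(set(catalog) | set(manifests)):
        listed = set(catalog.get(app, ()))
        audited = set(manifests.get(app, ()))
        if listed != audited:
            drift[app] = {
                # In the manifest but dropped from the catalog.
                "missing_from_catalog": sorted(audited - listed),
                # In the catalog but not backed by a manifest.
                "not_in_manifest": sorted(listed - audited),
            }
    return drift
```

A test would then assert that the returned mapping is empty, so that a dropped variant fails the build instead of silently disappearing from the cockpit.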

Piece 2, deployer tester set: already correct, no change needed. The deployer's per-tester install set already lists memory, books, and voice, and the installer already registers each app's server, web, and admin variants. The reason the live tester was missing some web and admin binaries is that it was provisioned before the multi-binary layout existed, so it is a stale machine, not a deployer gap. A genuinely fresh provision produces the full set. Proving that end to end needs the provisioning environment unblocked (the compute node lookup error tracked separately); the repaired live tester stands in until then.

Piece 3, runbook: done and merged. The deployment runbook now describes the real split (shared engines on the admin VM, per-tester data and UI on the tester VM), adds a step to start the embedding and voice providers on the admin VM, and rewrites the demo-app scope to the cockpit's app-bundle model. The stale references to a per-tester embedder and AI broker are gone.

Net: the cockpit UX and the docs match the agreed model, and the install path is driven by the deployer and the catalog rather than manual steps. The one open verification is a fresh provision once the compute environment is available.

Reference
lhumina_code/home#241