[deployer] latest-integration release binary is stale (pre-M13), self-upgrade panics on the live DB #29
The latest-integration deployer binary asset is stale: it predates the M13 schema migration (the composite VM-identity migration). Downloading and running it on an admin VM whose deployer.db is already at M13 panics at startup with a migration error. The process binds its rpc.sock and then exits, so the supervisor reports the service as failed and there is no rpc.sock. A source build of the current integration HEAD opens the same M13 database fine, so the problem is the released asset lagging behind the source, not a code bug.
How it was found: refreshing the admin VM deployer by downloading latest-integration produced a binary that would not start; an isolated run surfaced the migration error. The admin VM was recovered by installing a binary built from the current source.
Impact: a dashboard-driven self-upgrade of the deployer (download latest-integration, restart) lands this stale binary and the deployer fails to come up; a fresh install also misses every change since M13. The published cockpit asset is similarly behind the source (older Admin page), though it still boots because it has no incompatible migration. So recent green CI publish runs do not appear to be updating the release binary assets.
Related install-side symptom: lhumina_code/hero_skills#303
Signed-by: mik-tf mik-tf@noreply.invalid
Root cause confirmed: the release record refreshes on every push (latest published 2026-06-14), but the binary assets were frozen at 2026-06-12. lab build --upload skips assets that already exist by name, so the rolling release never got new binaries after the first upload. That stale binary predates the M13 migration, hence the panic on the live DB.
Fix pushed to integration (39513ae): the release workflow now deletes the tag's existing assets via the Forge API before lab build --upload, so each push republishes fresh binaries. This repo's workflow is patched first to verify; the same change should land in the canonical workflow template so every repo benefits. Verifying that the next CI run re-uploads assets dated today.
Signed-by: mik-tf mik-tf@noreply.invalid
Fixed and verified.
Cause: the release publish step did not replace assets that already existed by name, so the rolling latest-integration binaries stayed frozen at their first upload (2026-06-12, pre-M13) while the release record kept refreshing. A self-upgrade downloaded that stale pre-M13 binary, which panicked when opening the migrated DB.
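A mismatch like this (release record fresh, assets frozen) could be caught automatically. Below is a minimal sketch of such a check. It assumes the release JSON has a Gitea/Forgejo-style shape with a published_at field and an assets list carrying name and created_at. Those field names, the one-hour threshold, and the asset names in the usage note are assumptions for illustration, not taken from this repo.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # Map a trailing "Z" to an explicit UTC offset, since
    # datetime.fromisoformat on older Pythons rejects "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_assets(release: dict, max_lag: timedelta = timedelta(hours=1)) -> list[str]:
    """Return names of assets created more than max_lag before the release
    was last published (assumed field names: published_at, created_at)."""
    published = parse_ts(release["published_at"])
    return [
        asset["name"]
        for asset in release.get("assets", [])
        if published - parse_ts(asset["created_at"]) > max_lag
    ]
```

With a release published on 2026-06-14 and one asset uploaded on 2026-06-12, the function would report that asset as stale:

```python
release = {
    "published_at": "2026-06-14T10:00:00Z",
    "assets": [
        {"name": "deployer-old", "created_at": "2026-06-12T09:00:00Z"},    # hypothetical name
        {"name": "deployer-fresh", "created_at": "2026-06-14T09:55:00Z"},  # hypothetical name
    ],
}
print(stale_assets(release))
```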
Fix (this repo, commit 39513ae; same change on hero_cockpit commit 8988dc1): the lab-release workflow deletes the tag's existing assets via the Forge API before lab build --upload, forcing a fresh upload on every push.
Verified: the next push republished all deployer binaries dated today (2026-06-14), the release kept all 12 assets (not emptied), and the freshly published binary was downloaded and confirmed to boot on the live M13 database and serve RPC. So a dashboard-driven deployer self-upgrade now lands a current, compatible binary.
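The delete-before-upload step described above can be sketched as follows. This is not the workflow's actual code. It assumes the Forge exposes Gitea/Forgejo-compatible release endpoints (GET .../releases/tags/{tag} and DELETE .../releases/{id}/assets/{asset_id}) and accepts token authentication. The function names, parameters, and base URL are hypothetical.

```python
import json
import urllib.request


def api(method: str, url: str, token: str):
    """Send one authenticated request and decode a JSON body, if any."""
    req = urllib.request.Request(
        url, method=method, headers={"Authorization": f"token {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def delete_release_assets(base, owner, repo, tag, token, call=api):
    """Delete every asset attached to the release for `tag` and return
    the deleted names. Endpoint paths are assumed Gitea/Forgejo-style."""
    root = f"{base}/api/v1/repos/{owner}/{repo}"
    release = call("GET", f"{root}/releases/tags/{tag}", token)
    deleted = []
    for asset in release.get("assets", []):
        call("DELETE", f"{root}/releases/{release['id']}/assets/{asset['id']}", token)
        deleted.append(asset["name"])
    return deleted
```

Running this just before lab build --upload means the upload never finds an asset with a matching name to skip. The call parameter is injectable so the logic can be exercised without touching a real release.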
Permanent fix tracked at lhumina_code/hero_skills#323 (make the CI lab-builder image always replace assets, so the per-repo workaround can be dropped).
Signed-by: mik-tf mik-tf@noreply.invalid