lhumina_code/hero_os_tfgrid_deployer

[deployer] latest-integration release binary is stale (pre-M13), self-upgrade panics on the live DB #29

Closed

opened 2026-06-14 13:53:52 +00:00 by mik-tf · 2 comments

mik-tf commented

2026-06-14 13:53:52 +00:00

Owner

The latest-integration deployer binary asset is stale: it predates the M13 schema migration (the composite VM-identity migration). Downloading and running it on an admin VM whose deployer.db is already at M13 panics at startup:

build AppState failed: rusqlite_migration to_latest: Attempt to migrate a database with a migration number that is too high

The process binds its rpc.sock and then exits, so the supervisor reports the service failed and there is no rpc.sock. A source build of the current integration HEAD opens the same M13 database fine, so this is the released asset being behind the source, not a code bug.
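The failure mode above can be illustrated with a minimal sketch, assuming the migration runner tracks the applied migration count in SQLite's `PRAGMA user_version` and that M13 is the 13th migration. The `to_latest` function, the exception name, and the table names here are hypothetical stand-ins written in Python for illustration; they are not the deployer's code or the `rusqlite_migration` API.

```python
import sqlite3


class DatabaseTooFarAhead(Exception):
    """Raised when the database is at a newer schema version than this build knows."""


def to_latest(conn, migrations):
    """Apply pending migrations, tracking progress in PRAGMA user_version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current > len(migrations):
        # The stale-binary case: the DB was already migrated further than
        # the migration list compiled into this build.
        raise DatabaseTooFarAhead(
            f"database is at migration {current}, binary only knows {len(migrations)}"
        )
    for statement in migrations[current:]:
        conn.execute(statement)
    conn.execute(f"PRAGMA user_version = {len(migrations)}")
    return len(migrations)


# Hypothetical schema history: 13 migrations in current source, 12 in the stale asset.
CURRENT_SOURCE = [f"CREATE TABLE t{i} (id INTEGER PRIMARY KEY)" for i in range(1, 14)]
STALE_ASSET = CURRENT_SOURCE[:12]

conn = sqlite3.connect(":memory:")
to_latest(conn, CURRENT_SOURCE)  # the live DB reaches M13
to_latest(conn, CURRENT_SOURCE)  # a source build reopens it without error

try:
    to_latest(conn, STALE_ASSET)  # pre-M13 build against the M13 DB
    outcome = "started"
except DatabaseTooFarAhead as err:
    outcome = f"refused to start: {err}"
print(outcome)
```

In this model the same stale build still works against a fresh database, which is consistent with the problem only surfacing on a VM whose database had already been migrated.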

How it was found: refreshing the admin VM deployer by downloading latest-integration produced a binary that would not start; an isolated run surfaced the migration error above. The admin VM was recovered by installing a binary built from the current source.

Impact: a dashboard-driven self-upgrade of the deployer (download latest-integration, restart) lands this stale binary and the deployer fails to come up; a fresh install also misses every change since M13. The published cockpit asset is similarly behind the source (older Admin page), though it still boots because it has no incompatible migration. So recent green CI publish runs do not appear to be updating the release binary assets. Related install-side symptom: lhumina_code/hero_skills#303

Signed-by: mik-tf mik-tf@noreply.invalid

mik-tf referenced this issue

2026-06-14 13:54:06 +00:00

[deployer] latest-integration release binary is stale (pre-M13) — self-upgrade panics on the live DB #28

mik-tf referenced this issue from a commit

2026-06-14 14:03:40 +00:00

ci(release): clear stale release assets before lab upload

mik-tf commented

2026-06-14 14:04:04 +00:00

Author

Owner

Root cause confirmed: the release record refreshes on every push (latest published 2026-06-14), but the binary assets were frozen at 2026-06-12. lab build --upload skips assets that already exist by name, so the rolling release never got new binaries after the first upload. That stale binary predates the M13 migration, hence the panic on the live DB.

Fix pushed to integration (39513ae): the release workflow now deletes the tag's existing assets via the Forge API before lab build --upload, so each push republishes fresh binaries. This repo's workflow is patched first to verify; the same change should land in the canonical workflow template so every repo benefits. Next step: verify that the next CI run re-uploads assets dated today.
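The difference between the two publish behaviours can be sketched with an in-memory model. This is only an illustration under stated assumptions: the release is modelled as a dict keyed by asset name, the asset name `deployer` and both function names are hypothetical, and nothing here calls `lab` or the Forge API.

```python
def upload_skip_existing(release, built, today):
    """Model of the old behaviour: an asset whose name already exists is left alone."""
    for name, payload in built.items():
        if name not in release:
            release[name] = {"payload": payload, "uploaded": today}
    return release


def upload_after_clearing(release, built, today):
    """Model of the fix: delete the tag's existing assets, then upload everything."""
    release.clear()
    return upload_skip_existing(release, built, today)


first_build = {"deployer": "pre-M13 build"}  # hypothetical asset name
later_build = {"deployer": "M13 build"}

# Old flow: the second push finds the name already present and skips it.
old_flow = upload_skip_existing({}, first_build, "2026-06-12")
upload_skip_existing(old_flow, later_build, "2026-06-14")

# Patched flow: the second push clears the release first, so the asset is replaced.
new_flow = upload_skip_existing({}, first_build, "2026-06-12")
upload_after_clearing(new_flow, later_build, "2026-06-14")

print(old_flow["deployer"])
print(new_flow["deployer"])
```

Clearing first trades a short window with no assets for guaranteed freshness, which is why a check that the release is not left empty is worth running after the change.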

Signed-by: mik-tf mik-tf@noreply.invalid

mik-tf referenced this issue from a commit

2026-06-14 14:49:55 +00:00

ci(release): clear stale release assets before lab upload

mik-tf referenced this issue from lhumina_code/hero_skills

2026-06-14 14:50:27 +00:00

[lab] release publish skips re-uploading changed binaries, freezing rolling latest-* assets #323

mik-tf commented

2026-06-14 14:52:16 +00:00

Author

Owner

Fixed and verified.

Cause: the release publish step did not replace assets that already existed by name, so the rolling latest-integration binaries stayed frozen at their first upload (2026-06-12, pre-M13) while the release record kept refreshing. A self-upgrade downloaded that stale pre-M13 binary, which panicked on opening the migrated DB.

Fix (this repo, commit 39513ae; same change on hero_cockpit commit 8988dc1): the lab-release workflow deletes the tag's existing assets via the Forge API before lab build --upload, forcing a fresh upload on every push.

Verified: the next push republished all deployer binaries dated today (2026-06-14), the release kept all 12 assets (not emptied), and the freshly-published binary was downloaded and confirmed to boot on the live M13 database and serve RPC. So a dashboard-driven deployer self-upgrade now lands a current, compatible binary.
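A freshness check like the one described could be automated along these lines. This is a sketch only: it assumes each asset record carries a `name` and an ISO 8601 `created_at` field, and the asset names and times of day below are made up for illustration (only the dates come from this issue).

```python
from datetime import date, datetime


def stale_assets(assets, published_on):
    """Return the names of assets created before the day the release was published."""
    return sorted(
        asset["name"]
        for asset in assets
        if datetime.fromisoformat(asset["created_at"]).date() < published_on
    )


# Hypothetical asset records before and after the workflow change.
before_fix = [
    {"name": "asset-a", "created_at": "2026-06-12T09:00:00+00:00"},
    {"name": "asset-b", "created_at": "2026-06-12T09:00:05+00:00"},
]
after_fix = [
    {"name": "asset-a", "created_at": "2026-06-14T14:40:00+00:00"},
    {"name": "asset-b", "created_at": "2026-06-14T14:40:05+00:00"},
]

published = date(2026, 6, 14)
print(stale_assets(before_fix, published))
print(stale_assets(after_fix, published))
print(len(after_fix) == len(before_fix))  # the release was not emptied
```

Comparing the asset count alongside the dates covers both halves of the verification: the binaries are fresh, and none went missing during the delete-then-upload step.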

Permanent fix tracked at lhumina_code/hero_skills#323 (make the CI lab-builder image always replace assets, so the per-repo workaround can be dropped).

Signed-by: mik-tf mik-tf@noreply.invalid

mik-tf closed this issue

2026-06-14 14:52:26 +00:00

Columns