[bug] init_schema: idx_jobs_archived created before its ALTER TABLE migration — breaks DB upgrade from pre-58d82b7 #91
Reference
lhumina_code/hero_proc#91
Summary
On any hero_proc DB created before commit 58d82b7 (which added jobs.archived), the new binary aborts on startup with a CREATE INDEX error. Hit live during a herodemo deploy on 2026-05-01.
Cause
crates/hero_proc_lib/src/db/jobs/model.rs:457-518 has the canonical CREATE TABLE batch (lines 458-489) ending with five CREATE INDEX statements, including the one on line 489 that creates idx_jobs_archived on jobs(archived). That runs in the same execute_batch as the CREATE TABLE. On a pre-existing DB without the archived column:
1. CREATE TABLE IF NOT EXISTS jobs (...) is a no-op (table already exists)
2. CREATE INDEX ... ON jobs(archived) fails (old table has no archived column)
3. The ? operator returns the error
Lines 516-517 already do the right thing AFTER the migration, creating the index once the column exists.
So the index is created in two places — but the early one (line 489) breaks the upgrade before the migration can run.
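The failure sequence above can be reproduced outside the codebase. The following is a minimal sketch using Python's built-in sqlite3 module rather than the project's Rust code. The two-column jobs table, the archived column type, and the function name are assumptions for illustration; only the index name and the statement order come from this issue.

```python
import sqlite3

# Hypothetical pre-58d82b7 schema: a jobs table without the archived column.
OLD_SCHEMA = "CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT);"

# Same shape as the canonical batch: CREATE TABLE followed by the index,
# executed as one script (analogous to a single execute_batch call).
CANONICAL_BATCH = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    name TEXT,
    archived INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived);
"""


def reproduce_upgrade_failure():
    """Run the canonical batch against an old DB and return the error text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(OLD_SCHEMA)
    try:
        # CREATE TABLE IF NOT EXISTS is a no-op here, so the index statement
        # runs against the old table, which lacks the archived column.
        conn.executescript(CANONICAL_BATCH)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(reproduce_upgrade_failure())
```

On a fresh database the same batch succeeds, which is why the problem only shows up on upgrade.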
Fix
Delete line 489. The index gets created at line 517 after the column exists. Fresh DBs are unaffected because the canonical CREATE TABLE on line 483 already includes archived.
runs/model.rs:133-173 uses the same two-stage pattern correctly: CREATE TABLE → ALTER TABLE migrations → CREATE INDEX in a separate batch. That's the reference shape.
Convention worth adopting
When adding a column and its index in this codebase, both belong in the migration block at the bottom of init_schema, never in the canonical CREATE TABLE batch. The first batch is for fresh-DB schema; ALTER TABLE blocks are for upgrading old DBs. Indexes on new columns must run after their migration, not alongside the canonical CREATE.
A short comment at the top of init_schema could state the rule explicitly.
Impact
Took down the herodemo deploy on 2026-05-01. The only recovery was service_proc start --clear, which wipes all service registrations and forces a full re-installation cascade (~30-60 min). With this one-line fix, the same upgrade would take ~30 seconds with zero state loss, which is the proper deploy path.
Related: hero_proc#87 (the deployment-time deploy lag is what surfaced this).
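The ordering proposed under Fix (CREATE TABLE, then ALTER TABLE migrations, then CREATE INDEX) can be sketched as follows. This uses Python's sqlite3 rather than the project's Rust code. The trimmed schema, the column_exists helper, and the PRAGMA-based column check are assumptions for illustration; the issue does not show how the real migration block detects a missing column.

```python
import sqlite3

# Stage 1: fresh-DB schema only. No index on the newly added column here.
CANONICAL_BATCH = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    name TEXT,
    archived INTEGER NOT NULL DEFAULT 0
);
"""


def column_exists(conn, table, column):
    """Hypothetical helper: check PRAGMA table_info for a column name."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def init_schema(conn):
    # Stage 1: no-op on an existing DB, full schema on a fresh one.
    conn.executescript(CANONICAL_BATCH)

    # Stage 2: migration for DBs created before the column existed.
    if not column_exists(conn, "jobs", "archived"):
        conn.execute(
            "ALTER TABLE jobs ADD COLUMN archived INTEGER NOT NULL DEFAULT 0"
        )

    # Stage 3: the index runs only after the column is guaranteed to exist.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived)"
    )
    conn.commit()
```

With this ordering, calling init_schema on an old database adds the column and index without touching existing rows, and calling it again is a no-op.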
Closing as "by design" — design intent is wipe-not-migrate
Per CEO: hero_proc's DB stores operational state (services, jobs, runs, secrets, actions), all reconstructable from authoritative sources elsewhere: TOML service manifests, secrets.toml, on-disk action definitions. The intended path on schema change is service_proc start --clear (wipe + rebuild), not runtime migration.
Rationale: a DB you can rebuild in 30 seconds from authoritative sources doesn't benefit from migration code. Migration paths add bug surface (rollback edge cases, partial-state recovery, schema drift) for zero meaningful gain. Wiping is simpler, faster, and bulletproof.
The deploy-time CREATE INDEX error this issue called out is the intended trigger for the operator to run --clear. Adding migration logic that lets the binary boot against an old DB contradicts the design.
The squash-merge f831243 has been reverted by 90d995b on development. Closing this issue.
If the design preference here changes in future (e.g. hero_proc starts holding state that's NOT cheap to rebuild), this issue can be re-opened with that new context.