[bug] init_schema: idx_jobs_archived created before its ALTER TABLE migration — breaks DB upgrade from pre-58d82b7 #91

Closed
opened 2026-05-01 15:10:31 +00:00 by mik-tf · 1 comment
Owner

Summary

On any hero_proc DB created before commit 58d82b7 (which added jobs.archived), the new binary aborts on startup with:

Error: Database error: no such column: archived in
        CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived); at offset 62

Hit live during a herodemo deploy on 2026-05-01.

Cause

crates/hero_proc_lib/src/db/jobs/model.rs:457-518 has the canonical CREATE TABLE batch (lines 458-489) ending with five CREATE INDEX statements, including line 489:

CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived);

That runs in the same execute_batch as the CREATE TABLE. On a pre-existing DB without the archived column:

  • CREATE TABLE IF NOT EXISTS jobs (...) is a no-op (table already exists)
  • CREATE INDEX ... ON jobs(archived) fails (old table has no archived column)
  • The ? operator returns the error
  • The migration block at lines 491-518 never executes

Lines 516-517 already do the right thing AFTER the migration:

let _ = conn.execute_batch("ALTER TABLE jobs ADD COLUMN archived INTEGER NOT NULL DEFAULT 0;");
let _ = conn.execute_batch("CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived);");

So the index is created in two places — but the early one (line 489) breaks the upgrade before the migration can run.

Fix

Delete line 489. The index gets created at line 517 after the column exists. Fresh DBs are unaffected because the canonical CREATE TABLE on line 483 already includes archived.

runs/model.rs:133-173 uses the same two-stage pattern correctly: CREATE TABLE → ALTER TABLE migrations → CREATE INDEX in a separate batch. That's the reference shape.

Convention worth adopting

When adding a column + its index in this codebase, both belong in the migration block at the bottom of init_schema, never in the canonical CREATE TABLE batch. The first batch is for fresh-DB schema; ALTER TABLE blocks are for upgrading old DBs. Indexes on new columns must run after their migration, not alongside the canonical CREATE.

A short comment at the top of init_schema could state the rule explicitly.

Impact

Took down the herodemo deploy on 2026-05-01. Only recovery was service_proc start --clear, which wipes all service registrations and forces a full re-installation cascade (~30-60 min). With this one-line fix, the same upgrade would be ~30 seconds with zero state loss — the proper deploy path.

Related: hero_proc#87 (the deployment-time deploy lag is what surfaced this).

## Summary On any hero_proc DB created before commit `58d82b7` (which added `jobs.archived`), the new binary aborts on startup with: ``` Error: Database error: no such column: archived in CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived); at offset 62 ``` Hit live during a herodemo deploy on 2026-05-01. ## Cause [`crates/hero_proc_lib/src/db/jobs/model.rs:457-518`](https://forge.ourworld.tf/lhumina_code/hero_proc/src/branch/development/crates/hero_proc_lib/src/db/jobs/model.rs#L457-L518) has the canonical CREATE TABLE batch (lines 458-489) ending with five `CREATE INDEX` statements, including line 489: ```sql CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived); ``` That runs in the same `execute_batch` as the CREATE TABLE. On a pre-existing DB without the `archived` column: - `CREATE TABLE IF NOT EXISTS jobs (...)` is a no-op (table already exists) - `CREATE INDEX ... ON jobs(archived)` fails (old table has no `archived` column) - The `?` operator returns the error - The migration block at lines 491-518 never executes Lines 516-517 already do the right thing AFTER the migration: ```rust let _ = conn.execute_batch("ALTER TABLE jobs ADD COLUMN archived INTEGER NOT NULL DEFAULT 0;"); let _ = conn.execute_batch("CREATE INDEX IF NOT EXISTS idx_jobs_archived ON jobs(archived);"); ``` So the index is created in two places — but the early one (line 489) breaks the upgrade before the migration can run. ## Fix Delete line 489. The index gets created at line 517 after the column exists. Fresh DBs are unaffected because the canonical CREATE TABLE on line 483 already includes `archived`. [`runs/model.rs:133-173`](https://forge.ourworld.tf/lhumina_code/hero_proc/src/branch/development/crates/hero_proc_lib/src/db/runs/model.rs#L133-L173) uses the same two-stage pattern correctly: CREATE TABLE → ALTER TABLE migrations → CREATE INDEX in a separate batch. That's the reference shape. ## Convention worth adopting When adding a column + its index in this codebase, both belong in the migration block at the bottom of `init_schema`, never in the canonical CREATE TABLE batch. The first batch is for fresh-DB schema; ALTER TABLE blocks are for upgrading old DBs. Indexes on new columns must run after their migration, not alongside the canonical CREATE. A short comment at the top of `init_schema` could state the rule explicitly. ## Impact Took down the herodemo deploy on 2026-05-01. Only recovery was `service_proc start --clear`, which wipes all service registrations and forces a full re-installation cascade (~30-60 min). With this one-line fix, the same upgrade would be ~30 seconds with zero state loss — the proper deploy path. Related: [hero_proc#87](https://forge.ourworld.tf/lhumina_code/hero_proc/issues/87) (the deployment-time deploy lag is what surfaced this).
Author
Owner

Closing as "by design" — design intent is wipe-not-migrate

Per CEO: hero_proc's DB stores operational state (services, jobs, runs, secrets, actions), all reconstructable from authoritative sources elsewhere — TOML service manifests, secrets.toml, on-disk action definitions. The intended path on schema change is service_proc start --clear (wipe + rebuild), not runtime migration.

Rationale: a DB you can rebuild in 30 seconds from authoritative sources doesn't benefit from migration code. Migration paths add bug surface (rollback edge cases, partial-state recovery, schema drift) for zero meaningful gain. Wiping is simpler, faster, and bulletproof.

The deploy-time CREATE INDEX error this issue called out is the intended trigger for the operator to run --clear. Adding migration logic that lets the binary boot against an old DB contradicts the design.

The squash-merge f831243 has been reverted by 90d995b on development. Closing this issue.

If the design preference here changes in future (e.g. hero_proc starts holding state that's NOT cheap to rebuild), this issue can be re-opened with that new context.

## Closing as "by design" — design intent is wipe-not-migrate Per CEO: hero_proc's DB stores **operational state** (services, jobs, runs, secrets, actions), all reconstructable from authoritative sources elsewhere — TOML service manifests, secrets.toml, on-disk action definitions. The intended path on schema change is `service_proc start --clear` (wipe + rebuild), not runtime migration. Rationale: a DB you can rebuild in 30 seconds from authoritative sources doesn't benefit from migration code. Migration paths add bug surface (rollback edge cases, partial-state recovery, schema drift) for zero meaningful gain. Wiping is simpler, faster, and bulletproof. The deploy-time CREATE INDEX error this issue called out is the **intended trigger** for the operator to run `--clear`. Adding migration logic that lets the binary boot against an old DB contradicts the design. The squash-merge `f831243` has been reverted by `90d995b` on `development`. Closing this issue. If the design preference here changes in future (e.g. hero_proc starts holding state that's NOT cheap to rebuild), this issue can be re-opened with that new context.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_proc#91
No description provided.