Integration tests leak scheduled actions into persistent hero_proc.db #126

Open
opened 2026-05-24 16:32:54 +00:00 by mik-tf · 0 comments
Owner

Integration tests register scheduled actions (cron and interval) into the persistent `actions` table at /home/pctwo/hero/var/hero_proc.db. The tests never clean up on exit, so the actions survive in the database. On the next `hero_proc_server` restart the supervisor loads them from disk and immediately begins firing them at full cadence.

Today a single workstation had 45 such leaked actions (names like `sched-interval-short-8398`, `sched-rapid-8398`, `sched-multi-context`, `sched19-66171`, plus 14 `sched-window-*-8398` variants) that together created 2,429 jobs in 1h17m of supervisor uptime, 9,276 invalid-cron WARN entries, sustained 83% CPU, and exhausted the RPC budget within roughly 45 minutes of every restart.

Recovery is `DELETE FROM actions WHERE name LIKE 'sched-%' OR name LIKE 'sched%-66171'` plus `VACUUM`, which reduces the database from 352 MB to 909 KB.

Two structural fix options to consider: (a) the integration test harness uses an ephemeral test database under a temp directory and never touches the operator database, or (b) every test-registered action carries a `test=true` tag and the supervisor purges all `test=true` actions on startup before resuming the scheduler.
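To make the two fix options concrete, here is a minimal Python sketch of each, not the project's actual harness or supervisor code. Everything beyond the `actions` table and its `name` column is an assumption: the helper names `ephemeral_db` and `purge_test_actions` are hypothetical, and the integer column `is_test` is only a stand-in for however the `test=true` tag would really be stored.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def ephemeral_db(filename: str = "hero_proc.db"):
    """Option (a): yield a database path inside a throwaway temp directory.

    Hypothetical helper; how the test harness points the server at this
    path is not covered here.
    """
    with tempfile.TemporaryDirectory(prefix="hero_proc_test_") as tmp:
        yield os.path.join(tmp, filename)
    # Leaving the block removes the directory and the database inside it.


def purge_test_actions(conn: sqlite3.Connection) -> int:
    """Option (b): delete test-tagged actions before the scheduler resumes.

    Assumes a hypothetical integer column `is_test` that stands in for the
    `test=true` tag. Returns the number of rows deleted.
    """
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute("DELETE FROM actions WHERE is_test = 1")
    return cur.rowcount
```

Option (a) keeps test data out of the operator database entirely, so a crashed test run cannot leak anything. Option (b) still writes to the operator database and relies on every test setting the tag, but it also cleans up after runs that were killed before any teardown could execute.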

Signed-by: mik-tf <mik-tf@noreply.invalid>
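For the recovery step quoted in the report, a small script along these lines may be handy. It is a sketch under assumptions: the function name `purge_leaked_actions` is hypothetical, it relies only on the `actions` table and `name` column named above, and it presumes `hero_proc_server` is stopped while it runs.

```python
import sqlite3


def purge_leaked_actions(db_path: str) -> int:
    """Delete leaked test actions, then compact the database file.

    Returns the number of rows deleted.
    """
    # isolation_level=None puts the connection in autocommit mode. This
    # matters because SQLite refuses to run VACUUM inside a transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        cur = conn.execute(
            "DELETE FROM actions "
            "WHERE name LIKE 'sched-%' OR name LIKE 'sched%-66171'"
        )
        deleted = cur.rowcount
        # Rebuild the file so the space freed by the DELETE is returned.
        conn.execute("VACUUM")
        return deleted
    finally:
        conn.close()
```

The `sched-%` pattern matches every action whose name starts with `sched-`, so it is worth running the same `WHERE` clause as a `SELECT` first to confirm that no legitimate operator action would be removed.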

Reference
lhumina_code/hero_proc#126