Add JobsApi/RunsApi/ActionsApi::delete_many for batch cleanup paths (N round-trips today) #129

Closed
opened 2026-05-25 03:42:50 +00:00 by sameh-farouk · 1 comment
Member

What

handle_clean_by_tag (and similar bulk-cleanup paths in rpc/action.rs / rpc/run.rs) call db.jobs.delete() / db.runs.delete() in tight per-id loops. Each call is one pool.get().await + interact round-trip. On large cleanups (hundreds of jobs) this is N pool acquisitions, N interact spawns, and N small SQLite transactions — wasteful in both latency and pool pressure.

Pre-#124 vs post-#124

Pre-migration: each delete also took the mutex, but the mutex-acquisition cost was nanoseconds and there was no async yield between iterations. Post-migration: each .await is a yield point; the loop can be interleaved with any other handler at every iteration.

Suggested fix

Add a batch primitive:

  • jobs_model::delete_jobs_by_ids(conn, &[u32]) -> Result<u64> — single DELETE FROM jobs WHERE id IN (...) (or chunked if the id set is large enough to hit SQLite's expression-tree limit, typically 1000 args).
  • JobsApi::delete_many(ids: Vec<u32>) -> Result<u64, FactoryError> — single pool acquisition + one interact running the bulk delete.
  • Same shape for RunsApi::delete_many and ActionsApi::delete_many if they have similar per-id loops.

Update handle_clean_by_tag and any other discovered loop site.

Scope

~50 LOC implementation + ~30 LOC tests + 1-3 call-site swaps.

References

  • Found during external architect review of #124 (re-review v4); reviewer marked as non-blocking follow-up.
## What `handle_clean_by_tag` (and similar bulk-cleanup paths in `rpc/action.rs` / `rpc/run.rs`) call `db.jobs.delete()` / `db.runs.delete()` in tight per-id loops. Each call is one `pool.get().await + interact` round-trip. On large cleanups (hundreds of jobs) this is N pool acquisitions, N interact spawns, and N small SQLite transactions — wasteful in both latency and pool pressure. ## Pre-#124 vs post-#124 Pre-migration: each delete also took the mutex, but the mutex-acquisition cost was nanoseconds and there was no async yield between iterations. Post-migration: each `.await` is a yield point; the loop can be interleaved with any other handler at every iteration. ## Suggested fix Add a batch primitive: - `jobs_model::delete_jobs_by_ids(conn, &[u32]) -> Result<u64>` — single `DELETE FROM jobs WHERE id IN (...)` (or chunked if the id set is large enough to hit SQLite's expression-tree limit, typically 1000 args). - `JobsApi::delete_many(ids: Vec<u32>) -> Result<u64, FactoryError>` — single pool acquisition + one interact running the bulk delete. - Same shape for `RunsApi::delete_many` and `ActionsApi::delete_many` if they have similar per-id loops. Update `handle_clean_by_tag` and any other discovered loop site. ## Scope ~50 LOC implementation + ~30 LOC tests + 1-3 call-site swaps. ## References - Found during external architect review of #124 (re-review v4); reviewer marked as non-blocking follow-up.
Author
Member

Closed by ef6d564 on origin/development.

Summary in commit body: adopted db.transaction() in run.submit / run.create_with_jobs / service_define_inner (#128); added JobsApi/RunsApi/ActionsApi::delete_many and swapped 2 known loop sites (#129); also restored spawn_blocking around git CLI in do_pull_secrets/do_push_secrets (worker-stall regression from #124).

Verification: lib regression 242/0; Stage 6 release daemon clean (run.submit + wipe_all + 50× parallel status_all + 20-job churn all green); integration baseline preserved (62 PASS / 7 FAIL, all pre-existing env-only).

External architect review: APPROVE, 5 non-blocking weaknesses (2 fixed in b2fd17c cleanup before squash: dead actions_model alias, now_ms clock-skew window).

Closed by `ef6d564` on `origin/development`. Summary in commit body: adopted `db.transaction()` in `run.submit` / `run.create_with_jobs` / `service_define_inner` (#128); added `JobsApi/RunsApi/ActionsApi::delete_many` and swapped 2 known loop sites (#129); also restored `spawn_blocking` around git CLI in `do_pull_secrets`/`do_push_secrets` (worker-stall regression from #124). Verification: lib regression 242/0; Stage 6 release daemon clean (run.submit + wipe_all + 50× parallel status_all + 20-job churn all green); integration baseline preserved (62 PASS / 7 FAIL, all pre-existing env-only). External architect review: APPROVE, 5 non-blocking weaknesses (2 fixed in `b2fd17c` cleanup before squash: dead actions_model alias, now_ms clock-skew window).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_proc#129
No description provided.