Job dependency enforcement: block child jobs when parent fails #64

Open
opened 2026-04-29 04:22:37 +00:00 by despiegk · 0 comments
Owner

Context

A job whose dependency has reached failed should not run — it should be marked failed or cancelled, not wait forever. Today the supervisor's job dependency code does not enforce this: a child whose depends_on parent failed will still execute (or hang in waiting).

A test already exists but is #[ignore]d:

  • tests/integration/tests/jobs.rs:297, test_job_failed_dependency_blocks, marked #[ignore = "job dependency enforcement (blocking on failed deps) not yet implemented"]

Repro (from the test)

  1. Create parent job: #!/bin/bash\nexit 1.
  2. Create child job with depends_on: [{action: parent_id, dep_type: requires}].
  3. Wait for parent to reach failed.
  4. Today: the child's eventual phase is undefined, or the child runs anyway.
  5. Expected: child phase is waiting → cancelled or waiting → failed, never succeeded, and the child's script does NOT execute.

Acceptance

  • A child job with a requires dependency on a parent that reaches failed is itself transitioned to failed (or cancelled) without running its script.
  • Reason field set to something like dependency <id> failed.
  • Existing test test_job_failed_dependency_blocks is un-ignored and passes.
  • No regression on test_job_graph (the success-path equivalent) or other jobs tests.
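The first two acceptance points could be sketched roughly as follows. This is only an illustration under stated assumptions: `Phase`, `DepType`, `Dependency`, `Job`, and `fail_children_of` are hypothetical names that borrow the issue's vocabulary (`depends_on`, `action`, `dep_type`, `requires`), not the actual hero_proc types, and the sketch picks `failed` rather than `cancelled` as the child's terminal phase.

```rust
use std::collections::HashMap;

// Hypothetical model of the job table; the real supervisor types may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Phase {
    Waiting,
    Running,
    Succeeded,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DepType {
    Requires,
}

#[derive(Debug, Clone)]
struct Dependency {
    action: u64, // id of the parent job
    dep_type: DepType,
}

#[derive(Debug, Clone)]
struct Job {
    id: u64,
    phase: Phase,
    depends_on: Vec<Dependency>,
    reason: Option<String>,
}

/// Intended to be called when `parent_id` reaches a terminal phase.
/// If the parent failed, every job still in `Waiting` that has a
/// `requires` dependency on it is moved to `Failed` without running,
/// and its reason is set. Returns the ids of the children that were
/// failed, in ascending order.
fn fail_children_of(jobs: &mut HashMap<u64, Job>, parent_id: u64) -> Vec<u64> {
    // Only a failed parent blocks its children.
    let parent_failed = matches!(jobs.get(&parent_id), Some(p) if p.phase == Phase::Failed);
    if !parent_failed {
        return Vec::new();
    }

    // Collect ids first so the map can be mutated afterwards.
    let mut children: Vec<u64> = jobs
        .values()
        .filter(|j| j.phase == Phase::Waiting)
        .filter(|j| {
            j.depends_on
                .iter()
                .any(|d| d.action == parent_id && d.dep_type == DepType::Requires)
        })
        .map(|j| j.id)
        .collect();
    children.sort_unstable();

    for child_id in &children {
        if let Some(child) = jobs.get_mut(child_id) {
            child.phase = Phase::Failed;
            child.reason = Some(format!("dependency {} failed", parent_id));
        }
    }
    children
}
```

In this sketch a child never passes through `Running`, so its script cannot execute. Whether the failure should also propagate to jobs that depend on the child is left open here, as the issue does not specify it.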

Notes / scope

  • This is a correctness bug, not a feature. Today's behavior silently violates the user's stated dependency intent.
  • The fix probably lives in crates/hero_proc_server/src/supervisor/, wherever the dep-resolution / scheduling decision is made when a parent transitions to a terminal phase.
  • dep_type is requires for now; if/when other dep types exist (e.g. after) decide their semantics separately.

Follow-up to #56.

Reference
lhumina_code/hero_proc#64