now get required jobs to autostart when we restart hero_proc #23

Closed
opened 2026-03-21 16:30:52 +00:00 by despiegk · 4 comments
Owner

see also #22

when we restart hero_proc it needs to check which of the jobs are coming from action which is is_process
these need to be autostarted, we ofc do health checks and all these things

also check in UI (can also test using browser mcp)
if we delete a job, will it do a stop first?

maybe we need to popup a modal where we show logs how we stop the still running jobs, only needed if its jobs which are running

see also https://forge.ourworld.tf/lhumina_code/hero_proc/issues/22 when we restart hero_proc it needs to check which of the jobs are coming from action which is is_process these need to be autostarted, we ofc do health checks and all these things also check in UI (can also test using browser mcp) if we delete a job, will it do a stop first? maybe we need to popup a modal where we show logs how we stop the still running jobs, only needed if its jobs which are running
Author
Owner

Implementation Spec for Issue #23 — Autostart is_process Jobs on Restart & Safe Job Deletion

Objective

When hero_proc_server restarts, it must automatically restart any jobs that came from services whose actions have is_process = true — these are long-running processes that should always be running. Additionally, when a user deletes a running job from the UI, it must be stopped first, and if the job is actively running a process, a modal showing stop progress/logs should appear.

Requirements

  • On hero_proc_server startup, after recover_running_jobs(), query all jobs that are in a terminal phase (Failed, Cancelled) and have is_process = true and have a non-null service_id and action_id, and the referenced service still exists in the DB → auto-create a new Pending job for each such action
  • Autostart must only create one job per service+action pair (no duplicates if a Pending/Running job already exists)
  • Autostart must log which jobs it created
  • When a user deletes a running job in the UI, show a confirmation modal warning the process will be stopped first
  • For is_process running jobs, show an extra warning that this is a long-running service process
  • After confirmation, call job.cancel first, then job.delete after cancel succeeds
  • Bulk delete of jobs should apply the same stop-first logic for running jobs

Files to Modify

File Purpose
crates/hero_proc_lib/src/db/jobs/model.rs Add list_is_process_terminal_jobs() raw SQL function
crates/hero_proc_lib/src/db/factory.rs Add list_process_jobs_needing_restart() on JobsApi
crates/hero_proc_server/src/supervisor/mod.rs Add autostart_process_jobs() called after recover_running_jobs() in run()
crates/hero_proc_ui/static/js/dashboard.js Update deleteJob(), bulkDeleteJobs(), deleteJobFromModal() to stop-then-delete

Implementation Plan

Step 1 — DB query for is_process terminal jobs

File: crates/hero_proc_lib/src/db/jobs/model.rs and factory.rs

  • Add list_is_process_terminal_jobs() returning jobs where is_process=1 AND phase IN ('failed','cancelled') AND service/action IDs are set
  • Add list_process_jobs_needing_restart() on JobsApi following the running_pids() pattern
    Dependencies: none

Step 2 — Implement autostart_process_jobs() in Supervisor

File: crates/hero_proc_server/src/supervisor/mod.rs

  • Add async fn autostart_process_jobs(&self) that queries terminal is_process jobs, checks for existing non-terminal jobs for same service+action, verifies service/action still exist, creates new Pending job and logs it
  • Wire into run() after recover_running_jobs().await
    Dependencies: Step 1

Step 3 — UI stop-then-delete for running jobs

File: crates/hero_proc_ui/static/js/dashboard.js

  • Refactor deleteJob(id) to fetch job, detect running phase, show appropriate warning modal, call job.cancel then job.delete
  • Add stopAndDeleteJob(id, job) helper
  • Update bulkDeleteJobs() to count and warn about running jobs, stop them first
  • Update deleteJobFromModal() to use same flow
    Dependencies: none (independent of Steps 1-2)

Acceptance Criteria

  • On restart, all is_process jobs in failed/cancelled phase are auto-queued as Pending (services/actions still must exist)
  • No duplicate Pending jobs created
  • Server logs an info message per autostarted job
  • UI: deleting a running job shows a warning modal with stop-first confirmation
  • UI: is_process running jobs get extra warning message
  • UI: bulk delete warns about and stops running jobs first
  • All existing tests pass

Notes

  • Duplicate prevention is critical — check for existing non-terminal jobs before creating new Pending jobs
  • For job creation pattern, follow handle_start in rpc/service.rs
  • force: true in job.delete remains for safety even after cancel
  • Health check runtime evaluation is out of scope for this issue
# Implementation Spec for Issue #23 — Autostart `is_process` Jobs on Restart & Safe Job Deletion ## Objective When `hero_proc_server` restarts, it must automatically restart any jobs that came from services whose actions have `is_process = true` — these are long-running processes that should always be running. Additionally, when a user deletes a running job from the UI, it must be stopped first, and if the job is actively running a process, a modal showing stop progress/logs should appear. ## Requirements - On `hero_proc_server` startup, after `recover_running_jobs()`, query all jobs that are in a terminal phase (`Failed`, `Cancelled`) and have `is_process = true` and have a non-null `service_id` and `action_id`, and the referenced service still exists in the DB → auto-create a new `Pending` job for each such action - Autostart must only create one job per service+action pair (no duplicates if a `Pending`/`Running` job already exists) - Autostart must log which jobs it created - When a user deletes a running job in the UI, show a confirmation modal warning the process will be stopped first - For `is_process` running jobs, show an extra warning that this is a long-running service process - After confirmation, call `job.cancel` first, then `job.delete` after cancel succeeds - Bulk delete of jobs should apply the same stop-first logic for running jobs ## Files to Modify | File | Purpose | |------|---------| | `crates/hero_proc_lib/src/db/jobs/model.rs` | Add `list_is_process_terminal_jobs()` raw SQL function | | `crates/hero_proc_lib/src/db/factory.rs` | Add `list_process_jobs_needing_restart()` on `JobsApi` | | `crates/hero_proc_server/src/supervisor/mod.rs` | Add `autostart_process_jobs()` called after `recover_running_jobs()` in `run()` | | `crates/hero_proc_ui/static/js/dashboard.js` | Update `deleteJob()`, `bulkDeleteJobs()`, `deleteJobFromModal()` to stop-then-delete | ## Implementation Plan ### Step 1 — DB query for is_process terminal jobs File: `crates/hero_proc_lib/src/db/jobs/model.rs` and `factory.rs` - Add `list_is_process_terminal_jobs()` returning jobs where `is_process=1` AND `phase IN ('failed','cancelled')` AND service/action IDs are set - Add `list_process_jobs_needing_restart()` on `JobsApi` following the `running_pids()` pattern Dependencies: none ### Step 2 — Implement `autostart_process_jobs()` in Supervisor File: `crates/hero_proc_server/src/supervisor/mod.rs` - Add `async fn autostart_process_jobs(&self)` that queries terminal is_process jobs, checks for existing non-terminal jobs for same service+action, verifies service/action still exist, creates new `Pending` job and logs it - Wire into `run()` after `recover_running_jobs().await` Dependencies: Step 1 ### Step 3 — UI stop-then-delete for running jobs File: `crates/hero_proc_ui/static/js/dashboard.js` - Refactor `deleteJob(id)` to fetch job, detect running phase, show appropriate warning modal, call `job.cancel` then `job.delete` - Add `stopAndDeleteJob(id, job)` helper - Update `bulkDeleteJobs()` to count and warn about running jobs, stop them first - Update `deleteJobFromModal()` to use same flow Dependencies: none (independent of Steps 1-2) ## Acceptance Criteria - [ ] On restart, all is_process jobs in failed/cancelled phase are auto-queued as Pending (services/actions still must exist) - [ ] No duplicate Pending jobs created - [ ] Server logs an info message per autostarted job - [ ] UI: deleting a running job shows a warning modal with stop-first confirmation - [ ] UI: is_process running jobs get extra warning message - [ ] UI: bulk delete warns about and stops running jobs first - [ ] All existing tests pass ## Notes - Duplicate prevention is critical — check for existing non-terminal jobs before creating new Pending jobs - For job creation pattern, follow `handle_start` in `rpc/service.rs` - `force: true` in `job.delete` remains for safety even after cancel - Health check runtime evaluation is out of scope for this issue
Author
Owner

Test Results

  • Status: FAIL
  • Passed: 0
  • Failed: 5 compile errors (build aborted before tests ran)

Failure Details

The workspace build failed in tests/integration/tests/hero_script.rs due to 5 compiler errors. No test binaries were produced and no tests ran.

File: tests/integration/tests/hero_script.rs

Error Location Description
E0560 line 51 hero_proc_sdk::JobLogsInput has no field attempt
E0609 line 53 No field lines on JobLogsOutput — available field is value
E0433 line 93 Use of undeclared type Interpreter
E0433 line 172 Use of undeclared type Interpreter
E0433 line 217 Use of undeclared type Interpreter

Root Cause

The integration test file tests/integration/tests/hero_script.rs references SDK types/fields that have since been renamed or restructured:

  • JobLogsInput.attempt no longer exists
  • JobLogsOutput.lines was renamed to .value
  • Interpreter type is not imported/declared in scope

The test file needs to be updated to match the current hero_proc_sdk API.

## Test Results - **Status:** FAIL - **Passed:** 0 - **Failed:** 5 compile errors (build aborted before tests ran) ### Failure Details The workspace build failed in `tests/integration/tests/hero_script.rs` due to 5 compiler errors. No test binaries were produced and no tests ran. **File:** `tests/integration/tests/hero_script.rs` | Error | Location | Description | |-------|----------|-------------| | E0560 | line 51 | `hero_proc_sdk::JobLogsInput` has no field `attempt` | | E0609 | line 53 | No field `lines` on `JobLogsOutput` — available field is `value` | | E0433 | line 93 | Use of undeclared type `Interpreter` | | E0433 | line 172 | Use of undeclared type `Interpreter` | | E0433 | line 217 | Use of undeclared type `Interpreter` | ### Root Cause The integration test file `tests/integration/tests/hero_script.rs` references SDK types/fields that have since been renamed or restructured: - `JobLogsInput.attempt` no longer exists - `JobLogsOutput.lines` was renamed to `.value` - `Interpreter` type is not imported/declared in scope The test file needs to be updated to match the current `hero_proc_sdk` API.
Author
Owner

Implementation Complete

Changes Made

crates/hero_proc_lib/src/db/jobs/model.rs

  • Added list_is_process_terminal_jobs() — raw SQL query returning all is_process=1 jobs in failed/cancelled phase with service_id and action_id set

crates/hero_proc_lib/src/db/factory.rs

  • Added JobsApi::list_process_jobs_needing_restart() — delegates to the new model function through the mutex-guarded connection (same pattern as running_pids())

crates/hero_proc_server/src/supervisor/mod.rs

  • Added autostart_process_jobs() async method on Supervisor:
    • Queries terminal is_process jobs
    • Duplicate check: skips if a non-terminal job already exists for the same service_id + action_id
    • Service/action existence check: skips if either was deleted
    • Creates a new Pending job copying context_name, action, description, is_process, service_id, action_id, spec, and script
    • Logs each autostarted job via tracing::info!
  • Wired into run() after recover_running_jobs().await

crates/hero_proc_ui/static/js/dashboard.js

  • Added stopAndDeleteJob(id, job) async helper — detects running phase, shows appropriate warning modal (extra warning for is_process jobs), calls job.cancel then waits 500ms, then job.delete
  • Refactored deleteJob(id) — fetches job, delegates to stopAndDeleteJob
  • Updated bulkDeleteJobs() — counts running jobs in selection, warns in confirm modal, stops them before deleting
  • Updated deleteJobFromModal(id) — uses cachedJobs to avoid redundant fetch, delegates to stopAndDeleteJob

Test Results

  • hero_proc_lib and hero_proc_server packages: all tests pass
  • tests/integration package: ⚠️ pre-existing compile errors in hero_script.rs (unrelated to this change — JobLogsInput/JobLogsOutput field names out of sync with SDK)

Acceptance Criteria

  • On restart, is_process jobs in failed/cancelled phase are auto-queued as Pending
  • No duplicate Pending jobs created
  • Server logs an info message per autostarted job
  • UI: deleting a running job shows a warning modal with stop-first confirmation
  • UI: is_process running jobs get extra warning message
  • UI: bulk delete warns about and stops running jobs first
  • hero_proc_lib and hero_proc_server tests pass
## Implementation Complete ### Changes Made **`crates/hero_proc_lib/src/db/jobs/model.rs`** - Added `list_is_process_terminal_jobs()` — raw SQL query returning all `is_process=1` jobs in `failed`/`cancelled` phase with `service_id` and `action_id` set **`crates/hero_proc_lib/src/db/factory.rs`** - Added `JobsApi::list_process_jobs_needing_restart()` — delegates to the new model function through the mutex-guarded connection (same pattern as `running_pids()`) **`crates/hero_proc_server/src/supervisor/mod.rs`** - Added `autostart_process_jobs()` async method on `Supervisor`: - Queries terminal `is_process` jobs - Duplicate check: skips if a non-terminal job already exists for the same `service_id` + `action_id` - Service/action existence check: skips if either was deleted - Creates a new `Pending` job copying `context_name`, `action`, `description`, `is_process`, `service_id`, `action_id`, `spec`, and `script` - Logs each autostarted job via `tracing::info!` - Wired into `run()` after `recover_running_jobs().await` **`crates/hero_proc_ui/static/js/dashboard.js`** - Added `stopAndDeleteJob(id, job)` async helper — detects running phase, shows appropriate warning modal (extra warning for `is_process` jobs), calls `job.cancel` then waits 500ms, then `job.delete` - Refactored `deleteJob(id)` — fetches job, delegates to `stopAndDeleteJob` - Updated `bulkDeleteJobs()` — counts running jobs in selection, warns in confirm modal, stops them before deleting - Updated `deleteJobFromModal(id)` — uses `cachedJobs` to avoid redundant fetch, delegates to `stopAndDeleteJob` ### Test Results - `hero_proc_lib` and `hero_proc_server` packages: ✅ all tests pass - `tests/integration` package: ⚠️ pre-existing compile errors in `hero_script.rs` (unrelated to this change — `JobLogsInput`/`JobLogsOutput` field names out of sync with SDK) ### Acceptance Criteria - [x] On restart, `is_process` jobs in `failed`/`cancelled` phase are auto-queued as `Pending` - [x] No duplicate `Pending` jobs created - [x] Server logs an `info` message per autostarted job - [x] UI: deleting a running job shows a warning modal with stop-first confirmation - [x] UI: `is_process` running jobs get extra warning message - [x] UI: bulk delete warns about and stops running jobs first - [x] `hero_proc_lib` and `hero_proc_server` tests pass
Author
Owner

Implementation committed: bf4235c

Browse: bf4235c

Implementation committed: `bf4235c` Browse: https://forge.ourworld.tf/lhumina_code/hero_proc/commit/bf4235c
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_proc#23
No description provided.