Services tab: Status column shows desired state, not actual runtime state #33

Closed
opened 2026-03-31 11:04:00 +00:00 by timur · 7 comments
Owner

Problem

The Services tab Status column displays ServiceWantedStatus (the configured desired state: start, stop, ignore, spec) rather than the actual runtime state of the service.

This means every service whose config says "I want to be running" shows start in the Status column, regardless of whether the service is actually running, has failed, or has never been started.

Current behavior

All services show start in the Status column because that is their ServiceWantedStatus — the intent, not the reality.

Expected behavior

The Status column should show the actual runtime state: running, success, failed, exited, or inactive. The service.status RPC already computes this by checking running jobs and last terminal state — it just isn't used in the list view.

Additional context

See comment below for a full terminology guide and implementation notes.

Author
Owner

hero_proc Terminology Guide

Before diving into the fix, here's how the core concepts relate to each other:

Concepts (bottom-up)

| Concept | What it is | Analogy |
|---------|-----------|---------|
| Action | A script + interpreter + config (timeout, env, deps, health checks). Stored as a spec/template. | A recipe |
| Job | A single execution of an Action. Has a phase lifecycle: pending → running → succeeded/failed. Tracks PID, exit code, logs, resource usage. | Cooking a dish from the recipe |
| Run | Groups multiple Jobs into one coordinated operation. Tracks overall status: starting → running → ok/error. Created when a service starts. | A meal (multiple dishes prepared together) |
| Service | A named container that references one or more Actions and declares a desired state (start/stop/ignore/spec). The supervisor ensures reality matches the desired state. | A restaurant order — "keep these dishes on the table" |

Flow

User creates Service (desired state = "start", actions = ["server_action"])
  → Supervisor sees desired=start, actual=inactive
  → Supervisor calls service.start
    → Creates a Run
    → Creates a Job per action
    → Supervisor executes each Job (spawns processes)
    → Job phase: pending → running → succeeded/failed
    → Run status: starting → running → ok/error
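The supervisor's reconcile step at the top of this flow can be sketched roughly as follows. This is an illustrative sketch only — `getActualState`, `serviceStart`, and `serviceStop` are hypothetical stand-ins, not actual hero_proc APIs:

```javascript
// Minimal reconcile sketch: compare desired vs. actual state and act.
// All function names here are illustrative stand-ins, not hero_proc APIs.
function reconcile(service, getActualState, serviceStart, serviceStop) {
  const actual = getActualState(service.name); // e.g. "running", "inactive", "failed"
  if (service.wanted === "start" && actual !== "running") {
    serviceStart(service.name); // creates a Run, then one Job per action
    return "started";
  }
  if (service.wanted === "stop" && actual === "running") {
    serviceStop(service.name);
    return "stopped";
  }
  return "noop"; // reality already matches intent (or wanted is "ignore"/"spec")
}
```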

Quick Services

A quick_service is a convenience API that atomically creates:

  • An Action named {service_name}_action
  • A Service named {service_name} referencing that action

This is why the Actions tab showed _action-suffixed entries (fixed in #32).


The Status Bug

Root cause

In dashboard.js renderServices() (line ~2008):

'<td>' + stateBadge(spec.status || 'start') + '</td>'

spec.status is ServiceWantedStatus — the configured desired state, not what the service is actually doing. Every service with wanted=start shows the same "start" badge.

What already exists

The service.status RPC already computes the actual runtime state:

let running = service_running_jobs(db, context, &name);
let state = if !running.is_empty() { "running" }
            else { service_last_terminal_state(db, context, &name) };
// Returns: "running", "success", "failed", "exited", or "inactive"

This is called when viewing service details, but not in the list view.

Proposed fix

Option A — Show actual status in the list (recommended)
After loading service specs, batch-fetch service.status for each service and display the actual runtime state in the Status column. Rename the current column to show both:

  • Status: actual state badge (running/failed/inactive)
  • Desired: wanted state in smaller text or tooltip

Option B — Add a separate column
Keep "Desired" column as-is, add a new "State" column showing actual runtime status.
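Under Option A, the Status cell render might look roughly like this. A sketch only, not the actual dashboard.js code — `actualStates` is a hypothetical name→state map, and `stateBadge` (which does exist in dashboard.js) is passed in here just to keep the sketch self-contained:

```javascript
// Sketch of the Status cell under Option A: actual runtime state as the
// badge, wanted state demoted to a tooltip. `actualStates` is a
// hypothetical { serviceName: state } map populated after the batch fetch.
function statusCell(spec, actualStates, stateBadge) {
  const actual = actualStates[spec.name] || "inactive"; // no data -> assume inactive
  const wanted = spec.status || "start";                // ServiceWantedStatus
  return '<td title="desired: ' + wanted + '">' + stateBadge(actual) + "</td>";
}
```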


Questions

  1. Option A or B? Should we replace the Status column with actual runtime state, or add a second column?
  2. Performance: Fetching service.status for each service requires N extra RPC calls (one per service). Should we add a bulk service.status_all endpoint instead?
  3. Is a Service a workflow? Currently a Service is closer to a systemd unit — it declares a desired state and the supervisor enforces it. It's not a workflow in the DAG/pipeline sense (that's closer to a Run with dependent Jobs). Should the mental model change?
Author
Owner
  • Well, if a service isn't a workflow, doesn't it define one? After all, isn't a run a played instance of a workflow, which the service defines by declaring steps that depend on and come after one another? Let's explore this in more depth and respond.

  • Also see #29, where we created Rhai scripts for service lifetimes (and also in the hero_proc examples). Let's make sure these Rhai scripts properly program against hero_proc by defining services and instantiating runs, not defining individual actions and instantiating jobs. This brings me to my next concern: the Runs tab currently displays a bunch of runs with only 1 job each — one per _server and one per _ui. However, if these were defined as one service with deps (one action depending on another), it wouldn't look this way, so perhaps the Rhai scripts are wrong.

  • Go with Option A for the fix.

Author
Owner

Does a Service define a workflow?

Yes — a Service is a workflow definition, and a Run is a played instance of it.

The data model already supports this:

Service "my-app"
  actions: ["build", "migrate", "server", "healthcheck"]
  dependencies: { requires: ["postgres"] }  ← inter-service deps

Action "build"       → one-shot, no deps
Action "migrate"     → one-shot, depends_on: ["build"] (requires)
Action "server"      → process, depends_on: ["migrate"] (requires)
Action "healthcheck" → one-shot, depends_on: ["server"] (after)

When service.start("my-app") is called:

  1. A Run is created (the workflow instance)
  2. A Job is created per action (4 jobs in one run)
  3. The supervisor respects depends_on edges → build runs first, then migrate, then server, then healthcheck
  4. All 4 jobs live in the same Run, status reflects aggregate progress
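Respecting the depends_on edges in step 3 amounts to a topological sort over the action graph. A minimal sketch of that ordering (illustrative shape, not the supervisor's real code):

```javascript
// Topological order of actions by their depends_on edges (Kahn's algorithm).
// Input: [{ name, dependsOn: [...] }, ...]. Throws on a dependency cycle.
function executionOrder(actions) {
  const remaining = new Map(actions.map(a => [a.name, new Set(a.dependsOn || [])]));
  const order = [];
  while (remaining.size > 0) {
    // Actions whose dependencies have all been scheduled are ready to run.
    const ready = [...remaining].filter(([, deps]) => deps.size === 0).map(([n]) => n);
    if (ready.length === 0) throw new Error("dependency cycle among actions");
    for (const name of ready) {
      order.push(name);
      remaining.delete(name);
      for (const deps of remaining.values()) deps.delete(name); // unblock dependents
    }
  }
  return order;
}
```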

So the hierarchy is:

  • Service = workflow definition (actions + deps + desired state)
  • Run = workflow execution instance
  • Job = single step execution within the workflow
  • Action = step template (reusable across services)

The Rhai script problem

The current example scripts use quick_service_set_full() which creates one action per service — this bypasses the workflow model entirely:

// Current pattern (from hero_runner install_and_run.rhai):
for svc in ["hero_runner_server", "hero_runner_ui"] {
    proc.quick_service_set_full(svc, HERO_BIN + "/" + svc, "exec", "");
    proc.quick_service_start(svc);
}
// Result: 2 separate services, 2 separate runs, each with 1 job

This is why the Runs tab shows many runs with only 1 job each.

What the scripts should do instead

Using the full Service API, we can define hero_runner as a single service with 2 actions where the UI depends on the server:

// Define actions
let server = proc_process_action("hero_runner_server", HERO_BIN + "/hero_runner_server")
    .interpreter("exec");

let ui = proc_process_action("hero_runner_ui", HERO_BIN + "/hero_runner_ui")
    .interpreter("exec")
    .depends_on("hero_runner_server");  // UI waits for server

proc.action_set(server);
proc.action_set(ui);

// Register as one service with both actions
proc.service_register("hero_runner", ["hero_runner_server", "hero_runner_ui"]);
proc.service_start("hero_runner");
// Result: 1 service, 1 run, 2 jobs with dependency ordering

This gives:

  • 1 service (hero_runner) instead of 2
  • 1 run with 2 jobs instead of 2 runs with 1 job each
  • Proper dependency: UI starts only after server is running
  • Correct workflow semantics

However, this is a larger change (both the issue #29 Rhai migration scripts and the quick_service convenience API would need to be reconsidered). I'll track that separately.


Implementing Option A now

I'll now implement the Status column fix: replace ServiceWantedStatus with actual runtime state from service.status RPC. The approach:

  1. After loading services, batch-fetch service.status for each one
  2. Cache the actual states alongside job counts
  3. Display actual state (running/failed/inactive/etc.) in the Status column
  4. Show wanted state as a smaller indicator or tooltip
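The batch fetch in step 1 might be sketched like this. Names are illustrative, not the exact dashboard.js code — `rpc` is assumed to be a promise-returning helper of the shape rpc(method, params):

```javascript
// Fetch the actual runtime state for every service in parallel.
// `rpc` is a hypothetical helper: rpc(method, params) -> Promise<result>.
async function fetchServiceStates(serviceNames, rpc) {
  const results = await Promise.all(
    serviceNames.map(name =>
      rpc("service.status", { name })
        .then(res => [name, res.state])
        .catch(() => [name, "inactive"]) // fall back rather than break the whole list
    )
  );
  return Object.fromEntries(results); // { serviceName: "running" | "failed" | ... }
}
```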
Author
Owner

Fixed in commit a0adc01 (Option A).

Changes (all in dashboard.js):

  • loadServices() now batch-fetches service.status for each service in parallel alongside job counts
  • renderServices() displays the actual runtime state (running, failed, inactive, success, exited) in the Status column instead of ServiceWantedStatus (start)
  • Control buttons (Start/Stop/Restart/Kill) now react to actual state — running services show Stop/Restart/Kill, inactive services show Start

Remaining from this discussion:

  • Rhai scripts should be refactored to use proper service+action model with dependencies instead of individual quick_service_set_full() calls — tracked in #29
  • Workflow object concept — separate issue to follow
Author
Owner

Architecture decision: No Workflow object needed

After discussion, we concluded that Rhai scripts are the workflow layer — hero_proc doesn't need a first-class Workflow object.

The model

| Concept | Role | Lifetime |
|---------|------|----------|
| Rhai script | Workflow — orchestrates clone, build, install, register, start | Finite, runs to completion |
| Action | Executable template (script + interpreter + config) | Stored, reusable |
| Service | Supervision unit — desired state + auto-restart (like systemd) | Ongoing, supervisor-managed |
| Run / Job | Execution tracking records | Transient |

Why no Workflow object

  • Rhai scripts already define DAG logic: sequential steps, conditionals, error handling, waits
  • Services are supervision units, not orchestration units — they keep processes alive, they don't run pipelines
  • depends_on on ActionSpec handles the narrow case of intra-service ordering (e.g., start server before UI)
  • Broader orchestration (build pipelines, multi-service deployment) lives in Rhai scripts

What this means for #29

The Rhai scripts in examples/rhai/ should:

  1. Use proc_process_action() + proc.action_set() to define actions with proper depends_on edges
  2. Use proc.service_register() to group related actions under one service
  3. Let the Rhai script itself handle the build/install workflow (which it already does)
  4. Let the service handle ongoing supervision (which it already does)

No new data model changes required — just better use of the existing APIs in the Rhai scripts.

Author
Owner

Documented the architecture model in the README Core Concepts section (commit 0d3b9b7): a concise table plus an explanation of each concept and the Rhai-as-workflow pattern.

Author
Owner

Resolved

Services tab now shows actual runtime state (running/failed/inactive) instead of desired state (start/stop).

  • Fix: loadServices() batch-fetches service.status for each service — commit a0adc01
  • Architecture documented in README: Services = systemd-like supervision units, Rhai scripts = workflow layer — commit 0d3b9b7
  • All Rhai scripts updated to use proper service model (action_set() + service_register()) — commit 6c59561

Closing.

timur closed this issue 2026-04-01 10:35:24 +00:00
Reference
lhumina_code/hero_proc#33