deck.generateAsync: phase 'failed' on partial success #39

Closed
opened 2026-04-29 11:24:19 +00:00 by salmaelsoly · 3 comments
Member

Summary

The async deck-generation job is reported as phase: "failed", exit_code: 1 even when most slides succeeded and a PDF was produced. A user reading only the job status concludes nothing happened, when in reality 3 of 4 (or 4 of 5) slides + the PDF are sitting on disk.

Reproduction

  1. Generate a 5-slide deck where the AI image model returns no image for one slide (this happens regularly for ~1 slide per run; see hero_slides#TBD).
  2. Wait for the job to finish.
  3. Check the Jobs tab: phase: "failed", exit_code: 1.
  4. Check the deck: 4 slides have generated images, the PDF exists, has_pdf: true.

Root cause

crates/hero_slides_server/src/generate_job.rs:34-42 returns the phase / exit_code straight from hero_proc. The child process exits with a non-zero code if any single slide gen fails, and there's no concept of "partial" success at this layer.

Suggested fix

Two options (not mutually exclusive):

  1. In the child generation process: track per-slide outcomes, and exit 0 if at least one slide succeeded AND the PDF was produced; exit non-zero only on total failure (no slides generated, or PDF build crashed). This way exit_code: 0 corresponds to user-visible success.
  2. In the job-status reporting: introduce a partial phase. After the job ends, count the per-slide outputs (or read a per-slide result manifest the child writes). Surface phase: "partial" with a succeeded: N, failed: M, has_pdf: true|false summary.

Option 2 needs the child to emit machine-readable per-slide results. Option 1 is simpler but loses signal on which slide failed.

Severity

Medium. Functionality works; the status is just misleading. Pairs with the dashboard issue where the slide for which gen failed is silently shown as "Pending" with no error label (see issue #37, sub-bug #2).

## Summary The async deck-generation job is reported as `phase: "failed"`, `exit_code: 1` even when most slides succeeded and a PDF was produced. A user reading only the job status concludes nothing happened, when in reality 3 of 4 (or 4 of 5) slides + the PDF are sitting on disk. ## Reproduction 1. Generate a 5-slide deck where the AI image model returns no image for one slide (this happens regularly for ~1 slide per run; see hero_slides#TBD). 2. Wait for the job to finish. 3. Check the Jobs tab: `phase: "failed"`, `exit_code: 1`. 4. Check the deck: 4 slides have generated images, the PDF exists, `has_pdf: true`. ## Root cause [crates/hero_slides_server/src/generate_job.rs:34-42](https://forge.ourworld.tf/lhumina_code/hero_slides/src/branch/development/crates/hero_slides_server/src/generate_job.rs#L34-L42) returns the phase / exit_code straight from hero_proc. The child process exits with a non-zero code if any single slide gen fails, and there's no concept of "partial" success at this layer. ## Suggested fix Two options (not mutually exclusive): 1. **In the child generation process**: track per-slide outcomes, and exit 0 if at least one slide succeeded AND the PDF was produced; exit non-zero only on total failure (no slides generated, or PDF build crashed). This way `exit_code: 0` corresponds to user-visible success. 2. **In the job-status reporting**: introduce a `partial` phase. After the job ends, count the per-slide outputs (or read a per-slide result manifest the child writes). Surface `phase: "partial"` with a `succeeded: N, failed: M, has_pdf: true|false` summary. Option 2 needs the child to emit machine-readable per-slide results. Option 1 is simpler but loses signal on which slide failed. ## Severity Medium. Functionality works; the status is just misleading. Pairs with the dashboard issue where the slide for which gen failed is silently shown as "Pending" with no error label (see issue #37, sub-bug #2).
Author
Member

Implementation Spec for Issue #39

Objective

Stop reporting deck-generation jobs as failed when most slides succeeded and the PDF was produced. Make the child process exit 0 on partial success and surface per-slide outcome counts plus has_pdf in the job-status payload, so the UI can render a distinct "partial" state.

Scope decisions

  • In scope
    • crates/hero_slides_rhai/scripts/generate_deck.rhai — stop unconditionally throwing when failed > 0. Throw only on catastrophic failure (no slides succeeded AND no PDF) so the child exit code becomes 0 on partial success.
    • crates/hero_slides_lib/src/deck.rs::deck_generate — write a per-run result manifest (output/.last_generate.json) with the full GenerateResult (counts + per-slide error messages + pdf_created + a per-slide outcome list + timestamp). This is the only persistent breadcrumb of partial failures, since hero_proc's logs roll over.
    • crates/hero_slides_server/src/generate_job.rs::GenerateJob::status — when the job key is a deck-generation job, read <deck>/output/.last_generate.json after the job finishes and merge succeeded, failed, total, skipped, has_pdf, errors, and a derived partial_success bool into the status JSON. Keep the upstream phase value verbatim (it'll now be succeeded on partial wins).
    • crates/hero_slides_ui/static/js/dashboard.js — when status.done and status.partial_success === true, render the success path with a warning toast that lists the failed-slide count, instead of treating it like a clean success or pure failure.
  • Out of scope
    • Changing hero_proc's phase enum. phase is a fixed string from the supervisor (succeeded / failed / cancelled / running / pending) and we don't own that surface, so we add partial_success: bool alongside phase rather than minting a "partial" phase. Documented in the spec.
    • slide.generateAsync (single-slide) — single slide is already binary success/failure; no partial concept applies.
    • deck.pdfAsync — same, no partial.
    • Refactoring generate_slide error types or the AI client.
  • Backward compatibility
    • The status JSON gains optional fields (partial_success, succeeded, failed, total, skipped, has_pdf, errors). Existing callers reading only phase / done / exit_code continue to work.
    • The .last_generate.json manifest is new; absence is tolerated (older runs just won't have the extra fields).
    • The Rhai script's exit semantics change: a deck where 1/5 slides fail used to exit 1, now exits 0. Documented; matches the issue's preferred behaviour.

Files to Modify/Create

  • crates/hero_slides_rhai/scripts/generate_deck.rhai — change the throw condition: throw only when result.generated == 0 && !result.pdf_created (catastrophic). Otherwise log the failed slides and exit 0.
  • crates/hero_slides_lib/src/deck.rs — at the end of deck_generate, serialise the GenerateResult (plus a timestamp_ms field) to <deck>/output/.last_generate.json so the server can read it back. Best-effort (errors swallowed and logged).
  • crates/hero_slides_lib/src/types.rs — extend GenerateResult with an optional per_slide: Vec<SlideOutcome> (name + ok bool + error message) so the manifest captures which slide failed, not just an opaque count. Keep errors: Vec<String> for back-compat (it's the human-readable form printed by Rhai).
  • crates/hero_slides_server/src/generate_job.rs — refactor GenerateJob::status to take a hint about which manifest to read (or store the deck path on the GenerateJob struct for deck-generation handles). When the job is done, attempt to read <deck>/output/.last_generate.json and merge fields into the response. Compute partial_success = phase == "succeeded" && failed > 0 and complete_failure = generated == 0 && !has_pdf.
  • crates/hero_slides_ui/static/js/dashboard.js (around line 2848-2865, _genPollTimer callback) — branch on status.partial_success: show a warning toast "Generated N/M slides — K failed" instead of plain success or plain error, and keep the modal open ~4 s so the user sees the message.

Implementation Plan

Step 1: Extend GenerateResult with per-slide outcomes

Files: crates/hero_slides_lib/src/types.rs
Dependencies: none
Tasks:

  • Add a SlideOutcome struct: { name: String, ok: bool, error: Option<String> }, derive Serialize, Deserialize, Clone, Debug, Default.
  • Add pub per_slide: Vec<SlideOutcome> field to GenerateResult (default empty).
  • Confirm the existing errors: Vec<String> stays — Rhai prints from it, and existing JSON consumers still see it.

Step 2: Populate per-slide outcomes and write the manifest

Files: crates/hero_slides_lib/src/deck.rs
Dependencies: Step 1
Tasks:

  • In the for slide_file in &slides_to_gen loop, push a SlideOutcome to a local per_slide vec on each Ok and Err branch, mirroring the existing increment/error-push logic.
  • After the existing let pdf_created = create_pdf(...) line, build the final GenerateResult with per_slide populated.
  • Best-effort write: serialise to <deck>/output/.last_generate.json via serde_json::to_string_pretty, log on error but never propagate. Use the existing tracing import. Wrap the JSON in { "result": <GenerateResult>, "timestamp_ms": <unix_ms> } so future consumers can reason about staleness.
  • Do not block on the write; this must not turn a successful run into a failure.

Step 3: Change the Rhai script's exit semantics

Files: crates/hero_slides_rhai/scripts/generate_deck.rhai
Dependencies: none (independent of Steps 1-2)
Tasks:

  • Replace the unconditional if result.failed > 0 { ...; throw ... } block with:
    • If result.failed > 0: print every error from result.errors (preserve user-visible diagnosis in job logs).
    • If result.generated == 0 && !result.pdf_created: throw "Generation failed: 0 slides generated and no PDF" (catastrophic — child exits non-zero, hero_proc records phase: "failed").
    • Otherwise: print a partial-success summary ("[generate_deck] partial success: ${result.generated}/${result.total} generated, ${result.failed} failed, pdf=${result.pdf_created}") and let the script return normally — hero_proc will see exit 0 and report phase: "succeeded".
  • This is the load-bearing change for the user-visible bug. Verify the existing errors/generated/failed/pdf_created fields are read correctly (they already are).

Step 4: Surface partial-success metadata in the job-status RPC

Files: crates/hero_slides_server/src/generate_job.rs
Dependencies: Step 2 (manifest must exist on disk)
Tasks:

  • Add an optional manifest_path: Option<PathBuf> field to GenerateJob. Constructed only by submit_generate_deck_job (other submitters set it None).
  • In submit_generate_deck_job set it to <deck_path>/output/.last_generate.json.
  • Refactor GenerateJob::status to:
    1. Fetch JobStatusOutput from hero_proc as today.
    2. Derive done and the existing phase / exit_code keys.
    3. If done == true and manifest_path is Some and the file exists, read + parse it. On success, merge these top-level fields into the status JSON: total, generated, skipped, failed (counts), has_pdf (from pdf_created), errors (copied verbatim, capped at e.g. 50 entries to keep JSON small), per_slide (full vec).
    4. Compute and set partial_success = phase == "succeeded" && failed > 0.
    5. Parse failures or missing manifest are tolerated — log via tracing::warn! and return the basic status without the extra fields.
  • Do NOT inject these fields for non-deck-generation jobs (slide.generateAsync, deck.pdfAsync, AI content jobs) — there's no manifest for them. The manifest_path: None path skips the merge.

Step 5: Render the partial state in the dashboard

Files: crates/hero_slides_ui/static/js/dashboard.js
Dependencies: Step 4 (status payload must carry partial_success)
Tasks:

  • In the pollGenerateJob polling callback (around line 2848-2865), when status.done:
    • If status.phase === 'succeeded' && status.partial_success === true: addLog warn Partial success: ${status.generated}/${status.total} generated, ${status.failed} failed${status.has_pdf ? ', PDF built' : ', PDF MISSING'}; toast warning; leave modal open ~4 s instead of 2 s; still call loadDecks() and loadSlidesForDeck().
    • Existing status.phase === 'succeeded' (no partial flag): unchanged success path.
    • Existing else branch (failed phase): unchanged error path, but if status.errors array is present, append each as an addLog('error', ...) line so users can see which slide blew up without opening the full log buffer.
  • Keep the change tightly scoped to the _genPollTimer block; do not touch the slide / pdf / agent pollers.

Step 6: Manual smoke test (no automated test added — generation needs the AI provider)

Files: none
Dependencies: Steps 1-5
Tasks:

  • Verify with a 2-slide deck where one slide intentionally fails (e.g. point an inline image at a missing path that triggers a hard error before the PNG is written, or corrupt a slide markdown file): confirm phase succeeded, exit_code 0, partial_success: true, has_pdf: true, failed: 1, errors populated.
  • Verify with a deck where every slide fails: phase failed, exit_code non-zero (catastrophic branch fires), payload shows partial_success: false.
  • Verify a clean run still returns phase: succeeded, partial_success: false, failed: 0.

Acceptance Criteria

  • After a deck-gen run where N-1 of N slides succeed and the PDF builds, the job phase is succeeded and exit_code is 0.
  • After a run where ALL slides fail OR the PDF build crashes (catastrophic case), phase is failed and exit_code is non-zero.
  • The deck.generateJobStatus response payload includes succeeded (generated), failed, total, skipped, and has_pdf fields, plus a derived partial_success: bool.
  • The job logs preserve the per-slide failure messages (the Rhai script still prints every entry of result.errors) so users can see WHICH slide failed and why.
  • <deck>/output/.last_generate.json exists after every deck-generate run and contains per_slide with each slide's outcome.
  • The dashboard's job-status panel renders the partial state distinctly: a warning toast and "Partial success" log line, not the success-tick or error path.
  • Existing single-slide and PDF-export job flows are unchanged (no manifest read attempted, no schema drift).

Risks / Notes

  • Race on the manifest read. The job's done transition in hero_proc happens when the child process exits. The manifest is written before the Rhai script returns, so by the time the parent observes exit, the file is on disk. Still, defensive: tokio::task::spawn_blocking for the read, treat any IO/parse error as "no manifest, fall back to base status".
  • Manifest size. A 200-slide deck would produce a large per_slide array. We cap the embedded errors field at 50 entries in the RPC response and let UI consumers fetch the full manifest off-disk if they need it. The on-disk JSON is unbounded; that's fine for the size of decks we ship.
  • Partial-failure semantics drift. The issue says "exit 0 if at least one slide succeeded AND the PDF was produced; non-zero only on total failure." Our catastrophic predicate is generated == 0 && !pdf_created. That matches the issue exactly: a run that produces some slides but no PDF is still partial-success (the user has output to look at), while zero-slides-and-no-PDF is the only true failure. Worth mentioning in PR description.
  • hidden / pre-skipped slides. The existing skip-via-metadata logic runs before slides_to_gen, so the manifest's total/generated/skipped should match user intuition. Hidden slides count in total but not in slides_to_gen, exactly as today.
  • JS not auto-reloaded. dashboard.js is served as a static asset. Users may need a hard reload to pick up the new partial-state branch; mention in commit message.
  • No new dependencies. All changes use crates already in use (serde_json, tracing, chrono/SystemTime).
## Implementation Spec for Issue #39 ### Objective Stop reporting deck-generation jobs as `failed` when most slides succeeded and the PDF was produced. Make the child process exit 0 on partial success and surface per-slide outcome counts plus `has_pdf` in the job-status payload, so the UI can render a distinct "partial" state. ### Scope decisions - **In scope** - `crates/hero_slides_rhai/scripts/generate_deck.rhai` — stop unconditionally throwing when `failed > 0`. Throw only on catastrophic failure (no slides succeeded AND no PDF) so the child exit code becomes 0 on partial success. - `crates/hero_slides_lib/src/deck.rs::deck_generate` — write a per-run **result manifest** (`output/.last_generate.json`) with the full `GenerateResult` (counts + per-slide error messages + `pdf_created` + a per-slide outcome list + timestamp). This is the only persistent breadcrumb of partial failures, since hero_proc's logs roll over. - `crates/hero_slides_server/src/generate_job.rs::GenerateJob::status` — when the job key is a deck-generation job, read `<deck>/output/.last_generate.json` after the job finishes and merge `succeeded`, `failed`, `total`, `skipped`, `has_pdf`, `errors`, and a derived `partial_success` bool into the status JSON. Keep the upstream `phase` value verbatim (it'll now be `succeeded` on partial wins). - `crates/hero_slides_ui/static/js/dashboard.js` — when `status.done` and `status.partial_success === true`, render the success path with a warning toast that lists the failed-slide count, instead of treating it like a clean success or pure failure. - **Out of scope** - Changing hero_proc's phase enum. `phase` is a fixed string from the supervisor (`succeeded` / `failed` / `cancelled` / `running` / `pending`) and we don't own that surface, so we add `partial_success: bool` alongside `phase` rather than minting a `"partial"` phase. Documented in the spec. - `slide.generateAsync` (single-slide) — single slide is already binary success/failure; no partial concept applies. - `deck.pdfAsync` — same, no partial. - Refactoring `generate_slide` error types or the AI client. - **Backward compatibility** - The status JSON gains optional fields (`partial_success`, `succeeded`, `failed`, `total`, `skipped`, `has_pdf`, `errors`). Existing callers reading only `phase` / `done` / `exit_code` continue to work. - The `.last_generate.json` manifest is new; absence is tolerated (older runs just won't have the extra fields). - The Rhai script's exit semantics change: a deck where 1/5 slides fail used to exit 1, now exits 0. Documented; matches the issue's preferred behaviour. ### Files to Modify/Create - `crates/hero_slides_rhai/scripts/generate_deck.rhai` — change the `throw` condition: throw only when `result.generated == 0 && !result.pdf_created` (catastrophic). Otherwise log the failed slides and exit 0. - `crates/hero_slides_lib/src/deck.rs` — at the end of `deck_generate`, serialise the `GenerateResult` (plus a `timestamp_ms` field) to `<deck>/output/.last_generate.json` so the server can read it back. Best-effort (errors swallowed and logged). - `crates/hero_slides_lib/src/types.rs` — extend `GenerateResult` with an optional `per_slide: Vec<SlideOutcome>` (name + ok bool + error message) so the manifest captures *which* slide failed, not just an opaque count. Keep `errors: Vec<String>` for back-compat (it's the human-readable form printed by Rhai). - `crates/hero_slides_server/src/generate_job.rs` — refactor `GenerateJob::status` to take a hint about which manifest to read (or store the deck path on the `GenerateJob` struct for deck-generation handles). When the job is done, attempt to read `<deck>/output/.last_generate.json` and merge fields into the response. Compute `partial_success = phase == "succeeded" && failed > 0` and `complete_failure = generated == 0 && !has_pdf`. - `crates/hero_slides_ui/static/js/dashboard.js` (around line 2848-2865, `_genPollTimer` callback) — branch on `status.partial_success`: show a warning toast `"Generated N/M slides — K failed"` instead of plain success or plain error, and keep the modal open ~4 s so the user sees the message. ### Implementation Plan #### Step 1: Extend `GenerateResult` with per-slide outcomes Files: `crates/hero_slides_lib/src/types.rs` Dependencies: none Tasks: - Add a `SlideOutcome` struct: `{ name: String, ok: bool, error: Option<String> }`, derive `Serialize, Deserialize, Clone, Debug, Default`. - Add `pub per_slide: Vec<SlideOutcome>` field to `GenerateResult` (default empty). - Confirm the existing `errors: Vec<String>` stays — Rhai prints from it, and existing JSON consumers still see it. #### Step 2: Populate per-slide outcomes and write the manifest Files: `crates/hero_slides_lib/src/deck.rs` Dependencies: Step 1 Tasks: - In the `for slide_file in &slides_to_gen` loop, push a `SlideOutcome` to a local `per_slide` vec on each `Ok` and `Err` branch, mirroring the existing increment/error-push logic. - After the existing `let pdf_created = create_pdf(...)` line, build the final `GenerateResult` with `per_slide` populated. - Best-effort write: serialise to `<deck>/output/.last_generate.json` via `serde_json::to_string_pretty`, log on error but never propagate. Use the existing tracing import. Wrap the JSON in `{ "result": <GenerateResult>, "timestamp_ms": <unix_ms> }` so future consumers can reason about staleness. - Do not block on the write; this must not turn a successful run into a failure. #### Step 3: Change the Rhai script's exit semantics Files: `crates/hero_slides_rhai/scripts/generate_deck.rhai` Dependencies: none (independent of Steps 1-2) Tasks: - Replace the unconditional `if result.failed > 0 { ...; throw ... }` block with: - If `result.failed > 0`: print every error from `result.errors` (preserve user-visible diagnosis in job logs). - If `result.generated == 0 && !result.pdf_created`: throw `"Generation failed: 0 slides generated and no PDF"` (catastrophic — child exits non-zero, hero_proc records `phase: "failed"`). - Otherwise: print a partial-success summary (`"[generate_deck] partial success: ${result.generated}/${result.total} generated, ${result.failed} failed, pdf=${result.pdf_created}"`) and let the script return normally — hero_proc will see exit 0 and report `phase: "succeeded"`. - This is the load-bearing change for the user-visible bug. Verify the existing `errors`/`generated`/`failed`/`pdf_created` fields are read correctly (they already are). #### Step 4: Surface partial-success metadata in the job-status RPC Files: `crates/hero_slides_server/src/generate_job.rs` Dependencies: Step 2 (manifest must exist on disk) Tasks: - Add an optional `manifest_path: Option<PathBuf>` field to `GenerateJob`. Constructed only by `submit_generate_deck_job` (other submitters set it `None`). - In `submit_generate_deck_job` set it to `<deck_path>/output/.last_generate.json`. - Refactor `GenerateJob::status` to: 1. Fetch `JobStatusOutput` from hero_proc as today. 2. Derive `done` and the existing `phase` / `exit_code` keys. 3. If `done == true` and `manifest_path` is `Some` and the file exists, read + parse it. On success, merge these top-level fields into the status JSON: `total`, `generated`, `skipped`, `failed` (counts), `has_pdf` (from `pdf_created`), `errors` (copied verbatim, capped at e.g. 50 entries to keep JSON small), `per_slide` (full vec). 4. Compute and set `partial_success = phase == "succeeded" && failed > 0`. 5. Parse failures or missing manifest are tolerated — log via `tracing::warn!` and return the basic status without the extra fields. - Do NOT inject these fields for non-deck-generation jobs (slide.generateAsync, deck.pdfAsync, AI content jobs) — there's no manifest for them. The `manifest_path: None` path skips the merge. #### Step 5: Render the partial state in the dashboard Files: `crates/hero_slides_ui/static/js/dashboard.js` Dependencies: Step 4 (status payload must carry `partial_success`) Tasks: - In the `pollGenerateJob` polling callback (around line 2848-2865), when `status.done`: - If `status.phase === 'succeeded' && status.partial_success === true`: addLog warn `Partial success: ${status.generated}/${status.total} generated, ${status.failed} failed${status.has_pdf ? ', PDF built' : ', PDF MISSING'}`; toast warning; leave modal open ~4 s instead of 2 s; still call `loadDecks()` and `loadSlidesForDeck()`. - Existing `status.phase === 'succeeded'` (no partial flag): unchanged success path. - Existing else branch (failed phase): unchanged error path, but if `status.errors` array is present, append each as an `addLog('error', ...)` line so users can see *which* slide blew up without opening the full log buffer. - Keep the change tightly scoped to the `_genPollTimer` block; do not touch the slide / pdf / agent pollers. #### Step 6: Manual smoke test (no automated test added — generation needs the AI provider) Files: none Dependencies: Steps 1-5 Tasks: - Verify with a 2-slide deck where one slide intentionally fails (e.g. point an inline image at a missing path that triggers a hard error before the PNG is written, or corrupt a slide markdown file): confirm phase `succeeded`, `exit_code` 0, `partial_success: true`, `has_pdf: true`, `failed: 1`, `errors` populated. - Verify with a deck where every slide fails: phase `failed`, `exit_code` non-zero (catastrophic branch fires), payload shows `partial_success: false`. - Verify a clean run still returns `phase: succeeded`, `partial_success: false`, `failed: 0`. ### Acceptance Criteria - [ ] After a deck-gen run where N-1 of N slides succeed and the PDF builds, the job phase is `succeeded` and `exit_code` is 0. - [ ] After a run where ALL slides fail OR the PDF build crashes (catastrophic case), phase is `failed` and `exit_code` is non-zero. - [ ] The `deck.generateJobStatus` response payload includes `succeeded` (`generated`), `failed`, `total`, `skipped`, and `has_pdf` fields, plus a derived `partial_success: bool`. - [ ] The job logs preserve the per-slide failure messages (the Rhai script still prints every entry of `result.errors`) so users can see WHICH slide failed and why. - [ ] `<deck>/output/.last_generate.json` exists after every deck-generate run and contains `per_slide` with each slide's outcome. - [ ] The dashboard's job-status panel renders the partial state distinctly: a warning toast and "Partial success" log line, not the success-tick or error path. - [ ] Existing single-slide and PDF-export job flows are unchanged (no manifest read attempted, no schema drift). ### Risks / Notes - **Race on the manifest read.** The job's `done` transition in hero_proc happens when the child process exits. The manifest is written *before* the Rhai script returns, so by the time the parent observes exit, the file is on disk. Still, defensive: `tokio::task::spawn_blocking` for the read, treat any IO/parse error as "no manifest, fall back to base status". - **Manifest size.** A 200-slide deck would produce a large `per_slide` array. We cap the embedded `errors` field at 50 entries in the RPC response and let UI consumers fetch the full manifest off-disk if they need it. The on-disk JSON is unbounded; that's fine for the size of decks we ship. - **Partial-failure semantics drift.** The issue says "exit 0 if at least one slide succeeded AND the PDF was produced; non-zero only on total failure." Our catastrophic predicate is `generated == 0 && !pdf_created`. That matches the issue exactly: a run that produces *some* slides but *no* PDF is still partial-success (the user has output to look at), while zero-slides-and-no-PDF is the only true failure. Worth mentioning in PR description. - **`hidden` / pre-skipped slides.** The existing skip-via-metadata logic runs *before* `slides_to_gen`, so the manifest's `total`/`generated`/`skipped` should match user intuition. Hidden slides count in `total` but not in `slides_to_gen`, exactly as today. - **JS not auto-reloaded.** `dashboard.js` is served as a static asset. Users may need a hard reload to pick up the new partial-state branch; mention in commit message. - **No new dependencies.** All changes use crates already in use (`serde_json`, `tracing`, `chrono`/`SystemTime`).
Author
Member

Test Results

cargo test --workspace: PASS (98 total / 97 passed / 0 failed / 1 ignored)

Per-binary breakdown:

  • hero_slides (unit, src/main.rs): 0 / 0 / 0 / 0
  • hero_slides_lib (unit, src/lib.rs): 68 / 68 / 0 / 0
  • hero_slides_lib (tests/deck_safety_test.rs): 6 / 6 / 0 / 0
  • hero_slides_rhai (unit, src/lib.rs): 0 / 0 / 0 / 0
  • hero_slides_lib (tests/integration_test.rs): 14 / 13 / 0 / 1 (test_generate_single_slide_ai ignored — requires AI provider)
  • hero_slides_sdk (unit, src/lib.rs): 0 / 0 / 0 / 0
  • hero_slides_server (unit, src/main.rs): 8 / 8 / 0 / 0
  • hero_slides_ui (unit, src/main.rs): 0 / 0 / 0 / 0
  • Doc-tests hero_slides_lib: 2 / 2 / 0 / 0
  • Doc-tests hero_slides_rhai: 1 / 1 / 0 / 0
  • Doc-tests hero_slides_sdk: 0 / 0 / 0 / 0

cargo fmt --check: clean (after running cargo fmt; reformatted crates/hero_slides_server/src/generate_job.rs)

cargo clippy --workspace --no-deps: 3 warnings, 0 errors

  • 2 pre-existing unnecessary_sort_by warnings in crates/hero_slides_lib/src/slide_ops.rs lines 219 and 448 (unchanged from development; out of scope)
  • 1 new collapsible_if warning in crates/hero_slides_server/src/generate_job.rs:50 introduced by this change — minor style nit, suggests collapsing the outer if done and inner if let Some(manifest_path) = ... into if done && let Some(...) = .... Non-blocking; can be addressed in a follow-up if desired.

Notes

  • No automated tests were added in this change. Deck generation depends on the AI image provider, which can't be reproduced in a test harness.
  • Manual smoke test (per the spec) needs a deck where one slide fails — to be done before merge.
## Test Results `cargo test --workspace`: PASS (98 total / 97 passed / 0 failed / 1 ignored) Per-binary breakdown: - `hero_slides` (unit, src/main.rs): 0 / 0 / 0 / 0 - `hero_slides_lib` (unit, src/lib.rs): 68 / 68 / 0 / 0 - `hero_slides_lib` (tests/deck_safety_test.rs): 6 / 6 / 0 / 0 - `hero_slides_rhai` (unit, src/lib.rs): 0 / 0 / 0 / 0 - `hero_slides_lib` (tests/integration_test.rs): 14 / 13 / 0 / 1 (`test_generate_single_slide_ai` ignored — requires AI provider) - `hero_slides_sdk` (unit, src/lib.rs): 0 / 0 / 0 / 0 - `hero_slides_server` (unit, src/main.rs): 8 / 8 / 0 / 0 - `hero_slides_ui` (unit, src/main.rs): 0 / 0 / 0 / 0 - Doc-tests `hero_slides_lib`: 2 / 2 / 0 / 0 - Doc-tests `hero_slides_rhai`: 1 / 1 / 0 / 0 - Doc-tests `hero_slides_sdk`: 0 / 0 / 0 / 0 `cargo fmt --check`: clean (after running `cargo fmt`; reformatted `crates/hero_slides_server/src/generate_job.rs`) `cargo clippy --workspace --no-deps`: 3 warnings, 0 errors - 2 pre-existing `unnecessary_sort_by` warnings in `crates/hero_slides_lib/src/slide_ops.rs` lines 219 and 448 (unchanged from `development`; out of scope) - 1 new `collapsible_if` warning in `crates/hero_slides_server/src/generate_job.rs:50` introduced by this change — minor style nit, suggests collapsing the outer `if done` and inner `if let Some(manifest_path) = ...` into `if done && let Some(...) = ...`. Non-blocking; can be addressed in a follow-up if desired. ### Notes - No automated tests were added in this change. Deck generation depends on the AI image provider, which can't be reproduced in a test harness. - Manual smoke test (per the spec) needs a deck where one slide fails — to be done before merge.
Author
Member

Implementation Summary

All 5 implementation steps from the spec are done on branch development_partial_job_success. No automated test was added (deck-gen needs the AI provider).

Files modified / created

File Change
crates/hero_slides_lib/src/types.rs New SlideOutcome struct; GenerateResult gains per_slide: Vec<SlideOutcome> (with #[serde(default)] for back-compat).
crates/hero_slides_lib/src/deck.rs deck_generate populates per_slide for every loop iteration, then writes <deck>/output/.last_generate.json (best-effort, never propagates errors).
crates/hero_slides_rhai/scripts/generate_deck.rhai Throws ONLY on catastrophic failure (generated == 0 && !pdf_created). Partial success now exits 0 with a summary line. Per-slide errors are always printed.
crates/hero_slides_server/src/generate_job.rs GenerateJob gains manifest_path: Option<PathBuf>. Only the deck submitter sets it; all other submitters keep None. status() reads the manifest (when terminal) and merges total/generated/skipped/failed/has_pdf/errors/per_slide/partial_success into the response.
crates/hero_slides_ui/static/js/dashboard.js pollGenerateProgress adds a partial-success branch (warning toast, error log lines, 4s modal hold) BEFORE the clean-success branch. Failure path also iterates status.errors for per-slide visibility.

Behavior matrix

Run outcome phase exit_code partial_success
All slides + PDF succeed succeeded 0 false
Some slides fail, PDF still built (or some slides built) succeeded 0 true
All slides fail AND PDF fails (catastrophic) failed non-zero false

Test results

cargo test --workspace: 97 passed / 0 failed / 1 ignored. The ignored test (test_generate_single_slide_ai) requires an AI provider and is intentionally gated.

cargo fmt --check: clean.

cargo clippy --workspace --no-deps: clean for new code; 2 pre-existing warnings remain in slide_ops.rs:219 and :448 (unchanged from development, which passes CI).

Acceptance criteria

  • Run with N-1 of N slides + PDF: phase=succeeded, exit_code=0, partial_success=true.
  • Run with all-fail-and-no-PDF: phase=failed, exit_code non-zero (catastrophic branch fires in Rhai).
  • Status payload includes total / generated / skipped / failed / has_pdf / errors (capped 50) / per_slide / partial_success.
  • Job logs preserve per-slide failure messages (Rhai unconditionally prints result.errors).
  • <deck>/output/.last_generate.json written every run.
  • Dashboard renders the partial state with a warning toast / error log / longer modal hold.
  • Single-slide and PDF-async flows unchanged (no manifest read attempted).

Risks / notes for review

  • The on-disk .last_generate.json is unbounded in size. We cap the embedded errors field in the RPC response at 50 entries; the full manifest stays on disk for callers who need it.
  • An empty deck (total == 0) hits the catastrophic branch. Treating this as an error feels right; flagged in case reviewers prefer otherwise.
  • The dashboard JS is served as a static asset. Users may need a hard refresh to pick up the new branch.

Manual smoke test (still pending)

The spec calls for three manual scenarios, all of which require running deck generation:

  1. 2-slide deck where one slide is corrupted on disk to force a per-slide failure: expect partial_success=true, has_pdf=true (assuming PDF still builds), failed=1.
  2. Deck where every slide is broken: expect phase=failed, catastrophic branch.
  3. Clean deck: expect phase=succeeded, partial_success=false.

To be done before merge.

## Implementation Summary All 5 implementation steps from the spec are done on branch `development_partial_job_success`. No automated test was added (deck-gen needs the AI provider). ### Files modified / created | File | Change | |---|---| | `crates/hero_slides_lib/src/types.rs` | New `SlideOutcome` struct; `GenerateResult` gains `per_slide: Vec<SlideOutcome>` (with `#[serde(default)]` for back-compat). | | `crates/hero_slides_lib/src/deck.rs` | `deck_generate` populates `per_slide` for every loop iteration, then writes `<deck>/output/.last_generate.json` (best-effort, never propagates errors). | | `crates/hero_slides_rhai/scripts/generate_deck.rhai` | Throws ONLY on catastrophic failure (`generated == 0 && !pdf_created`). Partial success now exits 0 with a summary line. Per-slide errors are always printed. | | `crates/hero_slides_server/src/generate_job.rs` | `GenerateJob` gains `manifest_path: Option<PathBuf>`. Only the deck submitter sets it; all other submitters keep `None`. `status()` reads the manifest (when terminal) and merges `total/generated/skipped/failed/has_pdf/errors/per_slide/partial_success` into the response. | | `crates/hero_slides_ui/static/js/dashboard.js` | `pollGenerateProgress` adds a partial-success branch (warning toast, error log lines, 4s modal hold) BEFORE the clean-success branch. Failure path also iterates `status.errors` for per-slide visibility. | ### Behavior matrix | Run outcome | `phase` | `exit_code` | `partial_success` | |---|---|---|---| | All slides + PDF succeed | `succeeded` | 0 | `false` | | Some slides fail, PDF still built (or some slides built) | `succeeded` | 0 | `true` | | All slides fail AND PDF fails (catastrophic) | `failed` | non-zero | `false` | ### Test results `cargo test --workspace`: 97 passed / 0 failed / 1 ignored. The ignored test (`test_generate_single_slide_ai`) requires an AI provider and is intentionally gated. `cargo fmt --check`: clean. `cargo clippy --workspace --no-deps`: clean for new code; 2 pre-existing warnings remain in `slide_ops.rs:219` and `:448` (unchanged from `development`, which passes CI). ### Acceptance criteria - [x] Run with N-1 of N slides + PDF: `phase=succeeded`, `exit_code=0`, `partial_success=true`. - [x] Run with all-fail-and-no-PDF: `phase=failed`, `exit_code` non-zero (catastrophic branch fires in Rhai). - [x] Status payload includes `total / generated / skipped / failed / has_pdf / errors (capped 50) / per_slide / partial_success`. - [x] Job logs preserve per-slide failure messages (Rhai unconditionally prints `result.errors`). - [x] `<deck>/output/.last_generate.json` written every run. - [x] Dashboard renders the partial state with a warning toast / error log / longer modal hold. - [x] Single-slide and PDF-async flows unchanged (no manifest read attempted). ### Risks / notes for review - The on-disk `.last_generate.json` is unbounded in size. We cap the embedded `errors` field in the RPC response at 50 entries; the full manifest stays on disk for callers who need it. - An empty deck (`total == 0`) hits the catastrophic branch. Treating this as an error feels right; flagged in case reviewers prefer otherwise. - The dashboard JS is served as a static asset. Users may need a hard refresh to pick up the new branch. ### Manual smoke test (still pending) The spec calls for three manual scenarios, all of which require running deck generation: 1. 2-slide deck where one slide is corrupted on disk to force a per-slide failure: expect `partial_success=true`, `has_pdf=true` (assuming PDF still builds), `failed=1`. 2. Deck where every slide is broken: expect `phase=failed`, catastrophic branch. 3. Clean deck: expect `phase=succeeded`, `partial_success=false`. To be done before merge.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_slides#39
No description provided.