deck.generateAsync: phase 'failed' on partial success #39
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
The async deck-generation job is reported as
phase: "failed",exit_code: 1even when most slides succeeded and a PDF was produced. A user reading only the job status concludes nothing happened, when in reality 3 of 4 (or 4 of 5) slides + the PDF are sitting on disk.Reproduction
phase: "failed",exit_code: 1.has_pdf: true.Root cause
crates/hero_slides_server/src/generate_job.rs:34-42 returns the phase / exit_code straight from hero_proc. The child process exits with a non-zero code if any single slide gen fails, and there's no concept of "partial" success at this layer.
Suggested fix
Two options (not mutually exclusive):
exit_code: 0corresponds to user-visible success.partialphase. After the job ends, count the per-slide outputs (or read a per-slide result manifest the child writes). Surfacephase: "partial"with asucceeded: N, failed: M, has_pdf: true|falsesummary.Option 2 needs the child to emit machine-readable per-slide results. Option 1 is simpler but loses signal on which slide failed.
Severity
Medium. Functionality works; the status is just misleading. Pairs with the dashboard issue where the slide for which gen failed is silently shown as "Pending" with no error label (see issue #37, sub-bug #2).
Implementation Spec for Issue #39
Objective
Stop reporting deck-generation jobs as
failedwhen most slides succeeded and the PDF was produced. Make the child process exit 0 on partial success and surface per-slide outcome counts plushas_pdfin the job-status payload, so the UI can render a distinct "partial" state.Scope decisions
crates/hero_slides_rhai/scripts/generate_deck.rhai— stop unconditionally throwing whenfailed > 0. Throw only on catastrophic failure (no slides succeeded AND no PDF) so the child exit code becomes 0 on partial success.crates/hero_slides_lib/src/deck.rs::deck_generate— write a per-run result manifest (output/.last_generate.json) with the fullGenerateResult(counts + per-slide error messages +pdf_created+ a per-slide outcome list + timestamp). This is the only persistent breadcrumb of partial failures, since hero_proc's logs roll over.crates/hero_slides_server/src/generate_job.rs::GenerateJob::status— when the job key is a deck-generation job, read<deck>/output/.last_generate.jsonafter the job finishes and mergesucceeded,failed,total,skipped,has_pdf,errors, and a derivedpartial_successbool into the status JSON. Keep the upstreamphasevalue verbatim (it'll now besucceededon partial wins).crates/hero_slides_ui/static/js/dashboard.js— whenstatus.doneandstatus.partial_success === true, render the success path with a warning toast that lists the failed-slide count, instead of treating it like a clean success or pure failure.phaseis a fixed string from the supervisor (succeeded/failed/cancelled/running/pending) and we don't own that surface, so we addpartial_success: boolalongsidephaserather than minting a"partial"phase. Documented in the spec.slide.generateAsync(single-slide) — single slide is already binary success/failure; no partial concept applies.deck.pdfAsync— same, no partial.generate_slideerror types or the AI client.partial_success,succeeded,failed,total,skipped,has_pdf,errors). Existing callers reading onlyphase/done/exit_codecontinue to work..last_generate.jsonmanifest is new; absence is tolerated (older runs just won't have the extra fields).Files to Modify/Create
crates/hero_slides_rhai/scripts/generate_deck.rhai— change thethrowcondition: throw only whenresult.generated == 0 && !result.pdf_created(catastrophic). Otherwise log the failed slides and exit 0.crates/hero_slides_lib/src/deck.rs— at the end ofdeck_generate, serialise theGenerateResult(plus atimestamp_msfield) to<deck>/output/.last_generate.jsonso the server can read it back. Best-effort (errors swallowed and logged).crates/hero_slides_lib/src/types.rs— extendGenerateResultwith an optionalper_slide: Vec<SlideOutcome>(name + ok bool + error message) so the manifest captures which slide failed, not just an opaque count. Keeperrors: Vec<String>for back-compat (it's the human-readable form printed by Rhai).crates/hero_slides_server/src/generate_job.rs— refactorGenerateJob::statusto take a hint about which manifest to read (or store the deck path on theGenerateJobstruct for deck-generation handles). When the job is done, attempt to read<deck>/output/.last_generate.jsonand merge fields into the response. Computepartial_success = phase == "succeeded" && failed > 0andcomplete_failure = generated == 0 && !has_pdf.crates/hero_slides_ui/static/js/dashboard.js(around line 2848-2865,_genPollTimercallback) — branch onstatus.partial_success: show a warning toast"Generated N/M slides — K failed"instead of plain success or plain error, and keep the modal open ~4 s so the user sees the message.Implementation Plan
Step 1: Extend
GenerateResultwith per-slide outcomesFiles:
crates/hero_slides_lib/src/types.rsDependencies: none
Tasks:
SlideOutcomestruct:{ name: String, ok: bool, error: Option<String> }, deriveSerialize, Deserialize, Clone, Debug, Default.pub per_slide: Vec<SlideOutcome>field toGenerateResult(default empty).errors: Vec<String>stays — Rhai prints from it, and existing JSON consumers still see it.Step 2: Populate per-slide outcomes and write the manifest
Files:
crates/hero_slides_lib/src/deck.rsDependencies: Step 1
Tasks:
for slide_file in &slides_to_genloop, push aSlideOutcometo a localper_slidevec on eachOkandErrbranch, mirroring the existing increment/error-push logic.let pdf_created = create_pdf(...)line, build the finalGenerateResultwithper_slidepopulated.<deck>/output/.last_generate.jsonviaserde_json::to_string_pretty, log on error but never propagate. Use the existing tracing import. Wrap the JSON in{ "result": <GenerateResult>, "timestamp_ms": <unix_ms> }so future consumers can reason about staleness.Step 3: Change the Rhai script's exit semantics
Files:
crates/hero_slides_rhai/scripts/generate_deck.rhaiDependencies: none (independent of Steps 1-2)
Tasks:
if result.failed > 0 { ...; throw ... }block with:result.failed > 0: print every error fromresult.errors(preserve user-visible diagnosis in job logs).result.generated == 0 && !result.pdf_created: throw"Generation failed: 0 slides generated and no PDF"(catastrophic — child exits non-zero, hero_proc recordsphase: "failed")."[generate_deck] partial success: ${result.generated}/${result.total} generated, ${result.failed} failed, pdf=${result.pdf_created}") and let the script return normally — hero_proc will see exit 0 and reportphase: "succeeded".errors/generated/failed/pdf_createdfields are read correctly (they already are).Step 4: Surface partial-success metadata in the job-status RPC
Files:
crates/hero_slides_server/src/generate_job.rsDependencies: Step 2 (manifest must exist on disk)
Tasks:
manifest_path: Option<PathBuf>field toGenerateJob. Constructed only bysubmit_generate_deck_job(other submitters set itNone).submit_generate_deck_jobset it to<deck_path>/output/.last_generate.json.GenerateJob::statusto:JobStatusOutputfrom hero_proc as today.doneand the existingphase/exit_codekeys.done == trueandmanifest_pathisSomeand the file exists, read + parse it. On success, merge these top-level fields into the status JSON:total,generated,skipped,failed(counts),has_pdf(frompdf_created),errors(copied verbatim, capped at e.g. 50 entries to keep JSON small),per_slide(full vec).partial_success = phase == "succeeded" && failed > 0.tracing::warn!and return the basic status without the extra fields.manifest_path: Nonepath skips the merge.Step 5: Render the partial state in the dashboard
Files:
crates/hero_slides_ui/static/js/dashboard.jsDependencies: Step 4 (status payload must carry
partial_success)Tasks:
pollGenerateJobpolling callback (around line 2848-2865), whenstatus.done:status.phase === 'succeeded' && status.partial_success === true: addLog warnPartial success: ${status.generated}/${status.total} generated, ${status.failed} failed${status.has_pdf ? ', PDF built' : ', PDF MISSING'}; toast warning; leave modal open ~4 s instead of 2 s; still callloadDecks()andloadSlidesForDeck().status.phase === 'succeeded'(no partial flag): unchanged success path.status.errorsarray is present, append each as anaddLog('error', ...)line so users can see which slide blew up without opening the full log buffer._genPollTimerblock; do not touch the slide / pdf / agent pollers.Step 6: Manual smoke test (no automated test added — generation needs the AI provider)
Files: none
Dependencies: Steps 1-5
Tasks:
succeeded,exit_code0,partial_success: true,has_pdf: true,failed: 1,errorspopulated.failed,exit_codenon-zero (catastrophic branch fires), payload showspartial_success: false.phase: succeeded,partial_success: false,failed: 0.Acceptance Criteria
succeededandexit_codeis 0.failedandexit_codeis non-zero.deck.generateJobStatusresponse payload includessucceeded(generated),failed,total,skipped, andhas_pdffields, plus a derivedpartial_success: bool.result.errors) so users can see WHICH slide failed and why.<deck>/output/.last_generate.jsonexists after every deck-generate run and containsper_slidewith each slide's outcome.Risks / Notes
donetransition in hero_proc happens when the child process exits. The manifest is written before the Rhai script returns, so by the time the parent observes exit, the file is on disk. Still, defensive:tokio::task::spawn_blockingfor the read, treat any IO/parse error as "no manifest, fall back to base status".per_slidearray. We cap the embeddederrorsfield at 50 entries in the RPC response and let UI consumers fetch the full manifest off-disk if they need it. The on-disk JSON is unbounded; that's fine for the size of decks we ship.generated == 0 && !pdf_created. That matches the issue exactly: a run that produces some slides but no PDF is still partial-success (the user has output to look at), while zero-slides-and-no-PDF is the only true failure. Worth mentioning in PR description.hidden/ pre-skipped slides. The existing skip-via-metadata logic runs beforeslides_to_gen, so the manifest'stotal/generated/skippedshould match user intuition. Hidden slides count intotalbut not inslides_to_gen, exactly as today.dashboard.jsis served as a static asset. Users may need a hard reload to pick up the new partial-state branch; mention in commit message.serde_json,tracing,chrono/SystemTime).Test Results
cargo test --workspace: PASS (98 total / 97 passed / 0 failed / 1 ignored)Per-binary breakdown:
hero_slides(unit, src/main.rs): 0 / 0 / 0 / 0hero_slides_lib(unit, src/lib.rs): 68 / 68 / 0 / 0hero_slides_lib(tests/deck_safety_test.rs): 6 / 6 / 0 / 0hero_slides_rhai(unit, src/lib.rs): 0 / 0 / 0 / 0hero_slides_lib(tests/integration_test.rs): 14 / 13 / 0 / 1 (test_generate_single_slide_aiignored — requires AI provider)hero_slides_sdk(unit, src/lib.rs): 0 / 0 / 0 / 0hero_slides_server(unit, src/main.rs): 8 / 8 / 0 / 0hero_slides_ui(unit, src/main.rs): 0 / 0 / 0 / 0hero_slides_lib: 2 / 2 / 0 / 0hero_slides_rhai: 1 / 1 / 0 / 0hero_slides_sdk: 0 / 0 / 0 / 0cargo fmt --check: clean (after runningcargo fmt; reformattedcrates/hero_slides_server/src/generate_job.rs)cargo clippy --workspace --no-deps: 3 warnings, 0 errorsunnecessary_sort_bywarnings incrates/hero_slides_lib/src/slide_ops.rslines 219 and 448 (unchanged fromdevelopment; out of scope)collapsible_ifwarning incrates/hero_slides_server/src/generate_job.rs:50introduced by this change — minor style nit, suggests collapsing the outerif doneand innerif let Some(manifest_path) = ...intoif done && let Some(...) = .... Non-blocking; can be addressed in a follow-up if desired.Notes
Implementation Summary
All 5 implementation steps from the spec are done on branch
development_partial_job_success. No automated test was added (deck-gen needs the AI provider).Files modified / created
crates/hero_slides_lib/src/types.rsSlideOutcomestruct;GenerateResultgainsper_slide: Vec<SlideOutcome>(with#[serde(default)]for back-compat).crates/hero_slides_lib/src/deck.rsdeck_generatepopulatesper_slidefor every loop iteration, then writes<deck>/output/.last_generate.json(best-effort, never propagates errors).crates/hero_slides_rhai/scripts/generate_deck.rhaigenerated == 0 && !pdf_created). Partial success now exits 0 with a summary line. Per-slide errors are always printed.crates/hero_slides_server/src/generate_job.rsGenerateJobgainsmanifest_path: Option<PathBuf>. Only the deck submitter sets it; all other submitters keepNone.status()reads the manifest (when terminal) and mergestotal/generated/skipped/failed/has_pdf/errors/per_slide/partial_successinto the response.crates/hero_slides_ui/static/js/dashboard.jspollGenerateProgressadds a partial-success branch (warning toast, error log lines, 4s modal hold) BEFORE the clean-success branch. Failure path also iteratesstatus.errorsfor per-slide visibility.Behavior matrix
phaseexit_codepartial_successsucceededfalsesucceededtruefailedfalseTest results
cargo test --workspace: 97 passed / 0 failed / 1 ignored. The ignored test (test_generate_single_slide_ai) requires an AI provider and is intentionally gated.cargo fmt --check: clean.cargo clippy --workspace --no-deps: clean for new code; 2 pre-existing warnings remain inslide_ops.rs:219and:448(unchanged fromdevelopment, which passes CI).Acceptance criteria
phase=succeeded,exit_code=0,partial_success=true.phase=failed,exit_codenon-zero (catastrophic branch fires in Rhai).total / generated / skipped / failed / has_pdf / errors (capped 50) / per_slide / partial_success.result.errors).<deck>/output/.last_generate.jsonwritten every run.Risks / notes for review
.last_generate.jsonis unbounded in size. We cap the embeddederrorsfield in the RPC response at 50 entries; the full manifest stays on disk for callers who need it.total == 0) hits the catastrophic branch. Treating this as an error feels right; flagged in case reviewers prefer otherwise.Manual smoke test (still pending)
The spec calls for three manual scenarios, all of which require running deck generation:
partial_success=true,has_pdf=true(assuming PDF still builds),failed=1.phase=failed, catastrophic branch.phase=succeeded,partial_success=false.To be done before merge.