HTML generation pipeline: validator strips image slots + diagram label suppression needs rework #95
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
The HTML slide generation pipeline has accumulated several interacting issues that are causing images to disappear entirely from generated slides.
Root cause
In attempting to suppress CSS-drawn diagram labels (short uppercase text placed around image slots by the LLM, e.g. "INTENT", "GUARDRAILS"), a validation pass was added that now runs on every slide containing image slots. This validation pass is an LLM rewriting the full HTML, and it is stripping the
data-image-slotelements — either by misidentifying legitimate section labels near images as "diagram annotations", or by corrupting the attribute during its full-document rewrite. Since image injection happens after the validator, stripped slots = no images.Specific issues
1. Universal validator is wrong tool for diagram labels
The validator (
system_validate_slide.md) was originally scoped to SVG-only. It was broadened to fire on all slides with image slots. Running a full LLM HTML rewrite just to strip a few text labels is expensive, fragile, and unreliable — the LLM cannot reliably distinguish a diagram label from a legitimate section label positioned near an image.2. Diagram label suppression belongs at content level or via regex
The LLM renders labels because the slide content asks for a labeled diagram. Suppressing the output after generation is fighting the model on every run. A deterministic regex pass would be more reliable: remove
position:absoluteelements that are siblings of adata-image-slotcontainer, have <5 words of text, and sit within ~200px of the slot.3. Two prompts in conflict
system_generate_slide.mdsays "put images in slots".system_validate_slide.mdsays "preserve slots, but remove labels near slots". These are hard to reconcile reliably with an LLM.4. Late image injection is fragile
Images are injected after 3 separate AI passes (generation → validation → overflow check). Any pass that corrupts
data-image-slotattributes causes silent image loss with no error surfaced to the user.5. Animations interact with validator
Every element starts at
opacity:0due to entrance animations. The validator LLM reasons about a CSS-animated state it cannot see, potentially making wrong corrections.Recommended fix
Branch
This regression was introduced on branch
development_pptx_mockupduring the animation + prompt extraction work (commit9426df8and subsequent commits).Implementation Spec — Issue #95
HTML Generation Pipeline: Validator Strips Image Slots + Diagram Label Suppression Rework
Objective
Restore reliable image slot preservation through the HTML generation pipeline by:
<svg>elements (its original scope), removing thedata-image-slotbranch that causes the slot-stripping regression.position:absolutesibling elements near image slots that match the short-uppercase-label heuristic.system_validate_slide.mdto remove Violation #3 (diagram label annotations), since that concern is now handled deterministically.Requirements
validate_slide_html) must only fire when<svg>is present in the HTML, not whendata-image-slotis present.strip_diagram_labelsmust run after the validator (and afterreplace_fake_logos_with_slots) to remove short-text absolute-positioned elements that are siblings ofdata-image-slotcontainers.strip_diagram_labels:<div>or<span>(or<p>) withposition:absolutein its style attribute.data-image-slot.top/left/right/bottomCSS values place it within 200px of adata-image-slotcontainer's bounding box OR the element is a sibling inside the same parent container as adata-image-slotdiv.data-image-slot="true".info!immediately after each pipeline stage: (a)replace_fake_logos_with_slots, (b)validate_slide_html, (c)strip_diagram_labels, (d)extract_image_slot_descriptions.system_validate_slide.mdmust have Violation #3 ("DIAGRAM LABEL ANNOTATIONS") removed; the remaining violations must be preserved and renumbered.system_generate_slide.mdis left unchanged.Files to Modify
crates/hero_slides_lib/src/generator.rsstrip_diagram_labelsfunction; (3) Insert call in pipeline; (4) Add slot-countinfo!logs.crates/hero_slides_lib/src/prompts/system_validate_slide.mdStep-by-Step Implementation Plan
Step 1 — Revert validator trigger to SVG-only
File:
crates/hero_slides_lib/src/generator.rsDependencies: none | Parallelisable with: Steps 2, 4
Change the
needs_validationcondition:Update the adjacent comment to note that image-slot-only slides are handled by
strip_diagram_labelsinstead.Step 2 — Add slot-count logging at each pipeline stage
File:
crates/hero_slides_lib/src/generator.rsDependencies: none | Parallelisable with: Steps 1, 4
Insert
info!log lines after each of these calls:replace_fake_logos_with_slotsvalidate_slide_htmlstrip_diagram_labels(to be added in Step 3)Format:
"generate_slide_html {}: after <stage>, {} slot(s) present"Step 3 — Implement
strip_diagram_labelsand integrate into pipelineFile:
crates/hero_slides_lib/src/generator.rsDependencies: Steps 1 and 2
Add a new private function
strip_diagram_labels(html: &str) -> Stringthat:top/leftestimates for alldata-image-slotelements.position:absoluteelements, extracts their inner text, counts words.data-image-slot.Uses string scanning only (no new crate dependencies). Insert the call in
generate_slide_htmlafter the validator block and beforeextract_image_slot_descriptions.Step 4 — Update
system_validate_slide.mdFile:
crates/hero_slides_lib/src/prompts/system_validate_slide.mdDependencies: none | Parallelisable with: Steps 1, 2
Remove the entire Violation #3 block ("DIAGRAM LABEL ANNOTATIONS") and renumber the remaining violations so they are sequential.
Acceptance Criteria
data-image-slotbut no<svg>does NOT trigger the LLM validator pass.<svg>still triggers the validator pass.strip_diagram_labels.data-image-slot="true") is never removed bystrip_diagram_labels.system_validate_slide.mdno longer contains the "DIAGRAM LABEL ANNOTATIONS" section.cargo build -p hero_slides_libsucceeds with no new warnings.cargo test -p hero_slides_lib).Notes
replace_fake_logos_with_slots,replace_html_tag_with_slot, etc.). No new crate dependency is needed or justified.browser_pptx.rschanges are unrelated: The current uncommitted whitespace diff inbrowser_pptx.rsis not part of this issue.system_validate_slide.mdis embedded viainclude_str!, editing the file is sufficient. Users with local prompt overrides will need to update their copies.Implementation Spec — Issue #95
HTML Generation Pipeline: Validator Strips Image Slots + Diagram Label Suppression Rework
Objective
Restore reliable image slot preservation through the HTML generation pipeline by:
<svg>elements (its original scope), removing thedata-image-slotbranch that causes the slot-stripping regression.position:absolutesibling elements near image slots that match the short-uppercase-label heuristic.system_validate_slide.mdto remove Violation #3 (diagram label annotations), since that concern is now handled deterministically.Requirements
validate_slide_html) must only fire when<svg>is present in the HTML, not whendata-image-slotis present.strip_diagram_labelsmust run after the validator (and afterreplace_fake_logos_with_slots) to remove short-text absolute-positioned elements that are siblings ofdata-image-slotcontainers.strip_diagram_labels:<div>or<span>(or<p>) withposition:absolutein its style attribute.data-image-slot.top/left/right/bottomCSS values place it within 200px of adata-image-slotcontainer's bounding box OR the element is a sibling inside the same parent container as adata-image-slotdiv.data-image-slot="true".info!immediately after each pipeline stage: (a)replace_fake_logos_with_slots, (b)validate_slide_html, (c)strip_diagram_labels, (d)extract_image_slot_descriptions.system_validate_slide.mdmust have Violation #3 ("DIAGRAM LABEL ANNOTATIONS") removed; the remaining violations must be preserved and renumbered.system_generate_slide.mdis left unchanged.Files to Modify
crates/hero_slides_lib/src/generator.rsstrip_diagram_labelsfunction; (3) Insert call in pipeline; (4) Add slot-countinfo!logs.crates/hero_slides_lib/src/prompts/system_validate_slide.mdStep-by-Step Implementation Plan
Step 1 — Revert validator trigger to SVG-only
File:
crates/hero_slides_lib/src/generator.rsDependencies: none | Parallelisable with: Steps 2, 4
Change the
needs_validationcondition:Update the adjacent comment to note that image-slot-only slides are handled by
strip_diagram_labelsinstead.Step 2 — Add slot-count logging at each pipeline stage
File:
crates/hero_slides_lib/src/generator.rsDependencies: none | Parallelisable with: Steps 1, 4
Insert
info!log lines after each of these calls:replace_fake_logos_with_slotsvalidate_slide_htmlstrip_diagram_labels(to be added in Step 3)Format:
"generate_slide_html {}: after <stage>, {} slot(s) present"Step 3 — Implement
strip_diagram_labelsand integrate into pipelineFile:
crates/hero_slides_lib/src/generator.rsDependencies: Steps 1 and 2
Add a new private function
strip_diagram_labels(html: &str) -> Stringthat:top/leftestimates for alldata-image-slotelements.position:absoluteelements, extracts their inner text, counts words.data-image-slot.Uses string scanning only (no new crate dependencies). Insert the call in
generate_slide_htmlafter the validator block and beforeextract_image_slot_descriptions.Step 4 — Update
system_validate_slide.mdFile:
crates/hero_slides_lib/src/prompts/system_validate_slide.mdDependencies: none | Parallelisable with: Steps 1, 2
Remove the entire Violation #3 block ("DIAGRAM LABEL ANNOTATIONS") and renumber the remaining violations so they are sequential.
Acceptance Criteria
data-image-slotbut no<svg>does NOT trigger the LLM validator pass.<svg>still triggers the validator pass.strip_diagram_labels.data-image-slot="true") is never removed bystrip_diagram_labels.system_validate_slide.mdno longer contains the "DIAGRAM LABEL ANNOTATIONS" section.cargo build -p hero_slides_libsucceeds with no new warnings.cargo test -p hero_slides_lib).Notes
replace_fake_logos_with_slots,replace_html_tag_with_slot, etc.). No new crate dependency is needed or justified.browser_pptx.rschanges are unrelated: The current uncommitted whitespace diff inbrowser_pptx.rsis not part of this issue.system_validate_slide.mdis embedded viainclude_str!, editing the file is sufficient. Users with local prompt overrides will need to update their copies.Test Results
Build: success
Details
All 3 test suites passed:
Finished in ~1m 14s (including full dependency compilation from scratch).
Implementation Complete
Changes Made
crates/hero_slides_lib/src/generator.rsneeds_validationnow checkshtml.contains("<svg")only; the|| html.contains("data-image-slot")branch that caused slot stripping is removed.strip_diagram_labels(html: &str) -> String— a deterministic string-scanning function that removesposition:absoluteelements with fewer than 5 words of inner text that are positioned within 200px of anydata-image-slotelement. Never removes slot elements themselves.strip_diagram_labelsintogenerate_slide_htmlafter the validation pass and beforeextract_image_slot_descriptions.info!slot-count logs after each pipeline stage: logo-replacement, validation, label-stripping.crates/hero_slides_lib/src/prompts/system_validate_slide.mdstrip_diagram_labels.Test Results
All existing tests pass. Build clean.
Acceptance Criteria
data-image-slotbut no<svg>does not trigger the LLM validator pass.<svg>still triggers the validator pass.strip_diagram_labels.data-image-slot="true") is never removed bystrip_diagram_labels.system_validate_slide.mdno longer contains the "DIAGRAM LABEL ANNOTATIONS" section.cargo build -p hero_slides_libsucceeds with no new warnings.