feat(test): E2E test suite via browser-deployed SPA — multi-user, multi-project, real-time #3

Closed
opened 2026-05-01 02:14:23 +00:00 by mik-tf · 1 comment
Owner

Why

Phase 16 (session 24) demonstrated that the SPA path (_ui --dist + socat TCP→UDS bridge) is fully drivable from a headless Chromium. This unlocks the missing top of the test pyramid: actual user-experience tests that exercise enrollment, ticket flows, presence, live updates, attachments, and cross-project isolation against the real wire stack — not just unit + integration tests against the JSON-RPC layer.

Goal: "OK it works" confidence. Catch regressions like Phase 17's WebSocket-reactive-dep bug or session-23's Bootstrap-load-mechanism breakage at PR time, not in the user's hands.

Sequencing — depends on #5 (Phase 18)

This issue is Phase 19 in the roadmap, gated by #5 (Phase 18: vision + e2e_checklist + architecture + testing docs). Reason: Phase 18 lands the row-by-row spec matrix (freezone-pattern), and every Playwright spec written here will reference a row in that checklist. Writing tests before the spec rows risks orphan tests with no clear acceptance criterion.

Demo / test infrastructure (8083-8086 scheme)

Per session-24's rebrand:

| Port | Company | Role | Sign-in email |
|---|---|---|---|
| 8083 | Mycelium | support | support@mycelium.com |
| 8084 | Mycelium | customer | alice@example.com |
| 8085 | Hero | support | support@hero.com |
| 8086 | Hero | customer | bob@example.com |

Each port = separate origin = isolated localStorage = independent logged-in identity (drivable from a single Chromium with multiple BrowserContexts).

Add carol@example.com / dave@example.com per project for two-customer presence-comparison tests.
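
The port scheme above can be captured as a small typed fixture that specs share. This is a minimal sketch, not the project's actual fixture: the names `DemoOrigin`, `DEMO_ORIGINS`, `baseUrl`, and `originFor` are hypothetical, and it assumes the demo instances listen on `localhost` over plain HTTP.

```typescript
// Hypothetical shared fixture for the 8083-8086 demo scheme.
type Company = "Mycelium" | "Hero";
type Role = "support" | "customer";

interface DemoOrigin {
  port: number;
  company: Company;
  role: Role;
  email: string;
}

// Rows copied from the table in this issue.
const DEMO_ORIGINS: readonly DemoOrigin[] = [
  { port: 8083, company: "Mycelium", role: "support", email: "support@mycelium.com" },
  { port: 8084, company: "Mycelium", role: "customer", email: "alice@example.com" },
  { port: 8085, company: "Hero", role: "support", email: "support@hero.com" },
  { port: 8086, company: "Hero", role: "customer", email: "bob@example.com" },
];

// Assumption: instances are reachable on localhost over plain HTTP.
function baseUrl(origin: DemoOrigin, host: string = "localhost"): string {
  return `http://${host}:${origin.port}`;
}

// Look up the origin for one (company, role) pair; fail loudly on a typo.
function originFor(company: Company, role: Role): DemoOrigin {
  const found = DEMO_ORIGINS.find((o) => o.company === company && o.role === role);
  if (!found) {
    throw new Error(`no demo origin for ${company}/${role}`);
  }
  return found;
}

console.log(baseUrl(originFor("Hero", "customer"))); // http://localhost:8086
```

In a Playwright spec, each entry would typically back its own context, for example by passing `baseUrl(origin)` as the `baseURL` option of `browser.newContext()`, so that four identities can be driven from one Chromium process.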

Deliverables

  1. Seed simplification — port the rebrand from session-24's stopgap rebrand.py into crates/hero_assistance_server/examples/phase13_seed.rs. Rename the acme profile → mycelium and globex → hero. Update workspace names + user emails to the simplified scheme.
  2. Demo orchestrator — extend scripts/phase13_demo.sh to also start the 2 _ui --dist instances + 4 socat bridges; new --browser flag (default --app for backward compat).
  3. Test framework: Playwright (recommended over hero_browser MCP — parallel drive is more reliable; supports BrowserContext-per-origin natively; built-in trace/video). Layer at tests/e2e/ with playwright.config.ts + per-scenario .spec.ts files.
  4. Phase 1 scenarios:
    • Enrollment per role (support + customer) — sign-in flow; localStorage["hero_assistance.identity"] persists; F5 keeps you signed in
    • Single-user ticket lifecycle — file → comment → optimistic insert reconciled with server row
    • Two-user real-time — customer + support both view the same ticket; presence shows in "Viewing now" sidebar; back-and-forth comments arrive live without refresh
    • Cross-project isolation — Brian (Hero) posts a comment; Alice (Mycelium) does NOT see it
    • Image attachment — synthesize a small PNG via Uint8Array → File → fire programmatic paste event; verify upload + inline thumbnail render
    • Modal flow — "New ticket" Modal opens, submits, dismisses on success
    • Error states — wrong magic-link token shows -32002 "Request a new link" affordance
    • Responsive — 375 / 768 / 1920 viewports; no horizontal overflow
  5. CI integration — make e2e target; Forgejo workflow runs E2E on every PR (or nightly if too slow).
  6. Visual regression — pixelmatch screenshot comparison, <1% perceptual diff threshold per D-09. Replaces the manual baseline-recapture step in Phase 16b.
  7. Documentation — docs/dev/testing.md (test-pyramid layering); docs/dev/demo.md (manual exploration runbook for the 4-port browser scheme).
  8. New decision file D-XX-e2e-test-architecture.md — Playwright + multi-context-per-origin pattern; the seed-rebrand convention; visual-regression policy.
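
The image-attachment scenario in deliverable 4 (Uint8Array → File → paste event) could look roughly like the sketch below. It is illustrative only: `TINY_PNG_BASE64`, `tinyPngBytes`, and `firePngPaste` are hypothetical names, the fixture is a commonly used 1x1 PNG rather than anything from this repo, and the second function is meant to run inside the page (it reaches browser globals through `globalThis` so the file still loads outside a browser).

```typescript
// Assumption: any small valid PNG works; this is a widely circulated 1x1 image.
const TINY_PNG_BASE64 =
  "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==";

// Decode the fixture into raw bytes (atob exists in browsers and in modern Node).
function tinyPngBytes(): Uint8Array {
  return Uint8Array.from(atob(TINY_PNG_BASE64), (ch) => ch.charCodeAt(0));
}

// Browser-side half: wrap the bytes in a File and dispatch a synthetic paste.
// The bytes arrive as a plain number[] so they survive being serialized into the page.
function firePngPaste(
  target: { dispatchEvent(event: any): boolean },
  bytes: number[],
  fileName: string = "pasted.png",
): boolean {
  const g = globalThis as any; // File, DataTransfer, ClipboardEvent are browser globals
  const file = new g.File([new Uint8Array(bytes)], fileName, { type: "image/png" });
  const clipboard = new g.DataTransfer();
  clipboard.items.add(file);
  const event = new g.ClipboardEvent("paste", {
    clipboardData: clipboard,
    bubbles: true,
    cancelable: true,
  });
  return target.dispatchEvent(event);
}

// The decoded fixture starts with the PNG signature bytes 0x89 'P' 'N' 'G'.
const signature = Array.from(tinyPngBytes().slice(0, 4));
console.log(signature.map((b) => b.toString(16)).join(" ")); // 89 50 4e 47
```

A spec would then assert on the two observable effects named in the scenario: the upload request and the inline thumbnail. Which element receives the paste is app-specific and is left as an assumption here.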

Out of scope

  • Auth split (separate issue — see #4)
  • Visual polish beyond Phase 16 (separate issue — see #2)
  • Vision + executable-spec docs (separate issue — see #5; this issue references those rows)

Estimated effort

2-3 sessions:

  • Session A: Playwright bootstrap + seed rename + 3-4 baseline scenarios
  • Session B: cross-project + image-paste + responsive scenarios
  • Session C: CI integration + visual regression + docs + decision file

Success criterion

make e2e exits 0 on a clean checkout. Sanity check: intentionally introduce a regression (e.g., revert the Phase 17 WebSocket-reactive-dep fix) and confirm the suite goes red.

Author
Owner

v0.3.0 shipped — Phase 18c-release closes this issue.

Commit: 39d85e5 on development; tag v0.3.0 (annotated).

What landed in s29:

  1. Un-skipped tests/playwright/regression/enrol.spec.ts — three concrete fixes (none of which were the s28-theorised hydration race): log-grep path was relative to test cwd not repo root; Token-step submit button accessible name has a leading space from the icon glyph (regex /^sign in$/i → /sign in/i); parsed.email assertion replaced with user_id > 0 + role.length > 0 since IdentityState::Authenticated.email is intentionally None post-enrolment per enrollment.rs:226. Also added a defensive clickAndWaitForRpc helper that gates click+RPC on page.waitForResponse with a method-name post-data filter and 5-attempt retry — hydration-race insurance even though hydration wasn't the root cause.

  2. Un-skipped tests/playwright/regression/breadcrumb.spec.ts — root cause was stale WASM dist (target/dx/hero_assistance_ui_wasm/release/web/public was Apr 30 18:50, predating s27's W8 Breadcrumb landing on May 1). The running _ui --dist served the prior ← Back to tickets ad-hoc link instead of the typed Breadcrumb component, so nav[aria-label="breadcrumb"] never matched. Resolved by make dist; spec body unchanged.

  3. Landed 6 L7 SPA baseline PNGs via scripts/capture-spa-baselines.sh against the running 4-port demo + hero_browser_server on :8884:

    • tests/baselines/spa-enroll-{375,768,1920}.png (EnrollmentModal)
    • tests/baselines/spa-home-{375,768,1920}.png (Anonymous CTA)
      Detail-page capture deferred per script comment — needs identity pre-seed that respects Dioxus 0.7 hydration order.
  4. Flipped 31 Browser?/MCP? cells in docs/dev/e2e_checklist.md Sections A/C/D/G/L with <test_file>:<test_name> evidence per §Process invariants (every ✅ on a Wired/Browser/MCP cell points at a specific test or capture). Plus 8 Wired? enhancements with tests/e2e_journey.sh:Phase<N> evidence on Section L methods. Plus 6 new rows in Section M (M-15..M-20) for the SPA captures with MCP ✅ via hero_browser MCP capture.

  5. Updated docs/dev/testing.md §2 layer counts: L5 9 active+2 skipped → 11 passing; L7 14 baselines → 20 baselines; total ~381 → ~389.
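
For readers who have not seen it, the `clickAndWaitForRpc` helper from item 1 follows roughly the shape below. This is a reconstruction from the description above, not the committed code: the `*Like` interfaces are hypothetical stand-ins that mirror the small slice of Playwright's `Page`, `Locator`, and `Response` the helper needs, and the per-attempt timeout is an assumed parameter.

```typescript
// Hypothetical structural stand-ins for the Playwright objects the helper touches.
interface RequestLike { postData(): string | null; }
interface ResponseLike { request(): RequestLike; }
interface PageLike {
  waitForResponse(
    predicate: (response: ResponseLike) => boolean,
    options?: { timeout?: number },
  ): Promise<ResponseLike>;
}
interface ClickableLike { click(): Promise<void>; }

// True when the request body is a JSON-RPC call to the given method.
function isRpcCall(postData: string | null, method: string): boolean {
  if (postData === null) return false;
  try {
    return JSON.parse(postData).method === method;
  } catch {
    return false; // not JSON, so not an RPC call
  }
}

// Arm the response waiter BEFORE clicking, then retry the pair on failure.
async function clickAndWaitForRpc(
  page: PageLike,
  target: ClickableLike,
  method: string,
  attempts: number = 5,
  timeoutMs: number = 2000, // assumed per-attempt timeout
): Promise<ResponseLike> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const pending = page.waitForResponse(
      (response) => isRpcCall(response.request().postData(), method),
      { timeout: timeoutMs },
    );
    try {
      // Promise.all observes both promises, so neither rejection goes unhandled.
      const [response] = await Promise.all([pending, target.click()]);
      return response;
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`no ${method} response after ${attempts} attempts: ${String(lastError)}`);
}
```

Because every retry clicks again, a helper of this shape only suits actions that are safe to repeat; that trade-off is the price of the hydration-race insurance.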

Test posture:

  • L5 Playwright regression: 11 passing (was 9 active + 2 skipped)
  • L6 Playwright adversarial: 7 passing (unchanged)
  • L7 Visual regression: 20 baselines (5 desktop manual + 9 askama-auto + 6 SPA-auto from s29)
  • L1 unit / L2 smoke / L3 integration / L4 E2E: unchanged (237 / 16 / 80 / 18)
  • 3 L-03 inherited failures unchanged; 1 phase10 transient flake unchanged; 0 ignored Playwright (was 2 in s28)
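
The layer counts above reconcile with the ~381 → ~389 totals from item 5. The check below is a sketch under one assumption: the total is the plain sum of the seven layers, with the two previously skipped L5 specs excluded from the "before" figure.

```typescript
type Layer = "L1" | "L2" | "L3" | "L4" | "L5" | "L6" | "L7";

// Counts before s29: L5 had 9 active specs (2 skipped), L7 had 14 baselines.
const before: Record<Layer, number> = { L1: 237, L2: 16, L3: 80, L4: 18, L5: 9, L6: 7, L7: 14 };

// Counts after s29: 2 specs un-skipped in L5, 6 SPA baselines added in L7.
const after: Record<Layer, number> = { ...before, L5: 11, L7: 20 };

function total(counts: Record<Layer, number>): number {
  return Object.values(counts).reduce((sum, n) => sum + n, 0);
}

console.log(total(before), total(after)); // 381 389
```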

No new decisions; no new limitations. B.5 skip held (doc + spec polish; no wire/protocol/invariant change).

Closing — substrate is engineering-complete pre-customer at the test-pyramid layer.

Reference
lhumina_code/hero_assistance#3