[META] Hero OS demo Phase 2 — Forge SSO + complete e2e UX #237
Reference
lhumina_code/home#237
Post-closure clarification, 2026-05-27: user-tested SSO confirms the perimeter is correct: cockpit and deployer paths on https://hcockpit.gent01.qa.grid.tf are restricted until Forge login at forge.ourworld.tf. This issue remains closed for the Phase 2 auth substrate: Forge SSO, admin allowlist, and persisted OAuth-token use. The screenshot after login still shows the Hero Cockpit Admin scaffold, so full admin/tester UX completion is now tracked separately in home#238.
Session 166 closed on 2026-05-27: fixed hero_proxy#56 gateway-listener env seeding and recorded the post-SSO admin/tester UX audit in e2e_checklist.md; next work is deployer admin provisioning UX hardening.
Session 165 closed Phase 2 on 2026-05-26: Forge SSO, persisted OAuth token use, cockpit repo widget, and admin allowlist denial were live-verified on the public QA admin VM; follow-up issues are linked from the closure comment.
Hero OS demo Phase 2 - v1 Hero tester environment
Session 163.5 (planning refresh, 2026-05-26) finalized the closure plan for session 164. Admin VM 0062 was torn down, QA node 5 freshly re-rented, dual-admin design adopted (mik-tf + scott in ADMIN_FORGE_USERS and on the admin VM SSH authorized_keys). Full s164 plan is in the latest comment on this issue (closure-plan comment id 37068).
Session 160 opened this issue (2026-05-26) after closing home#235 as Phase 1 shipped. Phase 2 session 1 closed at session 161 as an investigation that surfaced the OAuth implementation gap. Implementation began at session 162 (substrate in hero_proxy, D-31). Session 163 shipped the cockpit consumer side (D-32 + L-10, hero_proxy@1d54373 + hero_cockpit@386c412, full pre-merge gate green on both repos). Session 164 redeploys admin VM 0062 + live-verifies the full SSO walk + closes this arc.
Phase 1 (home#235) shipped the substrate. Public URL live at https://hcockpit.gent01.qa.grid.tf/, admin VM provisions tester VMs end-to-end on TFGrid, cockpit and services management work, deployer admin UI for users and VMs is committed.
Phase 2 closes the loop into the v1 Hero tester environment: a polished self-service flow where a tester can walk top to bottom without operator hand-holding, gated by their existing forge.ourworld.tf account.
Executive summary
Today the demo works only with someone guiding the tester. Phase 2 removes the hand-holding by gating both admin and user surfaces behind Forge SSO (same login as forge.ourworld.tf), with the admin allowlist controlling who can act as an operator. End-state: an operator types a username, delivers the temp password to the tester out of band (Telegram is fine for v1), the tester signs into Forge once, comes to the cockpit URL, authorizes Hero Cockpit on the standard OAuth consent screen, and lands on their personal Hero OS environment where they can use Books, Slides, and Planner. No passwords pasted into cockpit, no admin URL exposed to outsiders, no email-sender service required.
The v1 walk (the complete e2e UX)
What a non-engineer should be able to walk top to bottom:
What changes vs Phase 1
The OAuth integration shape
We use the standard Forgejo OAuth 2.0 flow with PKCE-S256. On consent, Forge issues both an access_token (short-lived) and a refresh_token (long-lived). The cockpit persists both in a per-user secret slot and refreshes the access_token automatically when it expires. All downstream Forge API calls the cockpit makes on the user's behalf (writing workspace data back to Forge, syncing books, posting feedback) use this token. The user never has to manually generate or paste a personal access token. Token rotation is invisible.
Where the OAuth implementation lives. hero_proxy already ships a complete OAuth 2.0 + OIDC implementation (PKCE-S256, CSRF state store with TTL sweep, well-known provider URL presets including Forge, session cookies with the __Host- prefix, database-backed provider config, callback handler, claims-based per-method authorization). Phase 2 uses this existing implementation directly. Forge is registered as a provider via the existing oauth.set_provider RPC. Backend admin panels and cockpit consume identity from hero_proxy via the canonical injected-claims headers (X-Hero-User, X-Hero-Claims, X-Hero-Context) per the workspace authorization model. The only net-new behaviour Phase 2 adds is (a) persisting the OAuth access plus refresh tokens to a per-user secret slot at callback time (today the existing flow drops the upstream token after building the session), and (b) the admin allowlist gating layer that reads from a secret slot. No new shared OAuth library is introduced; the workspace already has the OAuth substrate in the right place.
Implementation plan (4 sessions)
hero_proxy (oauth + oidc modules, ~750 LOC, production-complete with PKCE-S256 plus OIDC plus Forge preset plus DB-backed provider config plus session cookies). No new shared library is needed. The session-161 standalone crate experiment was archived as off-pattern relative to the canonical workspace shared-library structure. Phase 2 implementation begins at session 162.
hero_proxy using the existing OAuth implementation: seed Forge as a provider on startup from cockpit/FORGE_OAUTH_CLIENT_ID and cockpit/FORGE_OAUTH_CLIENT_SECRET secret slots, extend the existing callback handler to persist access plus refresh tokens to per-user secret slots, add an admin allowlist middleware reading deployer/ADMIN_FORGE_USERS, inject identity headers (X-Hero-User, X-Hero-Claims) downstream to cockpit and admin handlers.
Total: 3 implementation sessions remaining (162, 163, 164), roughly 7 to 10 hours focused work.
The current admin VM at hcockpit.gent01.qa.grid.tf continues to serve the placeholder admin scaffold (no real admin UI exposed) until session 4 redeploys it. The session 160 admin UI commits are on the development branch but intentionally not deployed yet, because deploying them today would expose privileged actions to the open internet.
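The allowlist gate and identity-header injection described in the plan above can be sketched roughly as follows. This is a hedged Python illustration, not the hero_proxy code: `parse_allowlist` and `gate_admin_request` are hypothetical names, the 401 for an unauthenticated request is an assumption (the real proxy may redirect to Forge login instead), and exact-match username comparison is also an assumption.

```python
from typing import Dict, Optional, Set, Tuple


def parse_allowlist(raw: str) -> Set[str]:
    """Split a comma-separated secret value such as "username1,username2"."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def gate_admin_request(
    headers: Dict[str, str],
    session_user: Optional[str],
    allowlist: Set[str],
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers_to_forward) for a request to an admin path."""
    # Drop any client-supplied x-hero-* headers so identity cannot be spoofed.
    forwarded = {
        k: v for k, v in headers.items() if not k.lower().startswith("x-hero-")
    }
    if session_user is None:
        return 401, forwarded  # assumption: no SSO session yet
    if session_user not in allowlist:
        return 403, forwarded  # logged in, but not an operator
    # Only the proxy sets the identity header that downstream handlers trust.
    forwarded["X-Hero-User"] = session_user
    return 200, forwarded
```

Stripping inbound `x-hero-*` headers before injecting fresh ones is what lets downstream services authorize against the injected identity without authenticating users themselves.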
Existing substrate to reuse
hero_proxy/crates/hero_proxy_server/src/oauth.rs (458 LOC) plus oidc.rs (278 LOC) plus auth.rs plus authz.rs. PKCE-S256 with state-store TTL sweep, OIDC id_token validation, Forge preset in known_provider_urls, callback handler at GET /oauth/callback, session cookie utilities, per-method claims authorization. Phase 2 extends this in-place. A parallel inlined implementation lives in hero_onboarding/crates/hero_onboarding_server/src/forge_oauth.rs (~542 LOC, locked under hero_onboarding's own dual-auth design); both implementations stay; Phase 2 does not consolidate them since the Phase 2 consumer is hero_proxy, not hero_onboarding.
hero_router injects X-Hero-Claims, X-Hero-Context, X-Hero-User headers; downstream services do authorization against the injected claims and do not authenticate users themselves. Cockpit and admin handlers receive identity this way at runtime.
Apps in scope for v1
User-facing apps available from the cockpit on a v1 tester VM:
That alone is a meaningful start for the tester walk. The cockpit also surfaces services management, settings, and the per-service manual.
Out of scope for v1 (filed as separate sub-trackers when scoped)
These are real gaps in the executable checklist but each has its own substrate and wants its own arc framing:
Acceptance criteria
Cross-links
Session 162 closed — Phase 2 session 2 shipped
TL;DR: hero_proxy now seeds the Forge OAuth provider on boot from the operator-set secrets, persists per-IdP-identity OAuth tokens to per-user secret slots, and ships an admin allowlist module ready for s163 wiring. The session-161 standalone hero_forge_oauth crate was archived as off-pattern relative to the canonical Hero shared-library structure (3 repos: hero_lib + hero_rpc + hero_proc). OAuth implementation lives only in hero_proxy from now on; consumers receive identity via the canonical injected headers.
Commits shipped on origin/development
forge_token_store.rs, admin_allowlist.rs, forge_oauth_seed.rs. Extensions to oauth.rs (token capture inside exchange_code_for_identity) and proxy.rs (callback handler spawns token persistence). 5 new integration tests + 9 new unit tests pass. Drive-by repairs two pre-existing parse/clippy failures from the 1adb9fa refactor on origin/development.
forge-oauth-setup.md rewritten to reference hero_proxy's existing OAuth surface (redirect URI is /oauth/callback); runtime secret slots renamed to the per-IdP-identity key shape. e2e_checklist.md flips B-5 (cockpit URL OAuth gate) and CC-4 (single auth identity across cockpit-routed services) from Blocked to Have.
Architecture decision
D-31 locked: OAuth substrate stays in hero_proxy only; no shared OAuth library. Cockpit and deployer_admin consume identity via injected X-Hero-User + X-Hero-Claims + X-Hero-Context headers per the canonical authorization model. On-behalf-of API tokens persist in per-IdP-identity hero_proc secret slots keyed on (provider, sha256(external_id)[..16]), never on any name field. The local-username namespace is strictly broader than the IdP namespace because of the collision-resolution suffix ladder, so name-keyed token slots would risk letting a deleted-and-recreated local user inherit a previous tenant's still-valid tokens.
Pre-merge gate
cargo fmt --check plus cargo clippy --workspace --all-targets -- -D warnings plus cargo build --workspace --release all clean. 49 of 49 hero_proxy_server library unit tests pass. 5 of 5 new phase2_oauth integration tests pass. The 6 pre-existing failures on the integration suite reproduce unchanged on origin/development; they are not regressions from this session.
Operator action queued before s164
Register the Hero Cockpit OAuth application at forge.ourworld.tf admin for hostname hcockpit.gent01.qa.grid.tf with redirect URI https://hcockpit.gent01.qa.grid.tf/oauth/callback. Push client_id and client_secret into the admin VM via hero_proc secret set --context cockpit FORGE_OAUTH_CLIENT_ID/SECRET. Set the admin allowlist via hero_proc secret set --context deployer ADMIN_FORGE_USERS "username1,username2". Full step-by-step in lhumina_code/home/docs/channels/free/forge-oauth-setup.md.
Next session
s163: cockpit drops the paste-token onboarding flow in favor of the SSO-injected identity headers. Switches downstream Forge API calls from the legacy cockpit/USER_FORGE_TOKEN slot to the new per-IdP-identity OAuth slots, with refresh-on-expiry logic inlined in cockpit. Estimated 2-3 hours. After that, s164 redeploys the admin VM with the full v1 stack and live-verifies the complete e2e walk top to bottom.
Session 163 complete (2026-05-26)
Phase 2 consumer side shipped. Two squash-merges on origin/development:
inject_authenticated_identity (proxy.rs:98) now emits X-Hero-Provider + X-Hero-External-Id alongside X-Hero-User for OIDC-provisioned users. 2 new unit tests; existing strip_proxy_headers covers the new headers via x-hero-* prefix. Locked by D-32.
forge_session.rs extractor + new forge_api_client.rs on-behalf-of API client (slot read via hero_proc_sdk, refresh-on-expiry with 60s leeway, process-global tokio Mutex for refresh race, bearer_for orchestration). Root / redirect is now SSO-aware. /welcome redirects SSO users to /; POST /welcome/save-token returns HTTP 409 under SSO (closes the cross-tenant write surface flagged in Phase B.5). Drive-by clippy + fmt repairs in hero_cockpit_server mirroring the s162 pattern.
Pre-merge gate green on both repos: fmt + workspace clippy -D warnings + workspace release build + 51/51 hero_proxy_server lib + 5/5 phase2_oauth integration + 12/12 hero_cockpit_web lib tests all pass.
Decisions and limitations:
users.external_id column so it matches the SHA-256 hash forge_token_store uses on the substrate side.
cockpit/USER_FORGE_TOKEN stays single-tenant for the headless/scripted-user fallback path; SSO users are gated out at the handler layer.
State at /stop:
Session 164 (NEXT, ~2-3h, arc closure):
hcockpit.gent01.qa.grid.tf.
cockpit/FORGE_OAUTH_CLIENT_ID + cockpit/FORGE_OAUTH_CLIENT_SECRET + cockpit/FORGE_OAUTH_ISSUER on admin VM 0062.
lab build --download --install.
Estimated remaining: ~2-3 hours.
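The refresh-on-expiry behaviour noted for forge_api_client.rs in this comment (60s leeway, a single process-wide lock around the refresh) can be illustrated with the Python sketch below. It is not the cockpit code: `TokenSlot`, `needs_refresh` and the `bearer_for` signature are assumptions made for illustration (only the name bearer_for appears in the session notes), and a threading lock stands in for the process-global tokio Mutex.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional

REFRESH_LEEWAY_SECONDS = 60  # refresh this long before the token expires


@dataclass
class TokenSlot:
    """Hypothetical in-memory view of one user's persisted OAuth tokens."""
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp, seconds


_refresh_lock = threading.Lock()  # one refresh at a time, process-wide


def needs_refresh(slot: TokenSlot, now: float) -> bool:
    return now >= slot.expires_at - REFRESH_LEEWAY_SECONDS


def bearer_for(
    slot: TokenSlot,
    refresh: Callable[[str], TokenSlot],
    now: Optional[float] = None,
) -> str:
    """Return a usable access token, refreshing first if it is about to expire."""
    now = time.time() if now is None else now
    if needs_refresh(slot, now):
        with _refresh_lock:
            # Re-check under the lock: a concurrent caller may have refreshed.
            if needs_refresh(slot, now):
                new = refresh(slot.refresh_token)
                slot.access_token = new.access_token
                slot.refresh_token = new.refresh_token
                slot.expires_at = new.expires_at
    return slot.access_token
```

The re-check inside the lock is the point of the design: without it, two requests that both see a near-expired token would each spend the refresh_token, and the second exchange could fail if the provider rotates refresh tokens.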
Signed-by: mik-tf mik-tf@noreply.invalid
Phase 2 closure plan — clean-slate redeploy with dual-admin live-verify
Planning refresh between session 163 and session 164. The remaining work is operator-runbook + a live-verification walk. No more code to write on the substrate side.
What changed since the issue body was written
hcockpit.gent01.qa.grid.tf on session-158 binaries) has been torn down. All old TFGrid contracts on QA twin 703 were cancelled, the dedicated QA node was unreserved and freshly re-rented under a new RentContract. A throwaway test VM was provisioned on the re-rented node to confirm the operator workstation SSH key actually lands via the deployer flow, then destroyed. Public URL is intentionally offline until session 164.
ADMIN_FORGE_USERS allowlist AND on the admin VM's root SSH authorized_keys. This exercises the multi-admin path that the original runbook glossed over.
Session 164 plan (about 2.5 to 3 hours)
Stage 1, provision the new admin VM via the deployer flow (about 30 min). Mint a fresh Forge user, upload the first admin operator's SSH pubkey to it, then deployer.provision_vm on the re-rented QA node. SSH in with the operator key, append the second admin's pubkey to authorized_keys. The first admin's key lands via the documented flow (this proves the runbook); the second admin's key is appended manually post-provision (call out as the "second operator SSH key extension step" in the runbook).
Stage 2, install + bring up services with the session-162 and session-163 binaries (about 25 min). Standard rebuild recipe from the topology memory: scp binaries from workstation, set 9 baseline secrets, add the IPv6 hero_proxy listener, deploy the web gateway with backend on the new VM. Confirm https://hcockpit.gent01.qa.grid.tf/ returns 200.
Stage 3, register the Forge OAuth app + set Phase 2 secrets + ship one demo surface + dual-admin live-verify (about 75 to 90 min). Register the OAuth app for the callback URL, set cockpit/FORGE_OAUTH_CLIENT_ID + cockpit/FORGE_OAUTH_CLIENT_SECRET, set deployer/ADMIN_FORGE_USERS=mik-tf,scott. Restart services, confirm the forge_oauth_seed ran. Ship ONE minimal on-behalf-of API call site ("List my Forge repos" widget on the cockpit landing, about 80 LOC, one commit) so acceptance criterion 5 is actually live-verifiable. Walk the e2e flow as admin #1 in an incognito browser. Ping admin #2 to walk the same flow from his side. Per-admin token isolation gets cross-validated by counting the OAuth secret slots after both walks (6 slots expected: 3 per admin). Run a negative test with a Forge account NOT in the allowlist to confirm the 403 path fires cleanly.
Stage 4, close (about 15 min). Flip e2e_checklist rows B-37 / B-38 / B-39 to Have with both validators named in the audit-log line. Close this issue with a summary comment listing session-162 + session-163 + session-164 commit SHAs and the demo URL.
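The Stage 3 slot-count check (6 slots expected, 3 per admin) follows from the D-31 key shape (provider, sha256(external_id)[..16]). The Python sketch below illustrates that arithmetic under loud assumptions: the three slot suffixes, the `:` separator, the reading of `[..16]` as the first 16 hex characters, and the sample external IDs are all hypothetical and not taken from forge_token_store.rs.

```python
import hashlib
from typing import List

# Hypothetical names for the three per-identity slots (a token triple).
SLOT_SUFFIXES = ("ACCESS_TOKEN", "REFRESH_TOKEN", "EXPIRES_AT")


def identity_key(provider: str, external_id: str) -> str:
    """Key a slot on the IdP identity, never on a local username."""
    # Assumption: "[..16]" means the first 16 hex characters of the digest.
    digest = hashlib.sha256(external_id.encode("utf-8")).hexdigest()
    return f"{provider}:{digest[:16]}"


def slots_for(provider: str, external_id: str) -> List[str]:
    key = identity_key(provider, external_id)
    return [f"{key}:{suffix}" for suffix in SLOT_SUFFIXES]


# Two admins with distinct (made-up) Forge external IDs.
all_slots = slots_for("forge", "1001") + slots_for("forge", "1002")
print(len(set(all_slots)))  # 6 distinct slots when isolation holds
```

Because the key depends only on the provider and the external ID, a local user who is deleted and recreated under the same name cannot land on a previous tenant's slots unless the IdP identity is also the same.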
Risks not yet covered
gent01.qa.grid.tf hostname; if the gateway zone changes, the OAuth app needs re-registration. Likely tracked as L-11 if it surfaces at the live walk.
ADMIN_FORGE_USERS mutations may require a hero_proxy restart to propagate (worth verifying at session 164 walk; if so, file follow-up issue for hot-reload).
Acceptance criteria coverage at session 164 close
All six acceptance criteria from the issue body are addressed by the plan above. Criterion 7 (operator runbook documents the registration step + the allowlist secret + the temp-password delivery convention) gets exercised AS THE PROVISIONING PATH for the new admin VM, so the runbook ships alongside the deployment instead of being written after the fact.
Ready to execute at session 164 start.
Signed-by: mik-tf mik-tf@noreply.invalid
Phase 2 is complete. The public QA admin VM is live at https://hcockpit.gent01.qa.grid.tf/hero_cockpit/web/ and the Forge SSO flow has been verified end to end. The shipped work includes the Forge OAuth provider and per-user token persistence in hero_proxy 3eb1b57, downstream identity headers in hero_proxy 1d54373, enforced Forge admin allowlist in hero_proxy cb55f24, cockpit SSO identity consumption in hero_cockpit 386c412, the Forge identity-backed cockpit widget in hero_cockpit 38b759b, and the runbook plus checklist closure updates in home 2dd806f. Live validation: mik-tf and scott both signed in through Forge and landed in Hero Cockpit without the legacy paste-token prompt; the cockpit widget rendered after a persisted OAuth bearer call to Forge; the VM was cleaned to one valid OAuth token triple; a temporary Forge user outside deployer/ADMIN_FORGE_USERS was denied with a clean allowlist message and then deleted. Operational follow-ups were filed or updated: hero_compute#128 comment 37070 for deploy_vm name validation, hero_os_tfgrid_deployer#11 for the default image mismatch, hero_proxy#56 for the gateway listener seed row, hero_cockpit#9 for smoke-test diagnostics, and hero_codescalers#37 for the fresh-install startup failure. Closing this tracker.
Clarifying the closure boundary after the live screenshot: Phase 2 is perfect for the SSO perimeter, because cockpit and deployer paths on https://hcockpit.gent01.qa.grid.tf are restricted until Forge login. It is not the final tester-ready UX yet: the post-login admin screen is still scaffold-level. Continuing that productization work in home#238, with the e2e checklist as the acceptance gate.