Deployer admin: a tester whose web gateway domain was not recorded shows a blank Cockpit URL, cannot be repaired from the admin, and is served with no login #253
Reference: lhumina_code/home#253
When a tester VM is provisioned, the deployer creates its web gateway (the per-tester domain like name.gent01.qa.grid.tf) in the same step. The gateway can come up correctly on the grid, with the domain resolving and serving the tester cockpit, while the deployer still fails to record that domain in its own database. This happens because the grid call returned without the final domain string at that moment and the deployer never polls for it again.

When that happens, three things follow. First, the admin page shows a blank Cockpit URL (just a dash) for that tester even though the URL actually works, so the operator has no link to copy. Second, there is no button anywhere in the admin to record or retry the gateway afterward, so the blank stays blank and the only recovery today is manual work on the server or a full destroy and reprovision. Third, and most important, the per-tester login protection is skipped during provisioning whenever the domain was not recorded, so that tester cockpit is reachable on the internet without the Forge sign-in step that every other tester requires.

We hit this with the weynandsandbox tester: its cockpit is live and serves the apps page with no login, while the admin shows no Cockpit URL for it.

The fix is to make the deployer re-check and save the gateway domain once it is ready, add an admin action to record or retry the gateway for any running VM that has no domain yet, and create the per-tester login protection as part of that same retry so a tester can never end up reachable without it. We will know it is fixed when a tester that came up without a recorded domain can be repaired from the admin with one click, ending up with a working Cockpit URL and the login step enforced.
Signed-by: mik-tf mik-tf@noreply.invalid
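The "re-check and save the gateway domain once it is ready" part of the proposed fix could look roughly like the sketch below. It is only an illustration: `fetch_domain` is a hypothetical callable standing in for whatever the deployer uses to ask the grid for the domain, and the attempt count and delay are arbitrary assumptions, not values from the deployer.

```python
import time
from typing import Callable, Optional


def wait_for_gateway_domain(
    fetch_domain: Callable[[], Optional[str]],  # hypothetical grid lookup
    attempts: int = 5,                          # assumed retry budget
    delay_seconds: float = 2.0,                 # assumed pause between tries
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the gateway reports a non-empty domain, or give up.

    Returns the domain string, or None if it never became available.
    """
    for attempt in range(attempts):
        domain = fetch_domain()
        if domain:
            # A non-empty domain is ready to be saved by the caller.
            return domain
        if attempt < attempts - 1:
            # Only sleep when another attempt will follow.
            sleep(delay_seconds)
    return None
```

A caller would save the returned domain to the database when it is not `None`, and otherwise leave the VM flagged as needing the admin retry action described above.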
Title changed from "Deployer admin: a tester with no web gateway shows a blank Cockpit URL and the gateway cannot be created from the admin" to "Deployer admin: a tester whose web gateway domain was not recorded shows a blank Cockpit URL, cannot be repaired from the admin, and is served with no login"

Deployer side fixed on main (commit 6641a6a). The install step now self-heals and fails closed: if a VM has no web gateway domain recorded, install re-runs the gateway setup, creates the per-tester OAuth app, and then refuses to finish unless a full login gate is in place, so a tester can no longer be published reachable without sign-in. The admin shows a "Set up gateway & install" button for a ready VM with no Cockpit URL, so an affected tester is repaired with one click instead of a destroy and reprovision. CI is republishing the latest build. The remaining piece is the underlying daemon behavior (the gateway create returning an error after the gateway is already live, and no way to look up an existing gateway), filed at lhumina_code/hero_compute#133. Next: update the deployer build on the admin VM and verify the full add, provision, install flow on a fresh tester.
Signed-by: mik-tf mik-tf@noreply.invalid
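The self-heal and fail-closed behavior described in this comment can be sketched as follows. This is not the code from commit 6641a6a: `TesterVM`, `InstallRefused`, and the three callables are hypothetical names introduced only to show the order of the steps and the final refusal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class InstallRefused(Exception):
    """Raised when install would publish a tester without a login gate."""


@dataclass
class TesterVM:  # hypothetical record, not the deployer's real schema
    name: str
    gateway_domain: Optional[str] = None
    oauth_client_id: Optional[str] = None
    login_gate_enabled: bool = False


def install(
    vm: TesterVM,
    setup_gateway: Callable[[TesterVM], Optional[str]],     # assumed helper
    create_oauth_app: Callable[[TesterVM], Optional[str]],  # assumed helper
    push_login_gate: Callable[[TesterVM], bool],            # assumed helper
) -> TesterVM:
    # Self-heal: re-run the gateway setup if no domain was recorded.
    if not vm.gateway_domain:
        vm.gateway_domain = setup_gateway(vm)
    # The per-tester OAuth app is only created once a domain is known.
    if vm.gateway_domain and not vm.oauth_client_id:
        vm.oauth_client_id = create_oauth_app(vm)
    # The login gate needs both the domain and the OAuth client.
    if vm.gateway_domain and vm.oauth_client_id:
        vm.login_gate_enabled = push_login_gate(vm)
    # Fail closed: refuse to finish unless the full gate is in place.
    if not (vm.gateway_domain and vm.oauth_client_id and vm.login_gate_enabled):
        raise InstallRefused(f"{vm.name}: login gate incomplete, not publishing")
    return vm
```

The point of the final check is that any missing piece aborts the install instead of silently skipping the login protection, which is the failure mode this issue reports.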
Verified end to end, deployed and live. The deployer fix is running on the admin server (updated and restarted). The weynandsandbox tester was repaired in place without deleting it: its gateway domain was recorded, the install self-heal created its per-tester OAuth app, and the reinstall pushed the login protection to the tester. It now redirects to the Forge sign-in with its own per-tester OAuth client (matching the other testers) instead of serving the cockpit unauthenticated, and the admin now shows its Cockpit URL. One residual: this tester's gateway id is still unknown to the deployer (the gateway was created on the grid but never reported back), so a future delete will not reap that gateway automatically until the daemon adds a gateway lookup (lhumina_code/hero_compute#133). New testers are protected from this whole class of failure by the fail-closed install.
Signed-by: mik-tf mik-tf@noreply.invalid
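A check like the one used to verify the repaired tester (an unauthenticated request must be redirected to sign-in rather than served the cockpit) might be expressed as below. The function and its `sign_in_host` parameter are assumptions for illustration; the actual Forge sign-in host is not stated in this issue, so it is passed in by the caller.

```python
from typing import Optional
from urllib.parse import urlparse

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def login_is_enforced(status: int, location: Optional[str], sign_in_host: str) -> bool:
    """Decide from an unauthenticated response whether the login gate is active.

    status and location come from a request made without any session;
    sign_in_host is the expected sign-in host (supplied by the caller).
    """
    if status not in REDIRECT_STATUSES:
        # A 200 here would mean the cockpit was served with no login.
        return False
    if not location:
        return False
    # The redirect must point at the sign-in host, not somewhere else.
    return urlparse(location).hostname == sign_in_host
```

For example, a 302 whose `Location` header points at the sign-in host counts as enforced, while a plain 200 serving the apps page does not.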