[urgent] Fresh tester installs failing: lab cannot start services against the current published hero_proc (404 /api/rpc) #268
Reference
lhumina_code/home#268
A fresh tester install is currently failing partway through because lab cannot start the services on the new machine. Every service start (including the proxy and the assistant) returns "HTTP 404 Not Found from /api/rpc" when lab talks to hero_proc, so the proxy never comes up, the tester site returns a gateway error instead of the login page, and the install is marked failed. This started today: a tester installed this morning came up fine, but one installed this afternoon fails this way, so a recently published build of lab, hero_proc, or both is now incompatible with its counterpart on the management API path. The effect is that onboarding any new tester is blocked right now (it also blocked verifying an unrelated install reliability fix this session). The resolution is to reconcile the published builds so lab and hero_proc agree on the management API again, by pinning or republishing a compatible pair. This is a concrete and urgent instance of the build-from-main health problem tracked at #240.
Signed-by: mik-tf mik-tf@noreply.invalid
Root cause and fix.
The rolling `latest` release in several repositories was being published from the development branch instead of main. A fresh install therefore pulled in-progress development builds of the build tool (lab) and the process supervisor (hero_proc) that no longer agreed on the management API, first on the request path and then on the method names, which is why every service start returned a 404 and later a method-not-found error and the proxy never came up.
The fix is to make every repository publish `latest` from main and a separate `latest-dev` prerelease from development, and to republish lab, hero_proc, and hero_router `latest` from main. With that in place a fresh tester install now completes and the cockpit returns its redirect to single sign-on. The same main-publish rule is being applied across the remaining repositories so the whole sandbox runs on stable main builds (related to #240).
Verified by a fresh throwaway install that reached ready state and served the login gate. Closing as resolved.
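For reference, the publish rule described above can be written down as a small lookup. This is only a sketch of the rule, not the actual release workflow: the names `ReleaseTarget`, `PUBLISH_RULES`, and `release_target_for` are hypothetical, and it assumes the branches are literally named `main` and `development`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseTarget:
    """Where a branch build is published (hypothetical helper type)."""
    tag: str          # rolling release tag to publish under
    prerelease: bool  # True if the release is marked as a prerelease


# Rule from this issue: main publishes `latest`, development publishes
# a separate `latest-dev` prerelease. Branch names are assumed here.
PUBLISH_RULES = {
    "main": ReleaseTarget(tag="latest", prerelease=False),
    "development": ReleaseTarget(tag="latest-dev", prerelease=True),
}


def release_target_for(branch: str) -> Optional[ReleaseTarget]:
    """Return the rolling release for a branch, or None if it must not publish."""
    return PUBLISH_RULES.get(branch)
```

Under this rule a development build can never overwrite `latest`, and any other branch gets `None`, meaning it publishes no rolling release at all.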
Signed-by: mik-tf mik-tf@noreply.invalid