[deployer/admin] Admin VM operability: daemon health on Nodes, logs tab, meta admin links, gated terminal #281
lhumina_code/home#281
Tonight a compute daemon on the admin VM lost its RPC socket while its process kept running. The deployer dashboard showed "0 of 0 nodes online, no dedicated nodes configured" (an empty fleet) instead of a backend failure, and the diagnosis plus the restart had to be done over SSH, even though every needed log line was already captured by hero_proc and the admin VM's own Cockpit could have restarted the daemon from the browser. The pieces exist in the platform; they are not wired into the admin surfaces. Scope:
Done means: all items live on the admin VM (https://hcockpit.gent01.qa.grid.tf/hero_tfgrid_deployer/admin/) and verified in the browser, then this issue closes.
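The failure mode above (process alive, RPC socket gone) can be told apart from a healthy daemon with a plain connect probe. A minimal sketch, assuming the daemon listens on a Unix domain socket; `probe_unix_socket` and `ProbeResult` are hypothetical names for illustration, not part of the deployer or hero_proc:

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool
    error: Optional[str] = None  # last error text, shown to the operator


def probe_unix_socket(path: str, timeout: float = 1.0) -> ProbeResult:
    """Check that something accepts connections on the socket at `path`.

    A running process is not proof of health: the socket file may be
    missing or nobody may be listening on it any more.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError as exc:
        # Covers a missing socket file, a refused connection, and timeouts.
        return ProbeResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    finally:
        sock.close()
    return ProbeResult(ok=True)
```

A failed probe carries the error text, so the dashboard can report a backend failure instead of an empty fleet.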
Signed-by: mik-tf mik-tf@noreply.invalid
Four of five items are live on the admin VM (hero_os_tfgrid_deployer integration commits 8d172d3, ff3f35b, aa49126; binaries deployed and verified over the admin socket).
What changed:
deployer.list_nodes now reports ok/error per chain daemon plus a daemons_unreachable count, and the Nodes page and overview strip render an unreachable daemon as a red warning with the daemon's last error and a link to the admin services page, instead of an empty fleet. Verified live by briefly stopping the qa compute daemon: the response showed qa ok=false with the socket error and daemons_unreachable=1, then 3 of 3 nodes after restart.
The new Logs page (navbar) embeds the shared logs viewer against a read-only relay to the supervisor's log store with quick-pick source buttons; write methods are refused by the relay.
Control has an "Admin VM services" tile opening the admin Cockpit services page.
The Users page channel selector is now labeled "Default for new installs" and the subtitle says updates preserve each service's recorded channel.
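The per-daemon reporting described above can be sketched as follows. Only `ok`, `error`, and `daemons_unreachable` come from this comment; the other field names (`nodes`, `online`, `nodes_online`, `nodes_total`), the `summarize` and `banner` helpers, and the banner wording are assumptions for illustration, not the actual deployer.list_nodes schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DaemonStatus:
    ok: bool
    error: Optional[str] = None          # last error when ok is False
    nodes: List[dict] = field(default_factory=list)


def summarize(daemons: Dict[str, DaemonStatus]) -> dict:
    """Aggregate per-daemon results without hiding unreachable daemons."""
    # Only daemons that answered contribute nodes to the fleet counts.
    nodes = [n for d in daemons.values() if d.ok for n in d.nodes]
    return {
        "daemons": {
            name: {"ok": d.ok, "error": d.error} for name, d in daemons.items()
        },
        "daemons_unreachable": sum(1 for d in daemons.values() if not d.ok),
        "nodes_online": sum(1 for n in nodes if n.get("online")),
        "nodes_total": len(nodes),
    }


def banner(summary: dict) -> str:
    """Render a warning for unreachable daemons before any fleet count."""
    if summary["daemons_unreachable"] > 0:
        failed = [
            f"{name} ({info['error']})"
            for name, info in sorted(summary["daemons"].items())
            if not info["ok"]
        ]
        return "WARNING: daemon unreachable: " + ", ".join(failed)
    return f"{summary['nodes_online']} of {summary['nodes_total']} nodes online"
```

The point of the design is that "0 of 0 nodes" is only shown when every daemon actually answered; an unreachable daemon always surfaces as a warning with its last error.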
Remaining: the gated hero_router terminal exposure, which needs a nod from the router and proxy owners before wiring.
Signed-by: mik-tf mik-tf@noreply.invalid