feat: Upgrade-aware install script — preserve running state across binary updates #77
Reference: lhumina_code/hero_compute#77
Problem
The install script replaces binaries and restarts hero_proc/mycelium, but does not gracefully handle an already-running hero_compute deployment. After upgrading:
- hero_compute is not restarted; the user has to run `hero_compute --start --mode <mode>` manually.
- The database (`~/hero/var/compute/data/`) still has the node record and all VMs.

No data is lost (the sled database survives), but the user experience is confusing and error-prone.
Objective
Make `install.sh` seamlessly upgrade a running deployment: stop gracefully, replace binaries, and restart with the same configuration.

Current Behavior
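- Processes are killed with a raw `pkill`.
- Binaries are replaced and hero_proc/mycelium are restarted.
- hero_compute stays down until the user runs `hero_compute --start` manually.

Desired Behavior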
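- Read `~/hero/var/hero_compute.env` to capture the current mode, master IP, and other config.
- Stop gracefully with `hero_compute --stop` (not `pkill`).
- Replace binaries and restart with the captured configuration.

Implementation Plan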
1. Pre-upgrade state capture
Before stopping anything, read `~/hero/var/hero_compute.env` and extract:

- `HERO_COMPUTE_UI_MODE` → the running mode (`local`/`master`/`worker`)
- `MASTER_IP` → the master IP (worker mode only)
- `EXPLORER_ADDRESSES` → the explorer config

Store these in shell variables for the post-upgrade restart.
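A minimal sketch of this capture step, assuming `hero_compute.env` contains plain `KEY=VALUE` lines (possibly prefixed with `export` or wrapped in quotes). The helper `read_env_value` and the `PREV_*` variable names are hypothetical, not taken from the existing script.

```shell
# Hypothetical helper for step 1: read one KEY=VALUE entry from an env file
# without sourcing it. Assumes plain KEY=VALUE lines, optionally prefixed
# with "export" and optionally wrapped in single or double quotes.
read_env_value() {
  local file=$1 key=$2
  [ -f "$file" ] || return 1
  # Keep the last assignment of KEY, then strip the "KEY=" prefix and outer quotes.
  grep -E "^(export[[:space:]]+)?${key}=" "$file" \
    | tail -n 1 \
    | sed -E "s/^(export[[:space:]]+)?${key}=//; s/^[\"']//; s/[\"']\$//"
}

ENV_FILE="${HOME}/hero/var/hero_compute.env"

# Capture the running configuration before anything is stopped.
# A missing file or key leaves the variable empty.
PREV_MODE=$(read_env_value "$ENV_FILE" HERO_COMPUTE_UI_MODE)
PREV_MASTER_IP=$(read_env_value "$ENV_FILE" MASTER_IP)
PREV_EXPLORERS=$(read_env_value "$ENV_FILE" EXPLORER_ADDRESSES)
```

Parsing the file instead of `source`-ing it avoids executing anything an older version may have written into it.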
2. Graceful stop
Replace the raw `pkill` with `hero_compute --stop`. This tells hero_proc to stop all hero_compute services cleanly, giving VMs time to shut down properly.
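One way the graceful stop could look, as a sketch only. It assumes that the RPC socket at `~/hero/var/sockets/hero_compute/rpc.sock` (the path used in the health check) disappears once the services have stopped. `graceful_stop`, the 60-second default timeout, and the `HERO_COMPUTE_BIN` override are hypothetical.

```shell
# Hypothetical helper for step 2: stop a running hero_compute deployment
# with `hero_compute --stop` instead of pkill, then wait for it to go away.
# Assumption: the RPC socket disappears once the services have stopped.
HERO_COMPUTE_BIN=${HERO_COMPUTE_BIN:-hero_compute}
RPC_SOCK="${HOME}/hero/var/sockets/hero_compute/rpc.sock"

graceful_stop() {
  local sock=$1 timeout=${2:-60} waited=0

  # No socket: treat hero_compute as not running, so there is nothing to stop.
  if [ ! -S "$sock" ]; then
    return 0
  fi

  # Ask hero_proc (through the CLI) to stop all hero_compute services cleanly.
  "$HERO_COMPUTE_BIN" --stop || return 1

  # Give VMs time to shut down: poll once per second up to the timeout.
  while [ -S "$sock" ] && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  # Abort the upgrade rather than replacing binaries under a live deployment.
  if [ -S "$sock" ]; then
    echo "hero_compute still running after ${timeout}s; aborting upgrade" >&2
    return 1
  fi
}
```

A stale socket left behind by a crash would make this wait out the full timeout and abort; asking hero_proc for service status instead would avoid that, if it exposes such a query.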
3. Auto-restart with saved config
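After binary replacement, restart hero_compute in the saved mode with `hero_compute --start --mode <mode>`, reusing the saved master IP and explorer config.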
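A sketch of the restart step under one stated assumption: that `hero_compute --start --mode <mode>` picks up `MASTER_IP` and `EXPLORER_ADDRESSES` from its environment. The issue does not say how these values are passed, so treat that part, along with the `restart_with_saved_config` helper and the `HERO_COMPUTE_BIN` override, as hypothetical.

```shell
# Hypothetical helper for step 3: restart in the mode captured before the upgrade.
# Assumption: hero_compute reads MASTER_IP and EXPLORER_ADDRESSES from its
# environment; the real script may need to pass them another way.
HERO_COMPUTE_BIN=${HERO_COMPUTE_BIN:-hero_compute}

restart_with_saved_config() {
  local mode=$1 master_ip=$2 explorers=$3

  case "$mode" in
    "")
      # No mode captured (e.g. a fresh install): nothing to restore.
      echo "no previous deployment found; skipping auto-restart" >&2
      return 0
      ;;
    local|master)
      ;;
    worker)
      # A worker cannot rejoin its master without the saved master IP.
      if [ -z "$master_ip" ]; then
        echo "worker mode but MASTER_IP is empty; not restarting" >&2
        return 1
      fi
      ;;
    *)
      echo "unrecognised mode '$mode'; not restarting" >&2
      return 1
      ;;
  esac

  # Pass the saved values only to this invocation.
  MASTER_IP="$master_ip" EXPLORER_ADDRESSES="$explorers" \
    "$HERO_COMPUTE_BIN" --start --mode "$mode"
}
```

An empty mode is treated as "nothing to restore" and skipped, while a worker with no saved master IP fails loudly instead of starting unattached.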
4. Post-upgrade health check
After restart, verify:
- The RPC socket `~/hero/var/sockets/hero_compute/rpc.sock` exists.
- `curl -s http://localhost:9001/health` responds.

5. Skip-start flag
Add a `--no-start` flag for cases where the user wants to upgrade binaries without restarting.

Edge Cases
- `--start` is passed: `hero_compute --start`

Related
Acceptance Criteria
- Running `install.sh` on a live master node upgrades binaries and restarts in master mode automatically.
- Running `install.sh` on a live worker node restarts with the correct master IP.
- No manual `hero_compute --start` needed after upgrade.
- `--no-start` flag available to skip auto-restart.
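The acceptance criteria hinge on two small pieces: the `--no-start` flag (step 5) and the post-upgrade health check (step 4). A hypothetical sketch of both follows; the function names, the 30-second default timeout, and the use of `curl -f` (so HTTP error codes also count as failures) are assumptions, not the script's actual code.

```shell
# Hypothetical helpers for steps 4 and 5 (names and defaults are illustrative).

# Step 5: parse install.sh flags; only --no-start is handled here.
parse_install_args() {
  NO_START=0
  local arg
  for arg in "$@"; do
    case "$arg" in
      --no-start) NO_START=1 ;;
      *) ;;  # other install.sh options are left to the existing parser
    esac
  done
}

# Auto-restart only if a previous mode was captured and --no-start was not given.
should_auto_restart() {
  local prev_mode=$1
  [ "$NO_START" -eq 0 ] && [ -n "$prev_mode" ]
}

# Step 4: poll until the RPC socket exists and the health endpoint answers.
post_upgrade_health_check() {
  local sock=$1 url=$2 timeout=${3:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f makes curl fail on HTTP error codes, not just on connection errors.
    if [ -S "$sock" ] && curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "health check failed after ${timeout}s ($sock, $url)" >&2
  return 1
}
```

After parsing its arguments, install.sh would restart and then run `post_upgrade_health_check "$HOME/hero/var/sockets/hero_compute/rpc.sock" http://localhost:9001/health` only when `should_auto_restart` succeeds for the captured mode.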