Shared (non dedicated) node support for tester VMs: badge, per network placement policy, drift detection, orphan contract reaper #24
The deployer already places tester VMs on shared grid nodes in practice: QA node 2 hosts two live testers today with no rent contract, because shared nodes accept per-deployment contracts from anyone. The UI and the placement model, however, assume every managed node is a dedicated rented node, and the Nodes page labels everything as dedicated. Proposal: show a Dedicated or Shared badge per node based on the live rent status, add a per-network placement policy (shared allowed on QA and testnet, mainnet dedicated only unless a shared node is explicitly opted in, because shared mainnet deployments spend real TFT per deployment and their capacity can be taken by other users at any time), and wire the existing Find Nodes search so that adding a found node registers it as shared.
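A minimal sketch of the badge and the per-network placement rule proposed above. The network keys, the `Node` fields, and the function names are assumptions made for illustration and are not taken from the deployer code.

```python
from dataclasses import dataclass

# Hypothetical policy table: whether shared nodes are allowed by default.
SHARED_ALLOWED_BY_DEFAULT = {"qa": True, "testnet": True, "mainnet": False}


@dataclass(frozen=True)
class Node:
    node_id: int
    network: str
    rented_by_us: bool            # assumed to come from the live rent status
    shared_opt_in: bool = False   # explicit operator opt-in for a shared node


def badge(node: Node) -> str:
    """Label shown on the Nodes page, derived from live rent status."""
    return "Dedicated" if node.rented_by_us else "Shared"


def placement_allowed(node: Node) -> bool:
    """Dedicated nodes are always allowed; shared nodes follow the network policy."""
    if node.rented_by_us:
        return True
    # Unknown networks fall back to the strict (mainnet-like) default.
    allowed_by_default = SHARED_ALLOWED_BY_DEFAULT.get(node.network, False)
    return allowed_by_default or node.shared_opt_in
```

With this shape, a shared mainnet node is rejected unless `shared_opt_in` is set, while a shared QA or testnet node is accepted without any extra flag.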
Two maintenance features belong with this. First, the Nodes page should flag registry entries that drift from the chain (for example, a node we no longer rent). Second, the deployer should periodically diff its twin's on-chain contracts against the VMs and gateways it tracks and offer to cancel orphans. While auditing this we found about twenty orphaned gateway and name contracts from old provisioning attempts billing the ops wallet on mainnet; they were cancelled manually today.
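Both checks reduce to a diff between chain state and local state. The sketch below assumes contracts are identified by integer IDs and that the registry records each node as "dedicated" or "shared"; the function names and data shapes are hypothetical, and fetching the on-chain data is left out.

```python
def find_orphans(on_chain_contract_ids, tracked_contract_ids):
    """Contracts the twin holds on chain that no tracked VM or gateway references."""
    return sorted(set(on_chain_contract_ids) - set(tracked_contract_ids))


def find_drift(registry, rented_node_ids):
    """Registry entries whose recorded kind disagrees with the live rent status.

    registry: dict mapping node_id -> "dedicated" or "shared" (local record).
    rented_node_ids: node IDs the twin currently rents according to the chain.
    Returns (node_id, recorded_kind, live_kind) tuples.
    """
    rented = set(rented_node_ids)
    drifted = []
    for node_id, recorded in sorted(registry.items()):
        live = "dedicated" if node_id in rented else "shared"
        if recorded != live:
            drifted.append((node_id, recorded, live))
    return drifted
```

The reaper would only offer the IDs returned by `find_orphans` for cancellation rather than cancel them automatically, matching the "offer to cancel" wording above.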
Not scheduled for now. Current focus is proving the tester onboarding flow end to end on the two-node setup (QA node 5 and mainnet node 8072).
Bumping this with findings from the zos-light work. This is no longer just a UI and maintenance item; it is the unlock for using light nodes at all. Across mainnet and QA there is exactly one alive zos-light node per network (mainnet 8072, QA 62) and both are non-dedicated; every dedicated zos-light node on the grid is registered as rentable but is actually dead (offline for hours to months). So we cannot prove or use light deployment on real hardware without shared-node support, unless a farmer brings a dedicated light node online, which we do not control (a rented standby node only wakes if the farm runs a healthy farmerbot, and the one we tried never woke).
One requirement to add to the placement work here: filter candidate nodes by liveness (healthy and a recent last-seen timestamp), not by the rentable or status flags. A dead node still advertises rentable=true; we rented one this week that had not been seen in about seven months and it never came online.
Good news for scope: the daemon already deploys on a shared node without a rent contract (confirmed again this week by registering the live shared node and getting a real on-chain deployment), so the gap is the surrounding model (shared-aware capacity from live free resources rather than a fixed exclusive catalog, skip the rent step, contention handling) and not the deploy itself. Doing this also gives us a live light node to finish debugging the light deploy, which currently times out waiting for the network workload; see lhumina_code/hero_compute#135. The generation-aware selection layer for that already landed on hero_compute integration.
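The liveness requirement could be sketched as below. The field names (`healthy`, `last_seen`, `rentable`) and the 30-minute threshold are assumptions chosen for illustration, not values from the grid API or the deployer; the point is only that the rentable flag plays no part in the decision.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness threshold; the issue does not specify one.
MAX_LAST_SEEN_AGE = timedelta(minutes=30)


def is_live(node: dict, now: datetime) -> bool:
    """A node is live only if it reports healthy and was seen recently."""
    last_seen = node.get("last_seen")
    if not node.get("healthy") or last_seen is None:
        return False
    return now - last_seen <= MAX_LAST_SEEN_AGE


def live_candidates(nodes, now=None):
    """Filter placement candidates by liveness, ignoring rentable/status flags."""
    now = now or datetime.now(timezone.utc)
    return [n for n in nodes if is_live(n, now)]
```

Under this rule a node that advertises rentable=true but was last seen months ago is dropped before any rent or deploy step is attempted.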
Plan locked with the operator. Sequenced across a few sessions so each one lands something proven.
This session: shared node support, proven on standard (classic) zos where deploys already work, kept separate from the light deploy work.
Filter candidate nodes by liveness (healthy and a recent last-seen timestamp), not by the rentable or status flags. A dead node still advertises rentable=true; we recently rented one that never woke, so liveness is the load-bearing rule.
Also folded in from this issue: the Dedicated or Shared badge, per-network placement policy, registry drift detection, and the orphan contract reaper.
Next session: finish the zos-light deploy. With shared support giving us a stable live light node to test against, we diff our light workload against what the dashboard submits and add path and timing logging plus a clear map from deploy errors to messages. Tracked at lhumina_code/hero_compute#135.
Later: a pre-warm pool, so that a node already has Hero and a VM deployed and onboarding only sets the user login and gateway name. This makes onboarding much faster and moves the fresh VM network lag off the critical path. The gateway name resolution built this session is kept in one place so the pre-warm flow can reuse it unchanged. Tracked at lhumina_code/home#266.
Signed-by: mik-tf mik-tf@noreply.invalid