Expand tfgrid deployer os to more than 1 dedicated node #264
Reference
lhumina_code/home#264
No description provided.
Notes
Plan agreed. The compute layer already supports running many dedicated nodes and even renting new ones, so this work is about surfacing that in the admin dashboard. The dashboard will coordinate more than one node, show total free capacity across the fleet, and when setting up a tester it will auto-pick the node with the most room while still letting you choose a specific node from a dropdown (which also removes the old confusing free-text node field).
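The selection rule above (auto-pick the node with the most room, or honor an explicit dropdown choice) can be sketched as follows. This is a minimal illustration, not the deployer's actual code: the `Node` type, the `free_slots` unit, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical view of one dedicated node as the dashboard might see it."""
    node_id: int
    free_slots: int  # assumed unit: how many more testers fit on this node


def total_free(nodes: Sequence[Node]) -> int:
    """Total free capacity across the whole fleet."""
    return sum(n.free_slots for n in nodes)


def pick_node(nodes: Sequence[Node], requested_id: Optional[int] = None) -> Node:
    """Return the requested node, or the node with the most free slots."""
    if requested_id is not None:
        # Explicit choice: only nodes that exist in the live list are valid.
        for n in nodes:
            if n.node_id == requested_id:
                if n.free_slots <= 0:
                    raise RuntimeError(f"node {requested_id} has no free capacity")
                return n
        raise ValueError(f"node {requested_id} is not in the fleet")
    # Auto mode: pick the node with the most room.
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        raise RuntimeError("no free capacity in the fleet")
    return max(candidates, key=lambda n: n.free_slots)
```

Populating the dropdown from the same live node list is what makes a free-text node field unnecessary: an ID that is not in the list cannot be selected in the first place.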
Because everything talks over the mycelium overlay, the fleet can span more than one network at once (for example keep the current node and add one on the main network), with the dashboard coordinating one controller per network behind the scenes. Farms are not all equally reliable, so the default prefers FreeFarm and a small set of trusted farms, and any other farm is confirmed and health-checked before we rely on it.
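One way to express the farm policy above is a small gate that trusted farms pass directly, while any other farm needs both operator confirmation and a passing health check. This is a sketch under assumptions: only FreeFarm is named in these notes, so the trusted set contains just that entry, and `check_farm`, `FarmDecision`, and the `health_check` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Only FreeFarm is named in the notes; the rest of the trusted set is unknown.
TRUSTED_FARMS: FrozenSet[str] = frozenset({"FreeFarm"})


@dataclass(frozen=True)
class FarmDecision:
    allowed: bool
    reason: str


def check_farm(
    farm_name: str,
    confirmed_by_operator: bool,
    health_check: Callable[[str], bool],
    trusted: FrozenSet[str] = TRUSTED_FARMS,
) -> FarmDecision:
    """Decide whether a farm may be used for a new dedicated node."""
    if farm_name in trusted:
        return FarmDecision(True, "trusted farm")
    if not confirmed_by_operator:
        return FarmDecision(False, "needs operator confirmation")
    # The health check is a caller-supplied probe (assumed interface).
    if not health_check(farm_name):
        return FarmDecision(False, "failed health check")
    return FarmDecision(True, "confirmed and healthy")
```

Running the confirmation check before the health probe means an unconfirmed farm is rejected without any probing at all.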
Delivery is in two steps: first see and use multiple nodes plus a nicer dashboard, then add a dedicated node from the dashboard with farm choice and an automatic prompt when onboarding hits no free capacity. The current test farm is out of rentable dedicated capacity, so the first live proof will be on the main network while the existing setup stays untouched.
Signed-by: mik-tf <mik-tf@noreply.invalid>
Multi-node provisioning is now live on the admin machine and proven end to end. The deployer aggregates the dedicated nodes managed by a fleet of compute controllers (one per network). The dashboard gained a Nodes page and a node picker on the Add form (auto by default, or pick a specific node); the picker reads live capacity, so an operator can never choose a node that does not exist.
The rollout caused no disruption to the existing testers. A test user was provisioned onto a second node and then removed, and both the create and the delete were routed to the right node. A useful find while testing: the admin machine already manages a second dedicated node on the same free test farm, which the old single-node screen was hiding. It has room for about nine more testers, so there is spare capacity right now.
Running the fleet across more than one network at once (keep the current node, add one on the main network) is built and ready, but was deliberately left for an attended run, since it involves a main-network rental and a second controller. Self-service add-a-node from the dashboard is tracked at #274.
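The aggregation and routing described in this note can be illustrated with a small model: one controller per network, a fleet view that merges their node lists, and create/delete calls that are routed back to the controller owning the node. All class and method names here are hypothetical; the real hero_compute and deployer interfaces are not shown in these notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Controller:
    """Stand-in for one compute controller bound to a single network."""
    network: str
    nodes: Dict[int, int]  # node_id -> free slots (assumed unit)
    deployed: Dict[str, int] = field(default_factory=dict)  # tester -> node_id

    def create(self, tester: str, node_id: int) -> None:
        if self.nodes[node_id] <= 0:
            raise RuntimeError(f"node {node_id} is full")
        self.nodes[node_id] -= 1
        self.deployed[tester] = node_id

    def delete(self, tester: str) -> None:
        node_id = self.deployed.pop(tester)
        self.nodes[node_id] += 1


class Fleet:
    """Aggregates several controllers and routes calls to the right one."""

    def __init__(self, controllers: List[Controller]) -> None:
        self.controllers = {c.network: c for c in controllers}
        self.placements: Dict[str, Tuple[str, int]] = {}  # tester -> (network, node)

    def list_nodes(self) -> List[Tuple[str, int, int]]:
        """Live (network, node_id, free_slots) rows for the Nodes page."""
        return [
            (c.network, node_id, free)
            for c in self.controllers.values()
            for node_id, free in c.nodes.items()
        ]

    def create(self, tester: str, network: str, node_id: int) -> None:
        self.controllers[network].create(tester, node_id)
        self.placements[tester] = (network, node_id)

    def delete(self, tester: str) -> None:
        # Route the delete to the controller that handled the create.
        network, _ = self.placements.pop(tester)
        self.controllers[network].delete(tester)
```

Recording the placement at create time is one simple way to make sure the delete reaches the same controller and node; the actual deployer may track this differently.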
Signed-by: mik-tf <mik-tf@noreply.invalid>
Multi-chain is proven live from a single admin VM. A second compute daemon bound to mainnet now runs co-located on the admin VM (no second admin VM is needed): it reads its own wallet and network from a separate config context, and its node store is scoped per chain, so each daemon lists only the nodes on its own network. A mainnet node was rented (JimboTFT), the deployer fleet aggregates QA and mainnet with real capacity, and a tester VM was deployed and is running end to end on the mainnet node. The only remaining gap is the public web address: the freshly provisioned mainnet VM is not yet reachable over the mycelium overlay, so the gateway cannot set up the public URL. This is the known mainnet route propagation delay already reported to ThreeFold, not a deployer issue; the VM has been left running for them to investigate. The code is on main in hero_compute and hero_os_tfgrid_deployer.
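The per-chain scoping mentioned above can be pictured as a node store keyed by chain, so that two daemons sharing one machine never see each other's nodes. This is only an illustration of the idea: `NodeStore`, `Daemon`, the chain labels, and the node IDs are assumptions, and hero_compute may implement the scoping differently.

```python
from typing import Dict, List, Tuple


class NodeStore:
    """Shared store whose records are keyed by (chain, node_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, int], dict] = {}

    def put(self, chain: str, node_id: int, record: dict) -> None:
        self._rows[(chain, node_id)] = record

    def list_for(self, chain: str) -> List[int]:
        """Return only the node IDs stored under the given chain."""
        return sorted(nid for (c, nid) in self._rows if c == chain)


class Daemon:
    """Stand-in for a compute daemon bound to exactly one chain."""

    def __init__(self, chain: str, store: NodeStore) -> None:
        self.chain = chain
        self.store = store

    def add_node(self, node_id: int, record: dict) -> None:
        self.store.put(self.chain, node_id, record)

    def list_nodes(self) -> List[int]:
        return self.store.list_for(self.chain)
```

With this shape, a QA daemon and a mainnet daemon can run side by side on the admin VM, and the deployer fleet aggregates their separate lists.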