[deployer] Explore removing the single-admin-VM single point of failure #276

Open
opened 2026-06-09 13:03:27 +00:00 by mik-tf · 0 comments
Owner

Today the whole deployer fleet runs on one admin VM: the control database, the admin dashboard, the shared embedder and voice engines, and the per-network compute workers. If that one VM is lost, every tester on every network loses their control plane and shared engines at once.

This is acceptable for the current sandbox and investor demos, but before any wider use we should remove this single point of failure.

Options to explore when we get there, not a priority now: (1) keep the durable control state (the database and secrets) on a resilient backing, such as an admin account or repository on forge.ourworld.tf, or a replicated store; (2) run the control plane as a small clustered service across more than one node; or (3) run a second admin VM on another dedicated node that shares the same database.

Filing so it is tracked. Context: part of the multi-node and multi-chain build-out at https://forge.ourworld.tf/lhumina_code/home/issues/264 .
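To make the "second admin VM sharing the same database" option a bit more concrete, here is a rough sketch of a lease-based active/standby handover. Nothing here comes from the deployer code: `Lease`, `LeaseStore`, the node names `admin-a` and `admin-b`, and the TTL values are all hypothetical, and the in-memory object only stands in for a row in the shared control database. In a real database the acquire step would need to be a single atomic operation (for example, a conditional update inside a transaction).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # hypothetical node name, e.g. "admin-a"
    expires_at: float  # time after which the lease counts as lapsed


class LeaseStore:
    """In-memory stand-in for a lease row in the shared control database."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, node: str, now: float, ttl: float) -> bool:
        """Grant or renew the lease if it is free, lapsed, or already held by node."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == node:
            self._lease = Lease(holder=node, expires_at=now + ttl)
            return True
        return False

    def active_node(self, now: float) -> Optional[str]:
        """Return the node holding a still-valid lease, or None if there is none."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None
```

The idea would be that each admin VM calls `try_acquire` on a timer and only serves the control plane while it holds the lease. If the active VM is lost it stops renewing, the lease lapses after `ttl`, and the standby's next `try_acquire` succeeds. This only covers which VM is active; it assumes the database itself is already on a resilient backing.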
Reference
lhumina_code/home#276