[deployer] Explore removing the single-admin-VM single point of failure #276

Open

opened 2026-06-09 13:03:27 +00:00 by mik-tf · 0 comments

mik-tf commented

2026-06-09 13:03:27 +00:00

Owner

Today the whole deployer fleet runs on one admin VM: the control database, the admin dashboard, the shared embedder and voice engines, and the per-network compute workers. If that VM is lost, every tester on every network loses their control plane and shared engines at once. This is acceptable for the current sandbox and investor demos, but we should remove this single point of failure before any wider use.

Options to explore when we get there (not a priority now): keep the durable control state (the database and secrets) on a resilient backing, such as an admin account or repository on forge.ourworld.tf, or a replicated store; run the control plane as a small clustered service across more than one node; or run a second admin VM on another dedicated node that shares the same database.

Filing so it is tracked. Context: part of the multi-node and multi-chain build-out at #264.
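
To make the last option above (a second admin VM sharing the same database) a bit more concrete, here is a minimal sketch of how a client could fail over between two admin VMs. It is an illustration only: the endpoint URLs, `pick_admin_endpoint`, and the injected `is_healthy` probe are hypothetical names and not part of the current deployer. The probe is passed in as a function so the selection logic can be exercised without any network access.

```python
from typing import Callable, Optional, Sequence

# Hypothetical admin endpoints; the real deployer has only one admin VM today.
ADMIN_ENDPOINTS = [
    "https://admin-a.example.invalid",
    "https://admin-b.example.invalid",
]


def pick_admin_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health probe succeeds, or None.

    Endpoints are tried in order, so the first entry acts as the preferred
    admin VM and later entries act as standbys.
    """
    for url in endpoints:
        try:
            if is_healthy(url):
                return url
        except Exception:
            # A probe that raises (timeout, refused connection, ...) is
            # treated the same as an unhealthy endpoint.
            continue
    return None


def simulated_probe(down: set) -> Callable[[str], bool]:
    """Build a fake probe that reports every URL in `down` as unhealthy."""
    return lambda url: url not in down


if __name__ == "__main__":
    # Simulate losing the first admin VM: the client falls back to the second.
    probe = simulated_probe({ADMIN_ENDPOINTS[0]})
    print(pick_admin_endpoint(ADMIN_ENDPOINTS, probe))
```

This only covers client-side selection. It assumes both admin VMs can safely serve requests against the shared database, which is exactly the part that still needs to be explored.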

No labels

No milestone

No project

No assignees

1 participant

Reference

lhumina_code/home#276
