deploy_webgateway + deploy_vm should validate workload names client-side (TFGrid rejects dashes silently at substrate layer) #128

Closed
opened 2026-05-25 16:17:49 +00:00 by mik-tf · 2 comments
Owner

TFGrid ZOS rejects workload names containing characters outside `[a-zA-Z0-9]` with `backend error: name X is invalid: unsupported character in workload name`. This error is not surfaced at the GridError display layer and is only visible in `TFGRID_DEBUG=1` trace_step output.

Live repro at s158: `deploy_webgateway(name="hero-os-s158", ...)` succeeded substrate-side (name + node contracts minted), but the ZOS workload push failed across 8 retries with `unsupported character in workload name`. The daemon timed out at 300s and rolled back via the new D-27 path (hero_compute@8f7a2b7), cleanly cancelling both contracts. But that 300s timeout is pure waste, and without `TFGRID_DEBUG` enabled the failure looks like a generic substrate timeout.

Fix: add an input-validation guard at the top of `deploy_webgateway` (and `deploy_vm`, where the same constraint likely applies) that rejects names with non-alphanumeric chars before any substrate work fires. Same shape as the existing `gateway name must not be empty` validation at `crates/my_compute_zos_server/src/cloud/rpc.rs:1942-1947`.

Bonus: surface the constraint in the cockpit deploy modal so end-users see the validation before clicking Provision.

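A minimal sketch of the proposed guard, to show the ordering being asked for: validate first, touch the substrate second. Everything named here (`DeployError`, `FakeSubstrate`, `check_workload_name`, `deploy_webgateway_sketch`) is hypothetical and stands in for the real types in `rpc.rs`, which are not shown in this issue. The character rule is the `[a-zA-Z0-9]` rule as stated above.

```rust
/// Hypothetical error type for this sketch; the real crate's error enum may differ.
#[derive(Debug, PartialEq)]
pub enum DeployError {
    InvalidInput(String),
}

/// Stand-in for the substrate client; it only counts contract creations.
#[derive(Default)]
pub struct FakeSubstrate {
    pub contracts_created: u32,
}

impl FakeSubstrate {
    fn create_contract(&mut self, _name: &str) {
        self.contracts_created += 1;
    }
}

/// Reject empty names and any character outside [a-zA-Z0-9].
pub fn check_workload_name(name: &str) -> Result<(), DeployError> {
    if name.is_empty() {
        return Err(DeployError::InvalidInput(
            "workload name must not be empty".to_string(),
        ));
    }
    // Report the first offending character so the caller sees why it failed.
    if let Some(bad) = name.chars().find(|c| !c.is_ascii_alphanumeric()) {
        return Err(DeployError::InvalidInput(format!(
            "workload name {name:?} contains unsupported character {bad:?}"
        )));
    }
    Ok(())
}

/// Guard runs first; no contract is created for an invalid name.
pub fn deploy_webgateway_sketch(
    substrate: &mut FakeSubstrate,
    name: &str,
) -> Result<(), DeployError> {
    check_workload_name(name)?;
    substrate.create_contract(name); // name contract
    substrate.create_contract(name); // node contract
    Ok(())
}
```

With this shape, a call using `"hero-os-s158"` returns `InvalidInput` immediately and leaves `contracts_created` at zero, so there is nothing to roll back and no 300s wait.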
Author
Owner

Closed by [hero_compute `29988f6`](https://forge.ourworld.tf/lhumina_code/hero_compute/commit/29988f6): the `validate_webgateway_name()` helper in `crate::util` (next to `validate_vm_name`) enforces lowercase letters and digits only, 1-63 chars. It is called from `deploy_webgateway` at `rpc.rs:1949` before any substrate writes and rejects with `InvalidInput` instead of going through the 300s D-27 rollback round-trip. 4 new unit tests + 16/16 integration tests pass. Direct-push squash did not auto-close.

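For readers without access to the commit, here is a rough sketch of what a validator with the stated rule (lowercase letters and digits only, 1-63 chars) could look like. It is not the code from `29988f6`: the function name `validate_webgateway_name_sketch`, the `String` error type, and the messages are assumptions made for illustration. Note that this rule is stricter than the `[a-zA-Z0-9]` set quoted in the issue body, since uppercase is rejected too.

```rust
/// Upper bound taken from the "1-63 chars" rule in the closing comment.
const MAX_NAME_LEN: usize = 63;

/// Hypothetical stand-in for `validate_webgateway_name()`; the real helper
/// returns the crate's own error type rather than a `String`.
pub fn validate_webgateway_name_sketch(name: &str) -> Result<(), String> {
    // Length is checked in bytes; valid names are ASCII-only, so bytes == chars.
    if name.is_empty() || name.len() > MAX_NAME_LEN {
        return Err(format!(
            "gateway name must be 1-{MAX_NAME_LEN} characters, got {}",
            name.len()
        ));
    }
    // Only ASCII lowercase letters and digits pass; dashes, underscores,
    // uppercase, and non-ASCII input are all rejected.
    if !name
        .bytes()
        .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit())
    {
        return Err(format!(
            "gateway name {name:?} may contain only lowercase letters and digits"
        ));
    }
    Ok(())
}
```

Under these assumptions `"heroos158"` passes, while `"hero-os-s158"`, `"hero_os"`, `"HeroOS"`, the empty string, and any 64-character name are rejected before a deploy would start.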
Author
Owner

Live admin VM provisioning showed the same workload-name validation gap on deploy_vm that was previously fixed for deploy_webgateway. When the deployer let an auto-generated name with underscores reach compute, ZOS rejected it with the VM-name character rule after the request had already entered the deployment path. Please apply the same client-side fail-fast validation to deploy_vm names before any substrate write.

Reference
lhumina_code/hero_compute#128