VM deploy: SSH login fails because /root, /root/.ssh, and authorized_keys are owned by ubuntu instead of root #100
Reference
lhumina_code/hero_compute#100
Summary

After deploying a VM via `hero_compute`, SSH into `root@<vm-ip>` with a provisioned key fails with `Permission denied (publickey)`. The root cause is that the VM's `/root` tree (including `/root/.ssh/authorized_keys`) is owned by the `ubuntu` user/group instead of `root`, so `sshd` refuses to read the keys when logging in as `root`.

Reproduction

1. Deploy a VM via `hero_compute`.
2. Run `ssh root@<vm-ip>` with the SSH key that was injected during provisioning.

Observed permissions inside the VM

Modes are correct (`700`/`600`) but the owner/group is `ubuntu:ubuntu`. sshd's `StrictModes` check requires `/root/.ssh` and `authorized_keys` to be owned by `root` when logging in as `root`.

ssh -v output

Expected

After deployment, `/root`, `/root/.ssh`, and `/root/.ssh/authorized_keys` should be owned by `root:root` so that `ssh root@<vm-ip>` using the injected key works out of the box.

Likely cause

The cloud-init / provisioning step that writes the `root` user's `authorized_keys` is running as the `ubuntu` user (or mounts the image as that UID), leaving `/root` and its contents owned by `ubuntu:ubuntu`.

Workaround

Inside the VM (via console or as `ubuntu` + sudo):

Suggested fix

In the VM provisioning path in `hero_compute_server` (cloud / cloud-init setup), ensure that when we inject the root SSH key we also `chown root:root /root /root/.ssh /root/.ssh/authorized_keys` (or run the write step as root), and that no part of the pipeline leaves `/root` owned by `ubuntu`.

Implementation Spec for Issue #100
Objective

Ensure that after `deploy_vm` and every subsequent SSH key injection, the VM's `/root`, `/root/.ssh`, and `/root/.ssh/authorized_keys` are owned by `root:root`, so `sshd` accepts root logins with the provisioned keys. Currently they are owned by `ubuntu:ubuntu`, causing `Permission denied (publickey)` under `sshd`'s `StrictModes`.

Requirements

- After `deploy_vm` completes successfully, `ssh root@<vm-ip>` must succeed for every key in `vm.ssh_keys`.
- `/root` must be mode `0700` and owned by `root:root`.
- `/root/.ssh` must be mode `0700` and owned by `root:root`.
- `/root/.ssh/authorized_keys` must be mode `0600` and owned by `root:root`.
- The same must hold after the `inject_ssh_keys` RPC (on-demand injection on a running VM).
- Keys are still filtered through `is_valid_ssh_key`.
- `authorized_keys` is rewritten so disabled keys are actually removed.

Root Cause Summary

`inject_ssh_keys_to_vm` in `crates/hero_compute_server/src/cloud/rpc.rs` runs a shell script via `HypervisorDriver::vm_exec`. In practice, inside the Ubuntu cloud image the exec channel runs under the default `ubuntu` user (UID 1000). When the `mkdir`/`cat` runs as `ubuntu`, the resulting files end up owned by `ubuntu:ubuntu`. sshd's `StrictModes` then refuses the keys when logging in as `root`.

The correct fix is to (a) invoke the script with root privilege inside the guest via `sudo -n` and (b) explicitly `chown -R root:root /root` after writing.

Files to Modify/Create

- `crates/hero_compute_server/src/cloud/rpc.rs`: Update `inject_ssh_keys_to_vm` to build the authorized_keys script via root (sudo inside the guest) and append a `chown -R root:root /root` step; also update `set_vm_hostname` for consistency.
- `crates/hero_compute_server/src/cloud/tests.rs`: Add unit tests that pin the shape of the generated injection script.

Implementation Plan

Step 1: Rewrite `inject_ssh_keys_to_vm` so the guest-side script runs as root and fixes ownership

Files: `crates/hero_compute_server/src/cloud/rpc.rs`

- Add a helper `build_inject_ssh_keys_script(keys_content: &str) -> String`.
- Create `/root/.ssh` with mode 700.
- Write the keys through a quoted heredoc, `<< 'HEROKEYS'`.
- `chmod 600 /root/.ssh/authorized_keys`.
- `chown -R root:root /root/.ssh /root`.
- `mkdir -p /run/sshd`.
- Wrap the body in `sudo -n sh -c '<body>'` with a fallback to plain `sh -c '<body>'` for images lacking sudo.

Dependencies: none
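
The script body listed in Step 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the exact script text, the `filter_keys` helper, and the body of `is_valid_ssh_key` are hypothetical stand-ins (the real validator lives in the crate and may check more than the characters listed in the Notes). The `sudo` wrapping is left out here because the body itself contains single quotes.

```rust
/// Hypothetical stand-in for the crate's `is_valid_ssh_key`: rejects empty
/// keys and the shell-significant characters listed in the Notes section.
fn is_valid_ssh_key(key: &str) -> bool {
    const FORBIDDEN: [char; 8] = ['\'', '`', '$', ';', '|', '&', '\n', '\r'];
    !key.trim().is_empty() && !key.chars().any(|c| FORBIDDEN.contains(&c))
}

/// Hypothetical helper: keep only valid keys, one per line.
fn filter_keys(keys: &[&str]) -> String {
    keys.iter()
        .copied()
        .filter(|k| is_valid_ssh_key(k))
        .collect::<Vec<&str>>()
        .join("\n")
}

/// Builds the guest-side script body (assumed text, not the real one).
/// The heredoc is quoted so the shell performs no expansion on the keys,
/// and `cat >` overwrites the file so removed keys do not linger.
fn build_inject_ssh_keys_script(keys_content: &str) -> String {
    format!(
        "mkdir -p /root/.ssh\n\
         chmod 700 /root/.ssh\n\
         cat > /root/.ssh/authorized_keys << 'HEROKEYS'\n\
         {keys}\n\
         HEROKEYS\n\
         chmod 600 /root/.ssh/authorized_keys\n\
         chown -R root:root /root/.ssh /root\n\
         mkdir -p /run/sshd\n",
        keys = keys_content.trim_end()
    )
}
```

Keeping the builder a pure function of its input is what makes the shape of the script testable without a running VM.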

Step 2: Mirror the same root-privilege wrapping in `set_vm_hostname`

Files: `crates/hero_compute_server/src/cloud/rpc.rs`

- Wrap the hostname script as `(sudo -n sh -c '...') || sh -c '...'`.
- The `safe` hostname is already filtered to `[a-zA-Z0-9.-]` so splicing is safe.

Dependencies: Step 1
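
A minimal sketch of the Step 2 wrapping follows. The filter function name `sanitize_hostname` and the exact commands inside the script are assumptions made for illustration; only the `[a-zA-Z0-9.-]` character set and the `(sudo -n sh -c '...') || sh -c '...'` shape come from the plan above.

```rust
/// Hypothetical filter: keep only characters in `[a-zA-Z0-9.-]`, so the
/// result can be spliced into a single-quoted shell string safely.
fn sanitize_hostname(raw: &str) -> String {
    raw.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '.' || *c == '-')
        .collect()
}

/// Builds the hostname script, trying root via `sudo -n` first and
/// falling back to plain `sh` when sudo is missing or refuses.
/// The commands in `body` are assumed, not taken from the real code.
fn build_set_hostname_script(safe_hostname: &str) -> String {
    let body = format!(
        "echo {h} > /etc/hostname && hostname {h}",
        h = safe_hostname
    );
    format!("(sudo -n sh -c '{b}') || sh -c '{b}'", b = body)
}
```

Because the filtered hostname cannot contain a single quote, the body cannot terminate the surrounding `sh -c '...'` string early.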

Step 3: Add unit tests for the generated script shape

Files: `crates/hero_compute_server/src/cloud/tests.rs`

- `test_build_inject_ssh_keys_script_contains_chown_root_root`: asserts `chown -R root:root /root`.
- `test_build_inject_ssh_keys_script_writes_mode_600`: asserts `chmod 600 /root/.ssh/authorized_keys`.
- `test_build_inject_ssh_keys_script_writes_mode_700`: asserts `chmod 700 /root/.ssh`.
- `test_build_inject_ssh_keys_script_uses_quoted_heredoc`: asserts `<< 'HEROKEYS'`.
- `test_build_inject_ssh_keys_script_embeds_keys_verbatim`: asserts the supplied key appears verbatim.
- Use the `pub(crate) mod` pattern mirroring `hypervisor_probe`.

Dependencies: Step 1

Step 4: Manual verification plan (no code change)

- Run `cargo build --features cloud -p hero_compute_server`.
- Run `cargo test --features cloud -p hero_compute_server`.
- `ssh root@<mycelium-ip>` should succeed; verify `stat -c '%U:%G %n' /root /root/.ssh /root/.ssh/authorized_keys` returns `root:root` on all three.
- Call `inject_ssh_keys` on a live VM; ownership must remain `root:root`.

Dependencies: Steps 1-3
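
The ownership and mode check from the manual plan can also be expressed programmatically. The sketch below is not part of the proposed change; `check_owner_and_mode` and `check_root_ssh_tree` are hypothetical helpers that would have to run inside the guest, and they use only the Unix extensions in the Rust standard library.

```rust
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns Ok(()) when `path` is owned by `uid:gid` and its permission
/// bits equal `mode`; otherwise returns a human-readable reason.
fn check_owner_and_mode(path: &Path, uid: u32, gid: u32, mode: u32) -> Result<(), String> {
    let meta = fs::metadata(path).map_err(|e| format!("{}: {}", path.display(), e))?;
    if meta.uid() != uid || meta.gid() != gid {
        return Err(format!(
            "{}: owner {}:{}, expected {}:{}",
            path.display(), meta.uid(), meta.gid(), uid, gid
        ));
    }
    // Mask off the file-type bits and keep only the permission bits.
    let actual = meta.mode() & 0o7777;
    if actual != mode {
        return Err(format!("{}: mode {:o}, expected {:o}", path.display(), actual, mode));
    }
    Ok(())
}

/// Checks the three paths from the acceptance criteria against root (0:0).
/// Returns the list of problems found; an empty list means all are fine.
fn check_root_ssh_tree() -> Vec<String> {
    let expected = [
        ("/root", 0o700),
        ("/root/.ssh", 0o700),
        ("/root/.ssh/authorized_keys", 0o600),
    ];
    expected
        .iter()
        .filter_map(|(p, m)| check_owner_and_mode(Path::new(p), 0, 0, *m).err())
        .collect()
}
```

Reading `/root` requires root privilege, so `check_root_ssh_tree` would need to be run as root inside the VM to give a meaningful answer.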

Acceptance Criteria

- `stat -c '%U:%G' /root` inside a newly-deployed VM returns `root:root`.
- `stat -c '%U:%G' /root/.ssh` returns `root:root`, mode `700`.
- `stat -c '%U:%G' /root/.ssh/authorized_keys` returns `root:root`, mode `600`.
- `ssh root@<vm-ip>` with any of the provisioned keys succeeds.
- `inject_ssh_keys` RPC on a live VM still yields `root:root` ownership.
- `cargo test --features cloud -p hero_compute_server` passes including the new tests.
- No regression in the `my_hypervisor` doas/chown-back behavior introduced on the `fix_ssh_perm` branch.

Notes

- `fix_ssh_perm` already has a host-side permission fix (my_hypervisor state dir ownership). This issue is a separate, guest-side fix; do not revert that work.
- `sudo` may be absent on minimal images; the `|| sh -c '<body>'` fallback preserves existing behavior.
- Script safety depends on `is_valid_ssh_key` rejecting `'`, `` ` ``, `$`, `;`, `|`, `&`, `\n`, `\r`. Cross-reference that coupling in both functions.
- `vm_exec` path.

Test Results

- Build: `cargo build --features cloud -p hero_compute_server` (pass)
- Tests: `cargo test --features cloud -p hero_compute_server` (pass)

New tests added for this fix

Implementation Summary
Root cause
The guest-side shell script that writes `/root/.ssh/authorized_keys` and `/etc/hostname` runs under the default `ubuntu` user inside the Ubuntu cloud image (the exec channel is not rooted). The resulting files end up owned by `ubuntu:ubuntu`, and `sshd`'s `StrictModes` then refuses the keys when logging in as `root`, producing `Permission denied (publickey)`.

Fix

Wrap the guest-side script with a `sudo -n sh`-then-plain-`sh` fallback and explicitly `chown -R root:root /root/.ssh /root` after writing the keys. The script body is base64-encoded at the Rust layer so no shell-quoting conflicts arise when routing the heredoc through two layers of `sh -c`.

Files changed

- `crates/hero_compute_server/Cargo.toml`: added `base64 = "0.22"`.
- `crates/hero_compute_server/src/cloud/rpc.rs`
  - Added `pub(crate) fn build_inject_ssh_keys_script(keys_content: &str) -> String`.
  - Added `pub(crate) fn build_set_hostname_script(safe_hostname: &str) -> String`.
  - Updated `inject_ssh_keys_to_vm` to invoke the script via `echo <b64> | base64 -d | sudo -n sh 2>/dev/null || echo <b64> | base64 -d | sh` and to run `chown -R root:root /root/.ssh /root`.
  - Updated `set_vm_hostname` to use the same wrapping for consistency.
  - Preserved the `tracing::{info,warn}!` log messages, the `ssh_keys.is_empty()` early return, and the `is_valid_ssh_key` filtering.
- `crates/hero_compute_server/src/cloud/tests.rs`: added 8 unit tests pinning the shape of the generated scripts.

Test results

- `cargo build --features cloud -p hero_compute_server`: pass.
- `cargo test --features cloud -p hero_compute_server`: 46/46 passed, 2 doctests ignored, 0 failed.

New tests

- `test_build_inject_ssh_keys_script_contains_chown_root_root`
- `test_build_inject_ssh_keys_script_writes_mode_600`
- `test_build_inject_ssh_keys_script_writes_mode_700`
- `test_build_inject_ssh_keys_script_uses_quoted_heredoc`
- `test_build_inject_ssh_keys_script_embeds_keys_verbatim`
- `test_build_inject_ssh_keys_script_creates_run_sshd`
- `test_build_set_hostname_script_writes_etc_hostname`
- `test_build_set_hostname_script_calls_hostname_cmd`

Manual verification (still pending on TFGrid)

- `stat -c '%U:%G' /root /root/.ssh /root/.ssh/authorized_keys` should all return `root:root`.
- `ssh root@<vm-ip>` with any provisioned key should succeed without `Permission denied (publickey)`.
- `inject_ssh_keys` on a running VM should keep ownership `root:root`.

Notes

- The `my_hypervisor` state-dir permission fix (commit `225e3f2` on `fix_ssh_perm`) is untouched; this change is complementary and stays on the same branch.
- The `sudo -n` path is the happy path; the bare `sh` fallback preserves existing (buggy) behavior only on images that ship without `sudo`, which is no worse than the status quo.
- `base64` (the crate) is only used at the Rust layer; `base64` (the binary) is part of `coreutils` and is present on every Ubuntu cloud image.
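
The base64 wrapping described under "Fix" can be sketched as follows. To keep the example dependency-free it uses a small hand-rolled encoder instead of the `base64` crate that the actual change adds, and the function names `base64_encode` and `wrap_with_sudo_fallback` are hypothetical; only the shape of the resulting command line is taken from the summary above.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding (stand-in for the `base64` crate).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Wraps a script body so it is decoded in the guest and piped to a shell,
/// first through `sudo -n`, then through plain `sh` if that fails.
/// The encoded payload contains no quotes, so no escaping is needed.
fn wrap_with_sudo_fallback(script_body: &str) -> String {
    let b64 = base64_encode(script_body.as_bytes());
    format!(
        "echo {b} | base64 -d | sudo -n sh 2>/dev/null || echo {b} | base64 -d | sh",
        b = b64
    )
}
```

One consequence of this shape is that the fallback branch also runs when `sudo` is present but the script itself exits non-zero, so the script body should be safe to execute twice.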