Resource Cleanup on Panic/Crash #26
Reference
geomind_code/my_hypervisor!26
related issues: #11
Change 1: Auto-cleanup in refresh_status()
Added teardown_network() and cleanup_storage() calls to refresh_status(), run when a dead VM process is detected
Previously it only marked the VM as Stopped; it now cleans up network interfaces and storage before updating state
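The ordering described above (clean up first, then update state) can be sketched roughly as follows. This is not the PR's code: the Vm struct, the VmStatus enum, and the is_alive and cleanup callbacks are hypothetical stand-ins for the real process check and for the teardown_network() / cleanup_storage() calls.

```rust
#[derive(Debug, PartialEq)]
enum VmStatus {
    Running,
    Stopped,
}

// Hypothetical VM record; the real struct in the project will differ.
struct Vm {
    vmm_pid: u32,
    status: VmStatus,
}

/// Returns true when a crashed VM was detected and cleaned up.
/// `is_alive` and `cleanup` are injected so the sketch has no side effects.
fn refresh_status<A, C>(vm: &mut Vm, is_alive: A, mut cleanup: C) -> bool
where
    A: Fn(u32) -> bool,
    C: FnMut(&Vm),
{
    // Nothing to do for VMs that are already stopped or still alive.
    if vm.status != VmStatus::Running || is_alive(vm.vmm_pid) {
        return false;
    }
    // Release network and storage resources before touching the state.
    cleanup(&*vm);
    vm.status = VmStatus::Stopped;
    true
}
```

Because the state only changes after cleanup has run, a second call on the same VM does nothing.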
Change 2: chvm doctor --gc
Added --gc flag to the doctor command for manual garbage collection
Level 1 (state-aware): Uses refresh_status() to detect crashed VMs and clean their resources
Level 2 (stateless orphan detection): Scans for tap-* interfaces not belonging to any running VM and removes them. Scans iptables nat rules referencing chvm-subnet IPs (192.168.200.X) not owned by a running VM and removes them
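As an illustration of the Level 2 TAP scan, here is a minimal sketch of the set difference involved. It assumes TAP devices are named tap-<short VM ID>, as the tap-deadbeef example in the test plan suggests. The function name and signature are hypothetical, and the real code would read the interface list from the system instead of taking it as an argument.

```rust
use std::collections::HashSet;

/// Returns the tap-* interfaces that no running VM accounts for.
/// `running_short_ids` holds the short IDs of running VMs (assumed naming: tap-<short id>).
fn find_orphan_taps(interfaces: &[&str], running_short_ids: &[&str]) -> Vec<String> {
    // Names of the TAP devices that running VMs are expected to own.
    let owned: HashSet<String> = running_short_ids
        .iter()
        .map(|id| format!("tap-{}", id))
        .collect();

    interfaces
        .iter()
        .copied()
        // Only tap-* devices are candidates; everything else is left alone.
        .filter(|name| name.starts_with("tap-") && !owned.contains(*name))
        .map(String::from)
        .collect()
}
```

Interfaces without the tap- prefix are never returned, so devices such as lo or a bridge are not candidates for removal in this sketch.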
Other
Added install-global Makefile target to copy binaries to /usr/local/bin
How to test
Test 1: refresh_status() auto-cleanup (via chvm list)
1. Start a VM with TAP networking
doas chvm run -d --name test-gc alpine:latest
2. Note the cloud-hypervisor PID
doas chvm inspect test-gc | grep vmm_pid
3. Verify TAP device and iptables rules exist
ip link show | grep tap-
doas iptables -t nat -S | grep 192.168.200
4. Kill the cloud-hypervisor process directly (simulate crash)
doas kill -9 <vmm_pid>
5. Run chvm list — refresh_status should detect the dead process, tear down TAP device and iptables rules, then mark VM as Stopped
doas chvm list -a
6. Verify TAP device and iptables rules are gone
ip link show | grep tap-
doas iptables -t nat -S | grep 192.168.200
Test 2: chvm doctor --gc orphan cleanup
1. Create an orphaned TAP device manually (simulates leftover from crash)
doas ip tuntap add dev tap-deadbeef mode tap
doas ip link set tap-deadbeef up
2. Verify it exists
ip link show | grep tap-deadbeef
3. Run doctor --gc
doas chvm doctor --gc
4. Verify it was removed
ip link show | grep tap-deadbeef # should be gone
Test 3: Orphaned iptables rules (no running VMs)
1. Make sure no VMs are running
doas chvm stop $(doas chvm list -q) 2>/dev/null
2. Add a fake orphaned iptables rule
doas iptables -t nat -A PREROUTING -p tcp --dport 9999 \
    -j DNAT --to-destination 192.168.200.99:80
3. Verify it exists
doas iptables -t nat -S | grep 192.168.200.99
4. Run doctor --gc
doas chvm doctor --gc
5. Verify it was removed
doas iptables -t nat -S | grep 192.168.200.99 # should be gone
Can we add tests, perhaps an integration test for the whole flow and unit tests for functions like extract_chvm_ip? Can we also update the docs to reflect the new flag usage?
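A unit test of the kind requested could look roughly like this. The body of extract_chvm_ip below is a hypothetical stand-in, since the PR's implementation and signature are not shown here. It assumes the function receives one rule line as printed by iptables -S and returns the first address in 192.168.200.0/24.

```rust
use std::net::Ipv4Addr;

/// Hypothetical stand-in: pulls the first 192.168.200.x address out of a rule line.
fn extract_chvm_ip(rule: &str) -> Option<Ipv4Addr> {
    // Split on anything that cannot be part of a dotted quad, then keep valid IPv4 tokens.
    rule.split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .filter_map(|token| token.parse::<Ipv4Addr>().ok())
        .find(|ip| ip.octets()[..3] == [192, 168, 200])
}

#[cfg(test)]
mod extract_chvm_ip_tests {
    use super::*;

    #[test]
    fn finds_dnat_destination() {
        let rule = "-A PREROUTING -p tcp --dport 9999 -j DNAT --to-destination 192.168.200.99:80";
        assert_eq!(extract_chvm_ip(rule), Some(Ipv4Addr::new(192, 168, 200, 99)));
    }

    #[test]
    fn ignores_foreign_subnets() {
        let rule = "-A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE";
        assert_eq!(extract_chvm_ip(rule), None);
    }
}
```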
@ -124,0 +158,4 @@
let cleaned = running_before.len() - running_after.len();
if cleaned > 0 {
    println!("Cleaned up {} crashed VM(s)", cleaned);
During refresh_status() more VMs may be started, so running_after may be larger than running_before. len() returns usize, so in that case the subtraction can underflow and panic. I think we can use saturating_sub.
In practice running_after <= running_before always holds since refresh_status() only moves VMs from Running to Stopped, not the other way
But I added saturating_sub for extra safety.
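For reference, a minimal sketch of the difference. The helper name cleaned_count is hypothetical, and the counts are passed in directly instead of coming from the two VM lists.

```rust
/// Counts VMs that left the running set, clamping at zero instead of underflowing.
fn cleaned_count(running_before: usize, running_after: usize) -> usize {
    // Plain `running_before - running_after` panics in debug builds (and wraps in
    // release builds) when running_after is larger; saturating_sub returns 0 instead.
    running_before.saturating_sub(running_after)
}
```

With this, a VM started between the two snapshots produces a count of 0, so the "Cleaned up" message is simply skipped.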
@ -124,0 +169,4 @@
.map(|id| id[..std::cmp::min(8, id.len())].to_string())
.collect();
// Collect IPs of running VMs for iptables orphan detection
Can we extract this 8 into a constant?
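One way the extraction could look, as a sketch only. The names SHORT_ID_LEN and short_id are hypothetical and not taken from the PR.

```rust
/// Number of leading characters of a VM ID used as its short form (hypothetical name).
const SHORT_ID_LEN: usize = 8;

/// Truncates a VM ID to its short form; IDs shorter than the limit are returned whole.
/// Slices by byte index, so it assumes ASCII IDs (for example hex strings).
fn short_id(id: &str) -> String {
    id[..id.len().min(SHORT_ID_LEN)].to_string()
}
```

The closure in the diff would then become .map(|id| short_id(id)), and any other code that derives a short ID can share the same constant.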
@ -124,0 +273,4 @@
Ok(out) if out.status.success() => removed += 1,
_ => {}
}
}
I think we can add a warn log here for failed deletions.
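A sketch of what surfacing the failures might look like. The delete callback is a hypothetical stand-in for the command that removes one resource, and eprintln! is used only to keep the example dependency-free. The project's own logging facility (for instance a warn! macro) would take its place.

```rust
/// Tries to remove every named resource and returns how many removals succeeded.
/// Failures are reported instead of being silently dropped by a `_ => {}` arm.
fn remove_with_warnings<F>(names: &[&str], mut delete: F) -> usize
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut removed = 0;
    for &name in names {
        match delete(name) {
            Ok(()) => removed += 1,
            // Keep going after a failure, but leave a trace for the operator.
            Err(reason) => eprintln!("warning: failed to remove {}: {}", name, reason),
        }
    }
    removed
}
```

The loop still continues past a failed deletion, so one stuck device does not block cleanup of the rest.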