Make garbage collection ownership-aware before deleting host network resources #45
Problem
The current garbage-collection logic for host networking is too heuristic-driven. It can delete resources based on naming and subnet guesses instead of verified ownership.
On a host where other tools or manual admin work also create TAP devices or NAT rules, that is not safe enough.
Current behavior
TAP cleanup
The cleanup logic scans network interfaces, looks for names starting with tap-, extracts the suffix, and treats that suffix as a VM prefix. If that prefix is not in the current set of running VMs, the interface is treated as orphaned and deleted.
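To make the risk concrete, the name-based check described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the function name heuristic_tap_orphans and both inputs (a list of interface names and a set of running VM prefixes) are assumptions.

```rust
use std::collections::HashSet;

/// Returns the interfaces a purely name-based heuristic would treat as orphaned.
/// `interfaces` stands in for the host's interface list and `running_prefixes`
/// for the prefixes of currently running VMs (both hypothetical inputs).
fn heuristic_tap_orphans<'a>(
    interfaces: &[&'a str],
    running_prefixes: &HashSet<&str>,
) -> Vec<&'a str> {
    interfaces
        .iter()
        .copied()
        // Only the "tap-" name prefix is checked; nothing proves who created the device.
        .filter_map(|name| name.strip_prefix("tap-").map(|suffix| (name, suffix)))
        // Any suffix that is not a running VM prefix is flagged for deletion.
        .filter(|&(_, suffix)| !running_prefixes.contains(suffix))
        .map(|(name, _)| name)
        .collect()
}
```

With a running VM prefix of vm1, an interface named tap-backup that was created by another tool would be returned here alongside any genuine orphans, since the check cannot tell them apart.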
This means the decision is based on:
NAT rule cleanup
The iptables cleanup logic scans iptables -t nat -S, filters rules that mention the project subnet, skips broad bridge-level rules, extracts a per-VM-looking IP, and deletes any rule whose IP is not currently attached to a running VM. Again, that is based on heuristic matching rather than ownership.
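The subnet-based matching described above can be sketched in the same spirit. Everything here is an assumption made for illustration: the function names, the 172.16.0. subnet prefix used in the usage note, and in particular the idea that a token carrying a CIDR mask marks a "broad" rule. The real filter may differ.

```rust
use std::collections::HashSet;

/// Extracts the first per-VM-looking IP in `rule` that starts with `subnet_prefix`.
/// Tokens with a CIDR mask (such as "172.16.0.0/24") are assumed to be broad
/// bridge-level matches and are ignored.
fn per_vm_ip<'a>(rule: &'a str, subnet_prefix: &str) -> Option<&'a str> {
    rule.split_whitespace()
        .filter(|tok| tok.starts_with(subnet_prefix) && !tok.contains('/'))
        // Drop a trailing ":port", as in "--to-destination ip:port".
        .map(|tok| tok.split(':').next().unwrap_or(tok))
        .next()
}

/// Returns the rules a subnet-shape heuristic would delete: any rule whose
/// extracted IP is not attached to a running VM, regardless of who added it.
fn heuristic_nat_orphans<'a>(
    rules: &[&'a str],
    subnet_prefix: &str,
    running_ips: &HashSet<&str>,
) -> Vec<&'a str> {
    rules
        .iter()
        .copied()
        .filter(|rule| match per_vm_ip(rule, subnet_prefix) {
            Some(ip) => !running_ips.contains(ip),
            None => false,
        })
        .collect()
}
```

Given a running VM at 172.16.0.2, a DNAT rule targeting 172.16.0.9:22 is flagged even if an administrator added it by hand, which is exactly the case the issue is concerned about.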
A rule can match the subnet shape and still belong to:
Why this matters
A garbage-collection command must be conservative. If it can delete unrelated host networking state, it stops being a maintenance tool and becomes a host-risking operation.
The current behavior is especially risky on:
Suggested implementation
Ownership model
Good examples of ownership evidence would be:
Cleanup policy
Failure handling
Acceptance criteria
tap-* devices created outside this project.
Affected areas
doctor --gc
Implementation Spec for Issue #45: Ownership-Aware GC for Host Network Resources
Objective
Replace the heuristic-based GC logic in doctor --gc with an ownership-aware system that persists metadata tying each TAP device and iptables rule to a specific VM. The default cleanup path must only remove resources it can prove belong to this project and are associated with a stopped/crashed VM. Heuristic cleanup becomes a separate emergency mode. A dry-run mode is added for safe inspection.
Requirements
--dry-run flag on doctor --gc shows what would be removed and why, without actually removing anything
--force-heuristic flag preserves the current heuristic cleanup as an explicit emergency option
tap-* devices and NAT rules in the same subnet that were not created by this project are never touched by the default GC path
Files to Modify
crates/my_hypervisor-lib/src/vm/state.rs: NetworkOwnership struct to RuntimeState
crates/my_hypervisor-lib/src/network/traits.rs: iptables_rules to NetworkResult
crates/my_hypervisor-lib/src/network/ops/mod.rs: NatOps::add_port_forward return type to include rule strings
crates/my_hypervisor-lib/src/network/local/nat.rs
crates/my_hypervisor-lib/src/network/mosnet/nat.rs
crates/my_hypervisor-lib/src/network/backends/tap_backend.rs: NetworkResult
crates/my_hypervisor-lib/src/vm/manager.rs: gc_owned_resources, clear on teardown
crates/my_hypervisor-cli/src/cli.rs: --dry-run and --force-heuristic flags to DoctorArgs
crates/my_hypervisor-cli/src/commands/doctor.rs
Implementation Plan
Step 1: Add NetworkOwnership struct to RuntimeState in state.rs
Step 2: Extend NetworkResult and NatOps trait to surface created iptables rules
Step 3: Update LocalOps NAT implementation to return rule strings
Step 4: Propagate ownership through TapNetwork::setup
Step 5: Persist ownership metadata in VmManager::setup_network
Step 6: Add ownership-aware GC method gc_owned_resources to VmManager
Step 7: Add CLI flags and wire up in doctor command
Step 8: Clear ownership on teardown in refresh_status, stop, remove
Step 9: Add unit tests and integration tests
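As a rough illustration of how the ownership record and the default GC path could fit together, here is a minimal, std-only sketch. It is not the code from the change set: the field names on NetworkOwnership, the VmRecord, VmStatus, PlannedAction, and GcPlan types, and the plan_owned_gc function are all hypothetical stand-ins, and serde derives are left out so the example has no external dependencies.

```rust
/// Hypothetical ownership record; field names are assumptions for illustration.
#[derive(Debug, Clone, Default, PartialEq)]
struct NetworkOwnership {
    tap_device: Option<String>,
    bridge: Option<String>,
    ip: Option<String>,
    /// Exact rule strings as they were added, so deletion needs no guessing.
    iptables_rules: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum VmStatus {
    Running,
    Stopped,
    Crashed,
}

struct VmRecord {
    name: String,
    status: VmStatus,
    /// `None` models state written before ownership tracking existed.
    ownership: Option<NetworkOwnership>,
}

#[derive(Debug, PartialEq)]
enum PlannedAction {
    DeleteTap { vm: String, device: String },
    DeleteRule { vm: String, rule: String },
}

#[derive(Debug, Default)]
struct GcPlan {
    actions: Vec<PlannedAction>,
    warnings: Vec<String>,
}

/// Builds the list of deletions that recorded ownership can justify.
/// Computing the plan has no side effects, so a dry run can print it and stop.
fn plan_owned_gc(vms: &[VmRecord]) -> GcPlan {
    let mut plan = GcPlan::default();
    for vm in vms {
        // Resources of a running VM are in use and never candidates.
        if vm.status == VmStatus::Running {
            continue;
        }
        let own = match &vm.ownership {
            Some(own) => own,
            None => {
                // Without metadata nothing can be proven, so warn and skip.
                plan.warnings
                    .push(format!("{}: no ownership metadata, skipped", vm.name));
                continue;
            }
        };
        // The bridge is left alone in this sketch (an assumption).
        if let Some(device) = &own.tap_device {
            plan.actions.push(PlannedAction::DeleteTap {
                vm: vm.name.clone(),
                device: device.clone(),
            });
        }
        for rule in &own.iptables_rules {
            plan.actions.push(PlannedAction::DeleteRule {
                vm: vm.name.clone(),
                rule: rule.clone(),
            });
        }
    }
    plan
}
```

Separating planning from execution is one way to get a dry-run mode almost for free: the same plan is either printed or handed to the code that actually deletes devices and rules. Host resources that appear in no VM's record never enter the plan, which is the property the default path needs.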
Acceptance Criteria
tap-* devices created outside this project
--dry-run mode shows what would be removed and why
--force-heuristic flag
Test Results
All existing tests pass. 4 new tests added:
network_ownership_serialization_roundtrip
network_ownership_default_is_empty
runtime_state_backward_compat_without_network_ownership
runtime_state_with_network_ownership
Implementation Summary
Changes Made
Files modified (9):
crates/my_hypervisor-lib/src/vm/state.rs: NetworkOwnership struct and field on RuntimeState with serde backward compat
crates/my_hypervisor-lib/src/network/traits.rs: iptables_rules: Vec<String> to NetworkResult
crates/my_hypervisor-lib/src/network/ops/mod.rs: NatOps::add_port_forward return type to Result<Vec<String>>
crates/my_hypervisor-lib/src/network/local/nat.rs
crates/my_hypervisor-lib/src/network/mosnet/nat.rs
crates/my_hypervisor-lib/src/network/backends/tap_backend.rs: NetworkResult
crates/my_hypervisor-lib/src/vm/manager.rs: setup_network, clears on stop/refresh_status, adds gc_owned_resources method with GcAction/GcResult/GcWarning types
crates/my_hypervisor-cli/src/cli.rs: --dry-run and --force-heuristic flags to DoctorArgs
crates/my_hypervisor-cli/src/commands/doctor.rs: gc_cleanup_owned function; default GC is now ownership-aware, heuristic available via --force-heuristic
How It Works
setup_network records TAP device, bridge, IP, and exact iptables rules in NetworkOwnership on RuntimeState, persisted to state.json
stop() and refresh_status() clear network_ownership after successful teardown
doctor --gc: Default path reads ownership metadata from stopped/crashed VMs and only deletes resources it can prove it owns. Skips VMs without ownership metadata (with a warning)
--dry-run: Shows what would be removed without acting
--force-heuristic: Falls back to the original heuristic cleanup for emergency use
State files without network_ownership deserialize fine via serde(default)
Test Results