Fix false-green validation and failed-start cleanup paths #67

Merged
rawan merged 3 commits from development_fix_test_summary into development 2026-03-26 12:45:30 +00:00
Member

#44

#44
fix: handle cleanup on stop and start, fix integration test results
All checks were successful
Unit and Integration Test / test (push) Successful in 2m23s
70fbd33da5
Merge branch 'development' into development_fix_test_summary
All checks were successful
Unit and Integration Test / test (push) Successful in 1m48s
6c224bbe3c
@ -617,0 +607,4 @@
info!(vm_id = %state.id, "Detected dead VM process, cleaning up resources");
if state.vm_config.network.mode != NetworkMode::None {
let _ = self.teardown_network(&state).await;
Owner

why ignoring error here?

why ignoring error here?
rawan marked this conversation as resolved
@ -1153,0 +1169,4 @@
let _ = std::fs::remove_file(vm_dir.join("vsock.sock"));
if state.vm_config.network.mode != NetworkMode::None {
let _ = self.teardown_network(state).await;
Owner

here too

here too
rawan marked this conversation as resolved
@ -567,0 +565,4 @@
if state.status == VmState::Running && state.vmm_pid > 0 && !process_running(state.vmm_pid)
{
self.reconcile_dead_vm(state).await;
Owner

race condition here: Between process_running() returning true and the actual API socket call, the process can die. This was always possible, but the new early-return path creates a false sense of safety — callers may stop handling socket-level errors gracefully. The API call below still needs to handle ECONNREFUSED/BrokenPipe and trigger reconciliation, otherwise a process that dies in that window produces an unreconciled opaque error.

race condition here: `Between process_running() returning true and the actual API socket call, the process can die. This was always possible, but the new early-return path creates a false sense of safety — callers may stop handling socket-level errors gracefully. The API call below still needs to handle ECONNREFUSED/BrokenPipe and trigger reconciliation, otherwise a process that dies in that window produces an unreconciled opaque error.`
rawan marked this conversation as resolved
@ -616,1 +603,4 @@
/// Clean up resources and mark a VM as Stopped when its hypervisor process
/// has died out-of-band. Used by both `stats` and `refresh_status`.
async fn reconcile_dead_vm(&mut self, state: RuntimeState) {
Owner

takes ownership but callers sometimes hold a borrow

takes ownership but callers sometimes hold a borrow
Author
Member

both callers create their own clones and pass them to the function, the function needs ownership because it needs to mutate the stats in:

        let mut updated = state;
        updated.status = VmState::Stopped;
        updated.stopped_at = Some(Utc::now());
both callers create their own clones and pass them to the function, the function needs ownership because it needs to mutate the stats in: ```rust let mut updated = state; updated.status = VmState::Stopped; updated.stopped_at = Some(Utc::now()); ```
rawdaGastan marked this conversation as resolved
fix: remove silent errors, fixed race condition
All checks were successful
Unit and Integration Test / test (push) Successful in 1m48s
cd1f59522c
@ -656,6 +693,11 @@ impl VmManager {
.unwrap_or(MyceliumHealth::Unknown)
};
if health != MyceliumHealth::Healthy && !process_running(state.vmm_pid) {
Owner

shouldn't we check after the update?

shouldn't we check after the update?
Author
Member

after what update?

after what update?
Owner

// Update persisted state

// Update persisted state
Author
Member

what current order does:

1- If unhealthy + process dead → reconcile, skip the persist, return Unknown
2- Otherwise → persist the health update (healthy or unhealthy but process still alive)

Moving the check after the persist would mean writing state that gets immediately discarded by reconciliation

what current order does: 1- If unhealthy + process dead → reconcile, skip the persist, return Unknown 2- Otherwise → persist the health update (healthy or unhealthy but process still alive) Moving the check after the persist would mean writing state that gets immediately discarded by reconciliation
rawan merged commit 243e61f133 into development 2026-03-26 12:45:30 +00:00
rawan deleted branch development_fix_test_summary 2026-03-26 12:45:30 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
geomind_code/my_hypervisor!67
No description provided.