Skip to content

🔧 fix: reap the vmhost child so mb down is fast#56

Open
yeazelm wants to merge 1 commit into
mainfrom
apple_vm/masterblaster
Open

🔧 fix: reap the vmhost child so mb down is fast#56
yeazelm wants to merge 1 commit into
mainfrom
apple_vm/masterblaster

Conversation

@yeazelm

@yeazelm yeazelm commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Summary

mb down took ~45s even though the guest powered off in ~1s. The daemon spawns
each vmhost with Setsid and then cmd.Process.Release(), never wait()ing
on it — so when vmhost exits it becomes a zombie of the daemon, and the
mb down wait loop checks liveness with processAlive (kill -0), which reports
a zombie as alive. The loop then blocked for its full 45s ceiling.

Fix: reap the child with go cmd.Wait() instead of releasing it. vmhost still
runs in its own session, so it survives daemon exit and a restart inherits it via
init; while this daemon is alive it is the OS parent and must wait() to avoid
the zombie. With the PID reaped promptly, mb down returns in ~1s.

Test plan

  • gofmt / go vet / golangci-lint clean (0 issues)
  • Verified end to end with the applevirt backend: mb up then mb down
    drops from ~45s to ~1s; confirmed no lingering <defunct> vmhost
  • vmhost still survives a daemon restart (Setsid; reattached on restart)

Part of PCC-777.

The daemon spawned each vmhost with Setsid and then `cmd.Process.Release()`,
never wait()ing on it. When vmhost exits, it becomes a zombie of the daemon, and
the `mb down` wait loop checks liveness with `processAlive` (kill -0) — which
reports a zombie as alive. So `mb down` blocked for its full 45s ceiling even
though the guest had powered off and vmhost exited within ~1s.

Reap the child with `go cmd.Wait()` instead of releasing it. vmhost still runs in
its own session, so it survives daemon exit and a daemon restart inherits it via
init; while this daemon is alive it is the OS parent and must wait() on the child
to avoid the zombie. With the PID reaped promptly, `mb down` returns in ~1s.

Part of PCC-777.
@yeazelm yeazelm requested a review from a team June 27, 2026 03:45
@linear-code

linear-code Bot commented Jun 27, 2026

Copy link
Copy Markdown

PCC-777

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a 45-second mb down latency regression caused by zombie processes. After vmhost exits, it became a zombie because the daemon never called wait() on it; processAlive() (using kill -0) treats zombies as alive, so the mb down poll loop ran for its entire 45s ceiling.

  • Replaces cmd.Process.Release() with go func() { _ = cmd.Wait() }() so the daemon reaps the child when it exits, allowing processAlive() to correctly return false and unblocking mb down in ~1s.
  • Setsid is preserved, ensuring vmhost still survives daemon restarts (orphaned to init when the daemon exits while cmd.Wait() is still pending).

Confidence Score: 4/5

Safe to merge — the change is a targeted one-liner that correctly fixes a well-understood zombie-process bug and has been verified end-to-end.

The fix is mechanically sound: goroutine-based cmd.Wait() reaps the child promptly, Setsid is unchanged so daemon-restart survival is preserved, and there are no new race conditions since cmd is local to spawnVMHost. The only gap is that vmhost crash exit codes are silently swallowed, which doesn't affect correctness but loses diagnostic signal.

pkg/daemon/daemon.go — only file changed; the go cmd.Wait() goroutine and the inst.Name capture in the suggested log line are the areas worth a second look.

Important Files Changed

Filename Overview
pkg/daemon/daemon.go Replaces cmd.Process.Release() with go cmd.Wait() in spawnVMHost to reap the vmhost child and eliminate zombie-induced mb down delays; exit status is silently discarded.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant MB as mb down
    participant D as Daemon
    participant VH as vmhost (Setsid)
    participant OS as OS / init

    Note over D,VH: Before fix — Release() path
    D->>VH: cmd.Start() + cmd.Process.Release()
    VH->>VH: VM runs
    MB->>D: StopVM RPC
    D->>VH: Stop signal
    VH->>OS: exit (becomes zombie — parent never waited)
    loop processAlive() — kill -0 returns 0 for zombies
        MB->>D: poll (45s ceiling)
    end
    OS->>OS: zombie reaped after 45s timeout

    Note over D,VH: After fix — go cmd.Wait() path
    D->>VH: cmd.Start()
    D->>D: go cmd.Wait() (goroutine blocks)
    VH->>VH: VM runs
    MB->>D: StopVM RPC
    D->>VH: Stop signal
    VH->>OS: exit
    OS-->>D: SIGCHLD / wait() unblocks goroutine
    D->>D: zombie reaped immediately
    MB->>D: processAlive() → false (~1s)
    Note over VH,OS: If daemon exits first, init inherits vmhost (Setsid)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant MB as mb down
    participant D as Daemon
    participant VH as vmhost (Setsid)
    participant OS as OS / init

    Note over D,VH: Before fix — Release() path
    D->>VH: cmd.Start() + cmd.Process.Release()
    VH->>VH: VM runs
    MB->>D: StopVM RPC
    D->>VH: Stop signal
    VH->>OS: exit (becomes zombie — parent never waited)
    loop processAlive() — kill -0 returns 0 for zombies
        MB->>D: poll (45s ceiling)
    end
    OS->>OS: zombie reaped after 45s timeout

    Note over D,VH: After fix — go cmd.Wait() path
    D->>VH: cmd.Start()
    D->>D: go cmd.Wait() (goroutine blocks)
    VH->>VH: VM runs
    MB->>D: StopVM RPC
    D->>VH: Stop signal
    VH->>OS: exit
    OS-->>D: SIGCHLD / wait() unblocks goroutine
    D->>D: zombie reaped immediately
    MB->>D: processAlive() → false (~1s)
    Note over VH,OS: If daemon exits first, init inherits vmhost (Setsid)
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
pkg/daemon/daemon.go:414
The exit status from `cmd.Wait()` is silently discarded. When `vmhost` crashes or exits non-zero, the daemon has no record of it — the only signal is the socket going dark. Logging the error here would make crash investigations easier without changing any behavior.

```suggestion
	go func() {
		if err := cmd.Wait(); err != nil {
			d.logger.Printf("vmhost for %q exited: %v", inst.Name, err)
		}
	}()
```

Reviews (1): Last reviewed commit: "🔧 fix: reap the vmhost child so mb down..." | Re-trigger Greptile

Comment thread pkg/daemon/daemon.go
// (kill -0) still reports it as running, so `mb down` blocks for its full
// 45s timeout even though the VM stopped in ~1s. If the daemon exits first,
// init inherits and reaps vmhost instead.
go func() { _ = cmd.Wait() }()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The exit status from cmd.Wait() is silently discarded. When vmhost crashes or exits non-zero, the daemon has no record of it — the only signal is the socket going dark. Logging the error here would make crash investigations easier without changing any behavior.

Suggested change
go func() { _ = cmd.Wait() }()
go func() {
if err := cmd.Wait(); err != nil {
d.logger.Printf("vmhost for %q exited: %v", inst.Name, err)
}
}()
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/daemon/daemon.go
Line: 414

Comment:
The exit status from `cmd.Wait()` is silently discarded. When `vmhost` crashes or exits non-zero, the daemon has no record of it — the only signal is the socket going dark. Logging the error here would make crash investigations easier without changing any behavior.

```suggestion
	go func() {
		if err := cmd.Wait(); err != nil {
			d.logger.Printf("vmhost for %q exited: %v", inst.Name, err)
		}
	}()
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant