Make the build agent and buildkitd immune to the OOM killer #134

Open
robstolarz wants to merge 1 commit into main from rob/dep-5151-oom-immune-agent
@robstolarz

What

Before launching buildkitd, the agent sets its own oom_score_adj to -1000. buildkitd is spawned as a child and inherits the value, so both the agent and buildkitd become immune to the kernel OOM killer.

Why

Part of DEP-5144 (survive RUN-step OOMs without killing the build interaction). The kernel scores per-process and reaps the single highest-scoring one. buildkitd is often the fattest process on the box (it holds the LLB graph + cache), so today an OOM can take down buildkitd itself and the build interaction hangs forever.

Making the agent + buildkitd immune guarantees a parent survives to report the death. Build steps are biased positive (buildkit's oomScoreAdj default of +500), so they stay the preferred victim and the OOM lands on something we can afford to lose. Because runc sets each step's adj from the OCI spec, steps don't inherit the daemon's immunity.
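To make the victim-selection argument concrete, here is a toy model in TypeScript. It is only loosely based on the kernel's badness heuristic (memory footprint plus `oom_score_adj` scaled to total memory, with -1000 meaning "never select"). The process names, page counts, and the `Proc`, `badness`, and `pickVictim` helpers are illustrative assumptions, not code from this PR or from the kernel.

```typescript
// Toy model of OOM victim selection. All numbers are made up for illustration.
interface Proc {
  name: string;
  memPages: number;    // rough memory footprint, in pages
  oomScoreAdj: number; // value of /proc/<pid>/oom_score_adj
}

const IMMUNE_ADJ = -1000; // a process at -1000 is never selected

// Simplified badness: footprint plus the adjustment scaled to total memory.
function badness(p: Proc, totalPages: number): number | null {
  if (p.oomScoreAdj === IMMUNE_ADJ) return null; // exempt from selection
  return p.memPages + (p.oomScoreAdj * totalPages) / 1000;
}

// Pick the single highest-scoring, non-immune process.
function pickVictim(procs: Proc[], totalPages: number): Proc | null {
  let victim: Proc | null = null;
  let best = -Infinity;
  for (const p of procs) {
    const score = badness(p, totalPages);
    if (score !== null && score > best) {
      best = score;
      victim = p;
    }
  }
  return victim;
}

const totalPages = 1_000_000;

// Today: everything at the default adj of 0, buildkitd is the largest process.
const before: Proc[] = [
  { name: "agent", memPages: 50_000, oomScoreAdj: 0 },
  { name: "buildkitd", memPages: 400_000, oomScoreAdj: 0 },
  { name: "run-step", memPages: 200_000, oomScoreAdj: 0 },
];

// With this PR plus the +500 step bias: agent and buildkitd are immune.
const after: Proc[] = [
  { name: "agent", memPages: 50_000, oomScoreAdj: -1000 },
  { name: "buildkitd", memPages: 400_000, oomScoreAdj: -1000 },
  { name: "run-step", memPages: 200_000, oomScoreAdj: 500 },
];

console.log(pickVictim(before, totalPages)?.name);
console.log(pickVictim(after, totalPages)?.name);
```

In this model the first call selects `buildkitd` and the second selects `run-step`, which is the outcome the change is aiming for. The real kernel accounting has more inputs, so treat the scores as directional only.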

This stands alone, and composes with the buildkit-private change (depot/buildkit-private#57) that biases steps positive. Deploy ordering: roll the buildkit-private binary to builders before this, so steps are already being reset to +500 when the daemon goes immune.

The agent runs as root, so the negative adjustment is permitted. The write is best-effort: in a non-Linux or dev environment without a writable /proc, the agent logs the failure and continues rather than failing the task.
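For readers who have not seen the diff, a minimal sketch of what a best-effort write like this could look like in TypeScript. The function name `setOwnOomScoreAdj`, its parameters, and the log message are assumptions for illustration and may not match the agent's actual code. Only `/proc/self/oom_score_adj` and the value -1000 come from the description above.

```typescript
import { writeFileSync } from "node:fs";

// Valid range for oom_score_adj; -1000 exempts the process from OOM killing.
const OOM_SCORE_ADJ_MIN = -1000;
const OOM_SCORE_ADJ_MAX = 1000;

/**
 * Best-effort write of this process's own oom_score_adj.
 * Returns true on success and false if the write failed (for example,
 * no /proc on a non-Linux dev machine). It never throws on I/O errors.
 * `procPath` is a parameter only so the sketch can be exercised in tests.
 */
function setOwnOomScoreAdj(
  value: number = OOM_SCORE_ADJ_MIN,
  procPath: string = "/proc/self/oom_score_adj",
): boolean {
  // Reject values the kernel would refuse anyway; this is a programming error.
  if (!Number.isInteger(value) || value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX) {
    throw new RangeError(`oom_score_adj must be an integer in [-1000, 1000], got ${value}`);
  }
  try {
    writeFileSync(procPath, String(value));
    return true;
  } catch (err) {
    // Log and continue: losing OOM immunity should not fail the task.
    console.warn(`could not set oom_score_adj to ${value}; continuing`, err);
    return false;
  }
}
```

The call would need to happen before buildkitd is spawned, since a child process inherits the value its parent has at fork time.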

Test

  • pnpm type-check clean.

DEP-5151

Before launching buildkitd, set the agent's own oom_score_adj to -1000 so
the kernel OOM killer never picks the agent or the buildkitd it spawns
(which inherits the value across the fork) as its victim under memory
pressure.

When a RUN step exhausts memory, the kernel scores per process and reaps
the single highest-scoring one. buildkitd is often the fattest process on
the box (it holds the LLB graph and cache), so today it can be the victim
and the whole build interaction hangs. Making the agent and buildkitd
immune guarantees a parent survives to report the death; build steps are
biased positive (buildkit's oomScoreAdj default), so they remain the
preferred victim and the OOM lands on something we can afford to lose.

The agent runs as root, so the negative adjustment is permitted. The write
is best-effort: in a non-Linux or dev environment without a writable /proc,
the agent logs the failure and continues rather than failing the task.

DEP-5151

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jun 24, 2026

DEP-5151

DEP-5144
