fix: Deploy hangs indefinitely on Ubuntu 24.04 during apt install #11
Open
mehowz wants to merge 2 commits into
The Deploy flow calls apt, wget, tar, git, and go build via Process.runSync with no timeout, no DEBIAN_FRONTEND=noninteractive, and no streamed stdout. On Ubuntu 24.04 the first `apt -y install linux-kernel-headers` call can block indefinitely on a dpkg lock (unattended-upgrades) or an interactive prompt, and the operator sees no output between "Git installation detected" and the hang. Reproduced on a fresh Hetzner 24.04 host; the only workaround was to pre-install `golang-go build-essential` manually and rebuild znnd from source.

This change:
- Adds `_runStreaming`, which uses `Process.start`, streams stdout/stderr live, accepts a timeout, and escalates SIGTERM→SIGKILL if the timeout is exceeded
- Adds an `_isDpkgLocked` precheck via `fuser /var/lib/dpkg/lock-frontend` so the deploy aborts with an actionable message instead of blocking
- Adds `_hasCommand` / `_hasDebPackage` helpers so already-installed tools (git, build-essential, linux-libc-dev, wget, Go) are skipped cleanly; a host with a manual `apt install golang-go build-essential` now sails through prereqs instead of re-running every apt step
- Routes all apt invocations through `_aptInstall` with DEBIAN_FRONTEND=noninteractive + confold/confdef options so dpkg never waits on config-file prompts
- Replaces `linux-kernel-headers` (transitional name, may not exist on Ubuntu 22.04+) with `linux-libc-dev`, accepting either package in the skip check for legacy compatibility
- Makes `_buildFromSource` pick `go` from PATH if /usr/local/go is absent, matching the detection in the prereq step
- Sets a 15-minute timeout on `go build` (slow hosts), 10 minutes on apt, 5 minutes on git clone / wget, and 2 minutes on tar extract

The caller in main() awaits both async functions; both previously returned bool and now return Future<bool>. No other call sites.
…r bound

- goLinuxDlUrl bumped to go1.22.12 with verified go.dev/dl SHA256. go-zenon's go.mod still says `go 1.20`, so 1.20.x would technically compile it, but Go only maintains security updates for the two most recent minor versions. 1.22 is the current security-supported floor.
- pubspec.yaml sdk constraint relaxed from '>=2.14.0 <3.0.0' to '>=2.14.0 <4.0.0'. The existing constraint prevents `dart pub get` on any Dart 3.x release, which is what `dart-lang/setup-dart@v1.5.0` (used by the release workflow) installs by default. dcli 3.0.2 targets Dart 3.0 exactly; keep that implicit via pubspec.lock.

Verified in Docker (Ubuntu 24.04 target):
- target-prepped (golang-go + build-essential pre-installed): controller detects each prereq, skips all apt calls, advances to `git clone` + `go build` with streamed output. Reproduces the zenonorg5 operator's working scenario.
- target-locked (flock holding /var/lib/dpkg/lock-frontend): controller aborts in ~2s with the actionable lock-contention error instead of hanging. No apt call issued.
This was referenced Apr 23, 2026
Summary
The `Deploy` menu option hangs forever on fresh Ubuntu 24.04 hosts. After the line
`Git installation detected: git version 2.43.0` the tool sits with no output for
hours; the operator has no indication of what failed or how to proceed.
Reproducible on any stock 24.04 host.
Reproduction
Deploy.
`Installing Linux prerequisites ...` / `Git installation detected: git version 2.43.0`, then hangs with zero further output.
Root cause
`_installLinuxPrerequisites()` chains three
`Process.runSync('apt', ['-y', 'install', ...], runInShell: true)` calls with
no timeout, no `DEBIAN_FRONTEND=noninteractive`, and no streamed stdout.
On Ubuntu 24.04 the first call (`apt -y install linux-kernel-headers`)
hits at least one of three failure modes:
ships `linux-libc-dev` instead. The transitional shim can prompt.
`confdef` not set) when a package replaces an existing config.
which is enabled by default on a fresh 24.04 image.
`Process.runSync` blocks forever with no telemetry, so the operator
sees nothing.
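For contrast with the blocking `Process.runSync` call, here is a minimal sketch of a `Process.start`-based runner that forwards output as it arrives and bounds the wait. It is not the PR's `_runStreaming`: the name `runStreaming`, its parameters, the 5-second kill grace period, and the choice to return `null` on timeout are all assumptions made for illustration.

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper, not the PR's actual code. Runs [executable], echoes
/// its output line by line, and returns the exit code, or null on timeout.
Future<int?> runStreaming(
  String executable,
  List<String> arguments, {
  Duration timeout = const Duration(minutes: 10),
  Duration killGrace = const Duration(seconds: 5), // assumed grace period
  Map<String, String>? environment,
}) async {
  // `environment` is merged with the parent environment by default.
  final process =
      await Process.start(executable, arguments, environment: environment);

  const decoder = Utf8Decoder(allowMalformed: true);
  final outSub = process.stdout
      .transform(decoder)
      .transform(const LineSplitter())
      .listen((line) => stdout.writeln(line));
  final errSub = process.stderr
      .transform(decoder)
      .transform(const LineSplitter())
      .listen((line) => stderr.writeln(line));
  // asFuture must be attached before the streams finish.
  final drained =
      Future.wait([outSub.asFuture<void>(), errSub.asFuture<void>()]);

  var timedOut = false;
  Timer? killTimer;
  final termTimer = Timer(timeout, () {
    timedOut = true;
    process.kill(ProcessSignal.sigterm); // polite request first
    killTimer =
        Timer(killGrace, () => process.kill(ProcessSignal.sigkill));
  });

  final code = await process.exitCode;
  termTimer.cancel();
  killTimer?.cancel();

  if (timedOut) {
    // Orphaned grandchildren may still hold the pipes open, so stop
    // listening instead of waiting for the streams to close.
    await outSub.cancel();
    await errSub.cancel();
    return null;
  }
  await drained;
  return code;
}
```

A caller would treat `null` as "timed out" and any non-zero code as a failed step, for example `await runStreaming('git', ['--version'], timeout: const Duration(minutes: 5))`.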
Fix
tee, timeout + `SIGTERM`→`SIGKILL` fallback, configurable env merge.
so Deploy aborts in <2s with an actionable message when another
apt/dpkg job is running, instead of blocking.
present (`git`, `build-essential`, `linux-libc-dev`,
`linux-kernel-headers`, `wget`, Go via `/usr/local/go` or system
`go`), the apt call is skipped entirely. Makes the tool work on
operator-prepped hosts without fighting them.
`DEBIAN_FRONTEND=noninteractive` and
`-o DPkg::Options::=--force-confold -o DPkg::Options::=--force-confdef`
so dpkg never waits on config prompts.
name); the detection check accepts either so legacy hosts still work.
absent, matching the detection in the prereq step.
Every one is long enough to never falsely trip in normal operation
and short enough to bound operator frustration.
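The prechecks and the non-interactive apt invocation described in this list could look roughly like the sketch below. The helper names mirror those in the commit message but the bodies are assumptions, not the PR's code: in particular, the `dpkg-query` status check and the decision to treat a missing `fuser` binary as "not locked" are illustrative choices. Note that `fuser` may not see processes owned by other users unless it runs as root.

```dart
import 'dart:io';

/// Environment for unattended apt runs. Process.run / Process.start merge
/// it into the parent environment by default.
const aptEnvironment = {'DEBIAN_FRONTEND': 'noninteractive'};

/// Argument list for `apt`, using the dpkg options quoted in the PR text.
List<String> aptInstallArgs(List<String> packages) => [
      '-y',
      '-o', 'DPkg::Options::=--force-confold',
      '-o', 'DPkg::Options::=--force-confdef',
      'install',
      ...packages,
    ];

/// Runs a command; returns null if the executable cannot be started.
Future<ProcessResult?> _tryRun(String executable, List<String> args) async {
  try {
    return await Process.run(executable, args);
  } on ProcessException {
    return null;
  }
}

/// True if some process has the dpkg frontend lock file open.
/// `fuser` exits 0 when at least one process is accessing the file.
/// Assumption: a missing `fuser` binary is treated as "not locked".
Future<bool> isDpkgLocked(
    {String lockFile = '/var/lib/dpkg/lock-frontend'}) async {
  final r = await _tryRun('fuser', [lockFile]);
  return r != null && r.exitCode == 0;
}

/// True if [name] resolves on PATH, via the POSIX shell's `command -v`.
Future<bool> hasCommand(String name) async {
  final r = await _tryRun(
      'sh', ['-c', r'command -v "$1" >/dev/null 2>&1', 'sh', name]);
  return r != null && r.exitCode == 0;
}

/// True if dpkg reports [name] as fully installed.
Future<bool> hasDebPackage(String name) async {
  final r = await _tryRun('dpkg-query', ['-W', r'-f=${Status}', name]);
  return r != null &&
      r.exitCode == 0 &&
      (r.stdout as String).contains('install ok installed');
}
```

A prereq step built on these would first abort if `isDpkgLocked()` returns true, then drop every package for which `hasDebPackage` (or `hasCommand`) already succeeds, and only then run `apt` with `aptInstallArgs(missing)` and `aptEnvironment`.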
Bonus changes (same PR for atomicity)
(Go maintains only the two most recent minor versions); 1.22 is the
current security-supported floor. Verified SHA256 against go.dev.
`dart pub get` works on Dart 3 (which is what
`dart-lang/setup-dart@v1.5.0` installs in the existing CI workflow
by default — the constraint was already out of sync with the CI
reality).
Test plan
Verified in Docker (Ubuntu 24.04 base, Dart 3.0 builder):
`build-essential`, `linux-libc-dev`, `wget`, `golang-go`.
Controller detects each, skips all apt, advances to `git clone` +
`go build` with streamed output. Zero `apt-get install` calls.
Controller aborts in <2s with the "held by another process" message.
conflicting-action rejection all behave correctly.
A full 10-test matrix passes; the harness (bash + Docker) is standalone
and can be resurrected as real CI in a follow-up PR if maintainers want.
Scope
only compiles on Dart 3.0.x exactly (later 3.x removed `cli.waitFor`,
3.5+ removed `UnmodifiableUint8ListView`). A dcli bump deserves its
own PR with tests first.
`pgrep`, `kill`, etc. Those could also hang in theory; migrating
them is orthogonal and not on the hot path that produced the reported
bug.
Follow-ups
Two additional PRs are prepared, stacked on this one:
release, reuse existing clone on re-deploy, `--yes --deploy` for
unattended runs.
`[FAIL]` markers and exit-code semantics suitable for monitoring /
systemd watchdogs.
Happy to split further if the maintainer prefers smaller PRs.