
DLPX-97279 cpu_online cron generates excessive syslog noise, reducing supportability #563

Merged
prakashsurya merged 1 commit into develop from projects/cpu-online-udev
May 19, 2026

@prakashsurya
Contributor

Problem

The /opt/delphix/server/bin/cpu_online cron fires unconditionally every minute on every
engine, writing a syslog entry even when all CPUs are already online:

CRON[18098]: (root) CMD (/opt/delphix/server/bin/cpu_online)
CRON[18294]: (root) CMD (/opt/delphix/server/bin/cpu_online)
...

This adds ~1440 noise entries per day, making it harder to find relevant signals (surfaced
during an ESCL-5998 investigation).

Solution

Add a udev rule (/etc/udev/rules.d/70-cpu-online.rules) shipped via the
delphix-platform package. The rule fires only when the kernel raises a cpu add event —
i.e., when a hypervisor actually hot-inserts a vCPU while the VM is running — and writes
1 to the device's sysfs online attribute. This replaces the per-minute polling
behavior of the cpu_online cron.
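
The rule's contents are not reproduced in this description. As a minimal sketch of a rule matching the behavior described above, assuming the widely used form of CPU hotplug rule (the shipped 70-cpu-online.rules may differ):

```
# Sketch only; not necessarily the exact contents of 70-cpu-online.rules.
# On a cpu "add" uevent, if the device exposes an "online" attribute that
# currently reads 0, write 1 to it to bring the CPU online.
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
```

In this form, TEST=="online" skips CPU devices that have no online attribute (often cpu0 on x86), and the ATTR{online}=="0" match makes the rule a no-op for CPUs that are already online.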

Landing order: this PR should land before the companion app-gate PR for DLPX-97279,
which removes the cron. Landing this first ensures the udev rule is present before the
cron is removed, avoiding any window where hot-added CPUs would not be brought online.

Testing Done

Tested in two phases to verify that (1) the cpu_online mechanism is genuinely required for
hot-add functionality, and (2) the udev rule is a sufficient replacement.

Phase 1 — control (failing test): Deployed only the companion app-gate change (cron +
script removed, no udev rule) to a fresh ESX-hosted DCenter VM (dlpx-develop group,
engine 2026.4.0.0-snapshot.20260518083647291). Ran the CPU hotplug suite:

| Test | Result |
| --- | --- |
| test_cpu_count | PASS |
| test_hot_add_cpu | ERROR |
| test_warm_add_cpu | PASS |
| test_warm_remove_cpu | PASS |
| test_hot_add_greater_than_max_cpus_fails | PASS |
| test_hot_remove_cpu | PASS |
| test_warm_add_greater_than_max_cpus_fails | PASS |

test_hot_add_cpu errored with: "Number of detected CPUs by the OS was not 3 after 12
retries with polling interval of 10 seconds." This confirms that, on this engine, Linux
does not automatically online hot-plugged CPUs: explicit action is required, and the udev
rule is a necessary part of this change.
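
To observe the Phase 1 failure state directly on a VM, one could list the CPUs that the kernel has registered but not brought online. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/devices/system/cpu; the function name `offline_cpus` is hypothetical and is not part of the test suite or the delphix-platform package.

```python
import re
from pathlib import Path

# Matches per-CPU directories such as "cpu0" or "cpu12", but not "cpufreq" or "cpuidle".
CPU_DIR = re.compile(r"cpu(\d+)")


def offline_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the sorted indices of CPUs whose sysfs `online` attribute reads 0."""
    offline = []
    for entry in Path(sysfs_root).iterdir():
        match = CPU_DIR.fullmatch(entry.name)
        if match is None:
            continue
        online_attr = entry / "online"
        # CPUs that cannot be taken offline (often cpu0) have no `online` file.
        if online_attr.is_file() and online_attr.read_text().strip() == "0":
            offline.append(int(match.group(1)))
    return sorted(offline)
```

Calling `offline_cpus()` after a hot-add would be expected to return a non-empty list when nothing onlines the new vCPU, and an empty list once a udev rule (or a manual write of 1 to the `online` attribute) has brought it online.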

Phase 2 — with udev rule (passing test): Deployed both the app-gate cron removal and
this udev rule to a fresh VM (engine 2026.4.0.0-snapshot.20260519100011891). All 7
tests passed:

| Test | Result |
| --- | --- |
| test_cpu_count | SUCCESS |
| test_hot_add_cpu | SUCCESS |
| test_warm_add_cpu | SUCCESS |
| test_warm_remove_cpu | SUCCESS |
| test_hot_add_greater_than_max_cpus_fails | SUCCESS |
| test_hot_remove_cpu | SUCCESS |
| test_warm_add_greater_than_max_cpus_fails | SUCCESS |

test_hot_add_cpu went from ERROR to SUCCESS, confirming the udev rule is a sufficient
replacement. The platform.hypervisor.cpu.positive and platform.hypervisor.cpu.negative
QA suites are sufficient to verify these changes do not regress CPU hotplug product
functionality.

When a hypervisor hot-inserts a vCPU while the VM is running, the kernel
fires an add event for the new cpu device. This rule writes 1 to the
device's online sysfs attribute, replacing the per-minute cpu_online cron
that was removed from dlpx-app-gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@prakashsurya prakashsurya enabled auto-merge (squash) May 19, 2026 18:38
@prakashsurya prakashsurya merged commit 86f4395 into develop May 19, 2026
16 of 17 checks passed
@prakashsurya prakashsurya deleted the projects/cpu-online-udev branch May 19, 2026 19:04