perf: investigate high CPU usage after booting a DragonOS VM on master #1951

Description

@fslongjin

Context

Related to #1946 because the CubeSandbox guest-kernel effort is affected by this behavior, but this issue is not CubeSandbox-specific.

High host CPU usage can already be observed when booting a DragonOS VM directly from the master branch, without using cube-agent as init. The CubeSandbox runtime path merely makes the same underlying issue visible in another scenario.

Observed behavior

  • Booting a DragonOS VM from master can leave the QEMU process consuming high host CPU even when there is no obvious active workload.
  • This behavior is observable independently of CubeSandbox and should be investigated as a DragonOS baseline runtime/performance issue.
  • No CubeSandbox-specific agent loop should be assumed as the root cause without evidence.
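
To put a number on "high host CPU", one option is to sample the QEMU process from the Linux host. The following is a minimal sketch, not part of the original report: it assumes a Linux host with `/proc` mounted, and the helper names (`parse_cpu_ticks`, `cpu_percent`, `sample_pid`) are hypothetical.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces or
    # parentheses, so split after the LAST ')' before indexing fields.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int, seconds: float, hz: int) -> float:
    """Host CPU usage over the interval; 100.0 means one fully busy host core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * seconds)


def sample_pid(pid: int, seconds: float = 5.0) -> float:
    """Sample a live process twice and return its CPU usage (Linux host only)."""
    hz = os.sysconf("SC_CLK_TCK")

    def read() -> int:
        with open(f"/proc/{pid}/stat") as f:
            return parse_cpu_ticks(f.read())

    before = read()
    start = time.monotonic()
    time.sleep(seconds)
    after = read()
    return cpu_percent(before, after, time.monotonic() - start, hz)
```

Calling `sample_pid(qemu_pid)` a few minutes after the guest finishes booting would give a baseline figure to compare before and after any fix. A value near 100 per vCPU would suggest the guest never reaches a halted state, but that interpretation still needs confirmation from inside the guest.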

Requested investigation

Identify why the DragonOS guest consumes high host CPU after boot when it should be idle or mostly idle.

Potential areas to inspect:

  • scheduler idle path and CPU halt behavior,
  • timer tick / local APIC timer behavior,
  • wakeup source accounting,
  • interrupt storm or repeated wakeups,
  • virtio device polling paths,
  • epoll/poll or wait queue behavior,
  • any kernel thread or user-space task spinning after normal boot.

Requirements

Please treat this as a general DragonOS runtime issue first. The investigation should establish which task, interrupt source, timer path, or kernel subsystem keeps the VM busy.

Useful outputs from the investigation would include:

  • which CPU/task/kernel path is consuming CPU,
  • whether CPUs reach the architecture idle/halt path,
  • whether timers or interrupts immediately wake idle CPUs,
  • whether a virtio device path is spinning,
  • whether the behavior reproduces with a minimal init/userland,
  • whether the behavior changes across SMP settings or timer configuration.
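
Several of these outputs reduce to comparing event counters across a time window. The sketch below shows one way to rank such counters by rate. It is an illustration only: the counter names (`irq:lapic_timer`, `irq:virtio`, `idle:halt_entries`) and the snapshot format are hypothetical and do not describe an existing DragonOS interface.

```python
from typing import Dict, List, Tuple


def counter_rates(
    before: Dict[str, int], after: Dict[str, int], seconds: float
) -> List[Tuple[str, float]]:
    """Return (name, events per second), sorted from busiest to quietest."""
    rates = [
        # A counter missing from the first snapshot is treated as starting at 0.
        (name, (count - before.get(name, 0)) / seconds)
        for name, count in after.items()
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


def suspects(rates: List[Tuple[str, float]], threshold_per_sec: float) -> List[str]:
    """Names of counters whose rate exceeds the given threshold."""
    return [name for name, rate in rates if rate > threshold_per_sec]


# Hypothetical snapshots taken 2 seconds apart on an otherwise idle guest.
before = {"irq:lapic_timer": 1000, "irq:virtio": 10, "idle:halt_entries": 500}
after = {"irq:lapic_timer": 3000, "irq:virtio": 12, "idle:halt_entries": 2400}
rates = counter_rates(before, after, 2.0)
```

A high halt-entry rate alongside a matching interrupt rate would point at repeated wakeups, while a halt-entry rate near zero would suggest the CPU never reaches the idle path at all. The threshold passed to `suspects` is a judgment call and depends on the configured tick rate.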

Suggested validation

  • Add low-overhead instrumentation or snapshot-style diagnostics for scheduler/idle state.
  • Compare master VM boot under different init/userland configurations.
  • Compare single-CPU and multi-CPU QEMU runs if relevant.
  • If a concrete subsystem bug is found, add a focused regression test or diagnostic check where practical.
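
For the single-CPU versus multi-CPU comparison, the runs should differ only in the vCPU count. The helper below is a small sketch of that idea: `smp_variants` is a hypothetical name, and the base argument list is a placeholder for whatever QEMU command line the project actually uses. Only the `-smp` option is taken from QEMU itself.

```python
from typing import Dict, List, Sequence


def smp_variants(
    base_args: Sequence[str], cpu_counts: Sequence[int]
) -> Dict[int, List[str]]:
    """Build one QEMU argument list per vCPU count, identical except for -smp."""
    if "-smp" in base_args:
        # Refuse to append a second -smp so every variant stays unambiguous.
        raise ValueError("base_args must not already contain -smp")
    return {n: [*base_args, "-smp", str(n)] for n in cpu_counts}


# Placeholder base command; the real disk, kernel, and device arguments
# are not specified here and must come from the project's own run scripts.
variants = smp_variants(["qemu-system-x86_64", "-nographic"], [1, 2, 4])
```

Each resulting list can be passed to `subprocess.Popen`, after which the host CPU usage of the returned process can be measured per variant.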

Acceptance criteria

  • The source of high host CPU usage after normal DragonOS VM boot is identified with evidence.
  • A fix plan or implementation points to the responsible DragonOS subsystem rather than masking the symptom.
  • After the fix, an idle or mostly idle DragonOS VM should not keep QEMU at high CPU usage under the tested boot configuration.

Metadata

Labels

Bug fix (A bug is fixed in this pull request)
