
[codex] Expose host CPU burst #273

Open

vwxyzjn wants to merge 1 commit into TencentCloud:master from vwxyzjn:codex/host-cpu-limit

Conversation


@vwxyzjn commented May 15, 2026

Summary

Adds an optional per-container resources.host_burst.cpu field. CubeMaster aggregates the per-container burst ceilings into cube.master.instance.host_cpu_burst and leaves resources.cpu as the normal scheduler/request CPU.

Cubelet stores the normal host quota in HostCpuQ, stores the burst ceiling in HostCpuBurstQ, and converts the difference into the cgroup burst file:

  • cgroup v2: cpu.max.burst
  • cgroup v1, when supported by the kernel: cpu.cfs_burst_us
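The conversion step can be sketched as follows. This is a hypothetical helper, not the actual Cubelet code (the real field names and period handling may differ): the value written to the burst file is the allowance above the normal quota, expressed in microseconds per scheduling period, assuming the default 100ms CFS period.

```go
package main

import "fmt"

// cfsPeriodUS is the default CFS scheduling period (100ms) in microseconds.
const cfsPeriodUS = 100000

// burstMicros converts a (quota, ceiling) pair measured in CPUs into the
// value written to cpu.max.burst (v2) or cpu.cfs_burst_us (v1): the extra
// allowance above the normal quota, in microseconds per period.
// Hypothetical sketch of the conversion described in the PR.
func burstMicros(cpu, burstCPU float64) int64 {
	if burstCPU <= cpu {
		return 0 // ceiling at or below quota: no burst credit
	}
	return int64((burstCPU - cpu) * cfsPeriodUS)
}

func main() {
	// cpu=2, host_burst.cpu=4 -> 2 extra CPUs of burst credit per period.
	fmt.Println(burstMicros(2, 4)) // 200000
}
```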

The API validates host_burst.cpu >= cpu and host_burst.cpu <= 2 * cpu, matching cgroup burst credit limits. For example, cpu=2 and host_burst.cpu=4 keeps average CPU at 2 while allowing bounded burst credit up to 4 CPUs.
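The validation rule above can be sketched like this (a hypothetical standalone function; the actual API-layer code may structure the check differently):

```go
package main

import "fmt"

// validateHostBurst checks the constraint described in the PR:
// cpu <= host_burst.cpu <= 2*cpu. Hypothetical sketch of the rule,
// not the actual validation code.
func validateHostBurst(cpu, burstCPU float64) error {
	if burstCPU < cpu {
		return fmt.Errorf("host_burst.cpu %v must be >= cpu %v", burstCPU, cpu)
	}
	if burstCPU > 2*cpu {
		return fmt.Errorf("host_burst.cpu %v must be <= 2*cpu (%v)", burstCPU, 2*cpu)
	}
	return nil
}

func main() {
	fmt.Println(validateHostBurst(2, 4) == nil) // true: within [2, 4]
	fmt.Println(validateHostBurst(2, 5) == nil) // false: above 2*cpu
}
```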

Also adds cubemastercli tpl create-from-image --host-burst-cpu so template authors can set the burst ceiling when creating templates from images.

Validation

  • go test ./pkg/service/sandbox -run 'TestCheckAndGetContainers'
  • go test ./pkg/service/httpservice/cube -run '^$'
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubelet-cgroup.test ./plugins/cube/internals/cgroup
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubelet-cgroup-handle-v1.test ./plugins/cube/internals/cgroup/handle/v1
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubelet-cgroup-handle-v2.test ./plugins/cube/internals/cgroup/handle/v2
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubelet-services-cubebox.test ./services/cubebox
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubelet-store-cubebox.test ./pkg/store/cubebox
  • GOOS=linux GOARCH=amd64 go test -c -o /tmp/cubemastercli-cubebox.test ./cmd/cubemastercli/commands/cubebox
  • git diff --check

Note: go test ./cmd/cubemastercli/commands/cubebox -run 'TestParseContainerOverrides' does not compile on this Darwin/arm64 host because the package depends on github.com/agiledragon/gomonkey, which fails with undefined: buildJmpDirective for this local toolchain. The Linux compile check above passes.

Assisted-by: Codex:GPT-5

vwxyzjn marked this pull request as ready for review May 15, 2026 03:45
@fslongjin (Member) commented

Hi @vwxyzjn, thanks a lot for this PR — the implementation is quite complete, covering CLI / API / CubeMaster / Cubelet / cgroup v1 & v2 in one go, and the unit tests are nicely done. While reviewing it I have a few questions I'd like to align on, mainly to make sure we understand the actual benefit this feature delivers in cubebox's current form.

1. How host-level burst takes effect under the cubebox form factor

Looking at the current implementation, cpu.max.burst / cpu.cfs_burst_us is written to the host cgroup that wraps the VMM process. However, cubebox today only has a single instance form, InstanceType_cubebox — essentially a VM whose vCPU count is fixed at template/snapshot time and is aligned with resources.cpu (see GetResourceWithOverhead in Cubelet/plugins/cube/internals/cgroup/cgroup.go).

This makes me unsure about the observable effect of burst on the guest side. For example, with cpu=2 and host_burst.cpu=4, only 2 vCPU threads inside the VM can actually do work; when the cubebox cgroup is fully loaded on the host, total usage is roughly 2 + a bit for VMM helper threads, which is still quite far from 4. So I'd like to confirm:

  • Which scenario do you primarily expect this burst to serve? Smoothing out spikes from VMM / IO helper threads, or a measurable throughput / latency improvement for the workload inside the guest?
  • If the goal is the latter, wouldn't burst be more effective if it were applied to the inner cgroup inside the guest? My concern is that host-level burst under a VM form factor may not produce a measurable difference on the guest side.

If you have a different understanding or any references on this, happy to align.

2. Validation data

The Validation section in the PR description consists entirely of go test / go test -c runs, which is enough to establish compile-time and unit-level correctness. Before merging, could you provide a set of real-machine e2e numbers? Something along these lines would be sufficient:

  • Run on a Linux box whose kernel supports cpu.cfs_burst_us / cpu.max.burst;
  • Run a CPU-saturating workload (sysbench cpu, fio, or a concrete business cold-start case), once with host_burst.cpu = cpu and once with host_burst.cpu = 2*cpu;
  • Share nr_bursts / burst_time from the host-side cpu.stat (to confirm bursts actually occurred), plus guest-side CPU usage and benchmark results.

This will help us evaluate the real value of the feature for users.
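As a concrete sketch of the cpu.stat check, the burst counters can be pulled out like this. The file contents are inlined as a sample so the snippet is self-contained; on a real host you would read the file under the instance's cgroup instead, and the exact field name depends on the cgroup version (burst_time on v1, typically burst_usec on v2).

```shell
# Sample cpu.stat snapshot; on a real host, read the instance's cgroup file,
# e.g. cat /sys/fs/cgroup/<cubebox-cgroup>/cpu.stat (path is illustrative).
cpu_stat='usage_usec 5000000
nr_periods 1000
nr_throttled 12
nr_bursts 7
burst_time 340000'

# Keep only the burst counters: nonzero values confirm bursts occurred.
printf '%s\n' "$cpu_stat" | awk '$1 == "nr_bursts" || $1 == "burst_time"'
```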

3. A small naming suggestion

The host_burst.cpu field exposes the "host" layer to API users, but users typically only think in terms of containers and aren't aware of the host / VM split. If we decide to keep this feature, would it make sense to rename the field to something more neutral (e.g. resources.burst.cpu) and clearly document the layer it applies to (host cgroup vs. guest cgroup)? This is a nit — we can address it after the overall direction is settled.

Overall I'd like to align on points 1 and 2 first. Looking forward to your reply, thanks!

