Soft-fail Intel CPU queue and CPU-* jobs #373

Open
khluu wants to merge 1 commit into main from soft-fail-intel-cpu


@khluu khluu commented Jun 11, 2026

What

Force soft_fail: true on every generated Buildkite step that either:

  • runs on the intel-cpu agent queue (AgentQueue.INTEL_CPU), or
  • has a label starting with CPU-.

Steps that already opt into soft_fail keep it.

Why

Failures in these jobs should not block the pipeline.

How

Added a _should_soft_fail(step) helper in buildkite/pipeline_generator/buildkite_step.py and used it when constructing each BuildkiteCommandStep, instead of passing step.soft_fail through directly. The queue check reuses get_agent_queue(step), so it follows whatever routing sends a step to intel-cpu.

Verification

Sanity-checked the matcher against representative steps:

| label | resolved queue | soft_fail |
| --- | --- | --- |
| Some Intel test (device intel_cpu) | intel-cpu | ✅ true |
| CPU-1-test (device cpu) | cpu_queue_premerge_us_east_1 | ✅ true |
| CPU-2-no-device | gpu_1_queue | ✅ true |
| GPU test | gpu_1_queue | false |
| whatever (soft_fail=True) | gpu_1_queue | ✅ true (preserved) |
| cpu lowercase test (device cpu) | cpu_queue_premerge_us_east_1 | false |

The CPU- match is case-sensitive and prefix-only, so it targets exactly the CPU-* named jobs without catching unrelated lowercase cpu steps.
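The label rule can be illustrated with plain `str.startswith`, which is case-sensitive and anchors at the start of the string. This is a sketch of the matching behavior only; `has_cpu_prefix` is a hypothetical name, and the last two labels are made-up examples that do not appear in the pipeline.

```python
def has_cpu_prefix(label: str) -> bool:
    # Case-sensitive and prefix-only: "CPU-" must be the first four characters.
    return label.startswith("CPU-")


labels = [
    "CPU-1-test",            # matches: starts with "CPU-"
    "cpu lowercase test",    # no match: lowercase
    "Intel CPU-heavy test",  # no match: "CPU-" is not at the start (made-up label)
    "CPU test",              # no match: no hyphen after "CPU" (made-up label)
]
matches = {label: has_cpu_prefix(label) for label in labels}
```

Only the first label matches, so a step has to follow the CPU-* naming convention exactly to be picked up by the label rule.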

🤖 Generated with Claude Code

Mark any generated Buildkite step as soft_fail when it runs on the
intel-cpu agent queue or its label starts with "CPU-", so failures in
these jobs don't block the pipeline. Existing soft_fail opt-ins are
preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
