ci: increase unit test partitions from 8 to 12#5

Open
huth-stacks wants to merge 1 commit into develop from ci/P4-more-test-partitions

@huth-stacks

What

Increase the number of parallel unit test partitions from 8 to 12 to reduce the worst-case partition execution time.

Why

Nextest's --partition count:N/8 distributes tests by position in the test list, not by execution time. This creates severe imbalance:

  • Partition 6: 24-27 minutes (clarity type-checker + stackslib cost tests)
  • Partition 3: 7-8 minutes
  • Ratio: about 3.4x imbalance (3.0x in the baseline run, 24m1s vs. 7m59s)

The pipeline waits for the slowest partition, so this directly impacts wall-clock time. Increasing to 12 partitions distributes the heavy tests across more buckets.
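
To illustrate why position-based splitting can be lopsided, here is a minimal sketch. It assumes the test at position i lands in bucket i mod N, which is a simplification of how nextest counts tests, and the durations are made-up values, not measurements from this repository.

```python
# Hypothetical per-test durations in seconds, listed in test-list order.
# Three slow tests happen to sit four positions apart.
DURATIONS = [300, 10, 10, 10, 300, 10, 10, 10, 300, 10, 10, 10]


def partition_by_position(durations, total):
    """Sum durations per bucket when test i is assigned to bucket i % total."""
    buckets = [0] * total
    for index, seconds in enumerate(durations):
        buckets[index % total] += seconds
    return buckets


def imbalance(buckets):
    """Ratio of the slowest bucket to the fastest one."""
    return max(buckets) / min(buckets)


if __name__ == "__main__":
    for total in (4, 6):
        buckets = partition_by_position(DURATIONS, total)
        print(total, buckets, f"{imbalance(buckets):.1f}x")
    # 4 [900, 30, 30, 30] 30.0x
    # 6 [310, 20, 310, 20, 310, 20] 15.5x
```

In this contrived case, adding buckets lowers the slowest bucket from 900s to 310s, but the split is still far from even because durations are never consulted.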

The Change

In .github/workflows/stacks-core-tests.yml:

-        partition: [1, 2, 3, 4, 5, 6, 7, 8]
+        partition: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
...
-          total-partitions: 8
+          total-partitions: 12

2 lines changed in 1 file.

Metrics to Track

Metric             Baseline  Expected
Slowest partition  24m1s     ~14-16m
Fastest partition  7m59s     ~5-7m
Imbalance ratio    3.0x      ~2.0x

Security Checklist

  • No new permissions granted
  • No secrets exposure
  • Same tests executed, different distribution

Part of CI Optimization Series

PR 4 of 7. Expected impact: ~8-10 min off critical path.

@huth-stacks huth-stacks added the "no changelog" label (Skip changelog fragment check) Mar 24, 2026
@huth-stacks huth-stacks reopened this Mar 24, 2026
@huth-stacks huth-stacks reopened this Mar 24, 2026
@huth-stacks huth-stacks force-pushed the ci/P4-more-test-partitions branch 3 times, most recently from 68dee00 to 72e3443 March 25, 2026 11:55
Add a Python script that reads JUnit XML timing data from previous
CI runs and uses greedy bin-packing to distribute tests into
time-balanced partitions. Falls back to hash-based partitioning
when no timing data is available (first run, or data expired).
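
A rough sketch of the greedy bin-packing and hash fallback described above, not the contents of split-tests-by-timing.py. The function names, the longest-first ordering, the SHA-256 hash, and the default duration for tests without timing data are all assumptions made for illustration.

```python
import hashlib
import heapq


def split_by_timing(timings, total):
    """Greedy packing: take tests longest-first, always filling the lightest bucket."""
    heap = [(0.0, index) for index in range(total)]  # (current load, bucket index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(total)]
    # Sort by duration descending, then by name, so every run yields the same split.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return buckets


def split_by_hash(names, total):
    """Fallback: pick a bucket from a stable hash of the test name."""
    buckets = [[] for _ in range(total)]
    for name in sorted(names):
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        buckets[int.from_bytes(digest[:8], "big") % total].append(name)
    return buckets


def split_tests(names, timings, total, default_seconds=1.0):
    """Use timing data when any exists for these tests, else fall back to hashing."""
    known = {name: timings[name] for name in names if name in timings}
    if not known:
        return split_by_hash(names, total)
    # Assumption: tests with no recorded duration get a fixed default estimate.
    estimated = {name: known.get(name, default_seconds) for name in names}
    return split_by_timing(estimated, total)


if __name__ == "__main__":
    # Hypothetical test names and durations.
    print(split_tests(["a", "b", "c", "d"], {"a": 10, "b": 9, "c": 2, "d": 1}, 2))
    # [['a', 'd'], ['b', 'c']]
```

Both paths are deterministic on purpose: if each partition runs the script independently, they all need to compute the same assignment from the same inputs.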

New files:
  .github/scripts/split-tests-by-timing.py - bin-packing script
  .config/nextest.toml - enables JUnit XML output for CI profile

Each partition uploads its JUnit XML as an artifact (90-day retention).
On subsequent runs, all partitions download this timing data and the
script assigns tests to minimize the slowest partition duration.
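
Reading the timing data could look roughly like the sketch below. It assumes the common JUnit layout of testcase elements carrying name, classname, and time attributes; the sample document and the "classname::name" key format are hypothetical, not taken from an actual CI artifact.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit XML, shaped like typical test-runner output.
SAMPLE = """<testsuites>
  <testsuite name="clarity">
    <testcase name="tests::a" classname="clarity" time="1.5"/>
    <testcase name="tests::b" classname="clarity" time="0.25"/>
    <testcase name="tests::c" classname="clarity" time="not-a-number"/>
  </testsuite>
</testsuites>"""


def read_timings(xml_text):
    """Return {"classname::name": seconds} for every testcase with a usable time."""
    root = ET.fromstring(xml_text)
    timings = {}
    for case in root.iter("testcase"):
        name = case.get("name")
        if name is None:
            continue
        key = f"{case.get('classname', '')}::{name}"
        try:
            seconds = float(case.get("time", ""))
        except ValueError:
            continue  # skip entries with missing or malformed durations
        # If a test appears more than once (for example a retry), keep the slowest run.
        timings[key] = max(timings.get(key, 0.0), seconds)
    return timings


if __name__ == "__main__":
    print(read_timings(SAMPLE))
    # {'clarity::tests::a': 1.5, 'clarity::tests::b': 0.25}
```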

Uses --workspace-remap and --profile ci to ensure JUnit output is
written correctly when running from a nextest archive.

POC: inlines the nextest command to test this approach. Production
implementation should integrate with stacks-network/actions.
@huth-stacks huth-stacks force-pushed the ci/P4-more-test-partitions branch from 72e3443 to 1e875c4 March 25, 2026 16:04