test(ts): change-based selective integration testing #2921
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- structural set now catches nested tsconfigs, __resources__ (?url assets), strandly/, and the orchestration script itself (all invisible to the graph)
- DRY_RUN guards the early full-suite fallbacks so verification never hits AWS
- word-split-safe file array (portable to bash 3.2)
- merge_group falls back to merge_group.base_sha, not origin/main
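The word-split-safe array mentioned above could look roughly like the sketch below. It is not the code from run-selective-ts.sh: the helper name collect_changed_files and the global CHANGED_FILES are hypothetical, chosen only for illustration. The point is that reading line by line keeps paths with spaces intact and avoids mapfile/readarray, which bash 3.2 does not have.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical, not taken from the PR.

CHANGED_FILES=()

# collect_changed_files: read one path per line from stdin into CHANGED_FILES.
# Uses a read loop instead of mapfile/readarray so it also works on bash 3.2.
collect_changed_files() {
  CHANGED_FILES=()
  local line
  # "|| [ -n "$line" ]" keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue        # ignore blank lines
    CHANGED_FILES+=("$line")          # one array element per path, spaces preserved
  done
}

# Feed the list via a here-string, not a pipe: a pipe would run the
# function in a subshell and the array would be lost afterwards.
list=$'strands-ts/src/my file.ts\nstrands-ts/src/agent.ts'
collect_changed_files <<< "$list"
printf 'collected %s paths\n' "${#CHANGED_FILES[@]}"
```

Expanding the result as "${CHANGED_FILES[@]}" (quoted) passes each path to a command as a single argument.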
Assessment: Request Changes
Solid, well-documented approach: using Vitest's native module graph (no new deps) and failing safe to the full suite on structural/uncertain changes is the right design. The structural regex and base-ref fallbacks (including the …
Review themes
Nice work narrowing a slow, live-AWS suite without adding orchestration dependencies. Once the dist-import path is traceable this will be a clear win.
- alias @strands-agents/sdk to ./src in both integ projects so vitest related traces src changes to tests importing via the package specifier (which otherwise resolves through exports to dist/ and was silently skipped)
- append untracked files to the changed set so brand-new local source files select their covering tests
Adds a SELECTIVE_CHANGED_FILES test seam and a pure-bash test that feeds synthetic changed-file lists through the structural/skip/selective classifier with no git state or network. Locks in the structural fallback set so a future regex edit cannot silently start skipping tests.
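A test seam of this kind might be shaped like the sketch below. Only the variable name SELECTIVE_CHANGED_FILES comes from the commit message. The function name changed_files, the newline-separated format, and the git fallback are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: changed_files is a hypothetical name, and the
# newline-separated format of SELECTIVE_CHANGED_FILES is assumed.

# changed_files [BASE_REF]: print one changed path per line.
changed_files() {
  # "${VAR+set}" is non-empty when VAR is set, even to the empty string,
  # so a test can inject an empty change list without touching git.
  if [ -n "${SELECTIVE_CHANGED_FILES+set}" ]; then
    [ -z "$SELECTIVE_CHANGED_FILES" ] || printf '%s\n' "$SELECTIVE_CHANGED_FILES"
    return 0
  fi
  # Assumed production path: working tree diffed against the given base ref.
  git diff --name-only "${1:?base ref required}"
}

# A test feeds a synthetic list; no git state or network is involved.
SELECTIVE_CHANGED_FILES=$'strands-ts/src/agent.ts\nstrands-ts/tsconfig.json'
changed_files
```

Because the seam sits in front of the git call, the classifier downstream sees the same one-path-per-line input in tests and in real runs.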
Assessment: Approve
All three points from the previous review are addressed and I verified each one against the updated branch (deps installed, run locally):
Verification results
Nice turnaround: the empirical before/after verification and the regression harness make this a solid, maintainable win for integ-test runtime.
- structural fallback expressed as a commented pattern array instead of a 213-char single-line regex; each trigger is reviewable in isolation
- remove strands-wasm/ and wit/ handling (those dirs were removed upstream), so the source filter and prefix strip only deal with strands-ts/
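To illustrate the commented-pattern-array idea, here is a minimal sketch. The patterns are guesses reconstructed from the triggers named in this PR, not the actual entries in the script, and the names STRUCTURAL_PATTERNS and is_structural are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: patterns and names are illustrative assumptions.

# Each entry is an extended regex matched against a repo-relative path.
STRUCTURAL_PATTERNS=(
  '(^|/)tsconfig[^/]*\.json$'                   # any tsconfig, including nested ones
  '(^|/)vitest\.config\.ts$'                    # vitest project configuration
  '(^|/)test/integ/__fixtures__/'               # shared integration fixtures
  '(^|/)__resources__/'                         # assets imported via ?url
  '(^|/)strandly/'                              # strandly sources (location assumed)
  '^test-infra/scripts/run-selective-ts\.sh$'   # the orchestration script itself
  '^\.github/workflows/typescript-'             # TypeScript CI workflows (prefix assumed)
)

# is_structural PATH: succeed if PATH matches any structural pattern.
is_structural() {
  local path=$1 pattern
  for pattern in "${STRUCTURAL_PATTERNS[@]}"; do
    # The unquoted right-hand side is treated as a regex by [[ =~ ]].
    if [[ $path =~ $pattern ]]; then
      return 0
    fi
  done
  return 1
}

if is_structural 'strands-ts/tsconfig.json'; then
  echo 'structural: run the full suite'
fi
```

Compared with one long alternation, adding or removing a trigger here is a one-line diff with its own comment.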
- collapse the three identical full-suite fallback blocks (structural branch, no-base-found, cannot-diff) into a single run_full_suite helper so the fallback behaviour has one definition; status line now goes to stderr
- replace the duplicated @strands-agents/sdk alias comment in the integ-browser project with a back-reference to integ-node
No behaviour change: classify.test.sh passes 20/20; both git-path fallbacks verified to print + exit 0 under dry-run.
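A single fallback helper along these lines might look like the sketch below. Only the names run_full_suite and DRY_RUN come from the commit messages. The message wording, the DRY_RUN=1 convention, and the npm script name test:integ are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: message text, DRY_RUN=1 convention, and the npm script
# name are assumptions, not taken from run-selective-ts.sh.

# run_full_suite REASON: the one place that defines the fallback.
run_full_suite() {
  local reason=$1
  # Status goes to stderr so stdout stays clean for callers that parse it.
  echo "selective: running full suite (${reason})" >&2
  if [ "${DRY_RUN:-}" = "1" ]; then
    return 0                  # dry-run: report the decision, run nothing
  fi
  npm run test:integ          # assumed name of the full-suite script
}

# All three fallback paths would call the same helper, for example:
#   run_full_suite "structural change"
#   run_full_suite "no base ref found"
#   run_full_suite "cannot diff against base"
```

With the guard inside the helper, every fallback path is covered by DRY_RUN without repeating the check at each call site.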
Assessment: Approve
Reviewed independently and verified the core behaviors empirically (deps installed, …
What I verified
One non-blocking suggestion inline about pinning the exit-0-on-empty assumption against future vitest majors. Nice win for integ-test runtime without sacrificing safety.
Document that selective testing's fail-safe invariant depends on vitest v4 exiting 0 (not non-zero) when no spec covers a change, so a future major bump that changes this is caught at review rather than silently breaking selection.
Assessment: Approve
Re-reviewed after I confirmed the author made the right call declining the automated-smoke-test half of that suggestion …
Re-verified post-commit:
No outstanding concerns. Clean win for integ-test runtime that fails safe by design. Nice turnaround on the feedback. 🚀
What
Adds change-based selective integration testing for the TypeScript SDK. Instead of always running the full `integ-node` + `integ-browser` suite, CI and local runs now execute only the integration specs whose module graph depends on the files a change touched, with a fail-safe fallback to the full suite for structural changes.
Why
The full integration suite hits live AWS/Bedrock and is slow. For a localized source fix, most of that suite is irrelevant. This narrows the run to the covering specs using Vitest's native module-graph tracing (`vitest related`), which resolves the `$/sdk` path alias correctly. No new dependencies and no new orchestration tooling.
How it works
`test-infra/scripts/run-selective-ts.sh` classifies the diff against the base ref into three branches:
- Structural: a structural change (including `strands-ts` tsconfig, `vitest.config.ts`, shared `test/integ/__fixtures__/` and `__resources__/`, `strandly/`, the script itself, or TypeScript CI workflows) runs the full suite.
- Skip: a change unrelated to the TypeScript integration specs runs no tests.
- Selective: any other source change runs only the covering specs via `vitest related --project integ-node --project integ-browser`.
The same script backs both the local `npm run test:integ:selective` command and the CI step, so local and CI scope match for a given change. Locally it diffs the working tree against the merge-base (so uncommitted edits are included); in CI it diffs against the PR base SHA, falling back to the merge-queue base on `merge_group` events.
Fail-safe design
Every uncertain path runs the full suite rather than skipping. An unresolvable base ref or a failed diff falls back to the full suite, and any change the static graph cannot trace (binary assets imported via Vite `?url`, tsconfig path-alias definitions, wasm/wit inputs) is routed to the structural fallback. The goal is never to skip a test that should run.
Scope
TypeScript only. Python selective testing is intentionally deferred to a follow-up because its options (pytest-testmon and friends) carry real trade-offs with nondeterministic AWS-backed tests and stateful coverage data that warrant separate evaluation.
Files
- `test-infra/scripts/run-selective-ts.sh` (new): the selection logic
- `package.json`: `test:integ:selective` script
- `.github/workflows/typescript-integration-test.yml`: wires the script into the existing integration-test workflow (full git history for diffing; selective run with the PR/merge-queue base SHA)
- `CONTRIBUTING.md`: local usage docs
Verification
All branch classifications were verified via a dry-run mode that prints the chosen branch without executing tests (so no live AWS calls): structural cases (tsconfigs, `__resources__` assets, `strandly/`) route to the full suite, a normal source edit narrows to the selective set, and unrelated changes skip.
Note for reviewers
Integration tests still run against live AWS/Bedrock. This change reduces how many run for a localized change; it does not change per-test reliability.
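For reviewers who want the three-branch classification from "How it works" in concrete form, a minimal sketch follows. It is not the script's actual logic: the function name classify, the glob patterns, and the rule that anything under strands-ts/ counts as a traceable source change are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: classify and its patterns are hypothetical.

# classify: read changed paths on stdin, print structural | selective | skip.
classify() {
  local path result=skip
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    case "$path" in
      # Any structural hit wins immediately: fail safe to the full suite.
      tsconfig*.json|*/tsconfig*.json|*/vitest.config.ts|*/__fixtures__/*|*/__resources__/*|strandly/*)
        echo structural
        return 0
        ;;
      # Assumed source filter: anything else under strands-ts/ is traceable.
      strands-ts/*)
        result=selective
        ;;
    esac
  done
  # No structural hit: selective if any source file changed, else skip.
  echo "$result"
}

printf '%s\n' 'strands-ts/src/agent.ts' 'docs/notes.md' | classify
```

A caller would then dispatch on the printed word: the full suite for structural, `vitest related --project integ-node --project integ-browser` with the changed files for selective, and nothing for skip.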