Close DWI parity family + fix TSV BOM bug (52→58 codes)#3
Merged
Conversation
First iteration of the parity-gap loop, targeting DWI. - Add ds114 (raw DWI: inherited dwi.bval/dwi.bvec + per-subject dwi.nii.gz) to scripts/fetch_bench_data.py and tests/parity/run.py. Gitignored local fetch, same model as the electrophysiology datasets. - Promote UNKNOWN_BIDS_VERSION into the RUST_CODES contract: Rust already emits it correctly and it matches Deno 1:1 on ds114 (an unknown BIDSVersion in dataset_description.json). 52 -> 53 implemented; ISSUE_COVERAGE.md updated. Finding (recorded in CLAUDE.md): bids-examples ships empty placeholder .nii.gz, so NIfTI-header-dependent checks (volume counts, dims, *_NOT_MATCHING_NIFTI) stay dormant -- Deno emits NIFTI_HEADER_UNREADABLE (140x here) where Rust reads nothing. Closing those needs fixtures with valid tiny NIfTI headers (pet002 approach), tuned per check -- not bulk fetching. ds114 alone closed only UNKNOWN_BIDS_VERSION. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harvest loop, batch 1. The new datasets surfaced a real parity bug rather
than new codes (valid example datasets don't trip error codes):
- parse_tsv now strips a single leading UTF-8 BOM, matching Deno's
TextDecoder. Without it the first header read as "\u{feff}<name>", so the
expected column (participant_id/onset/filename/name) was reported both
MISSING and as an undefined extra column — 48 spurious issues on
fnirs_tapping, now gone. Added a unit test.
- Add emg_Multimodal, fnirs_tapping, eeg_rest_fmri to the parity suite
(gitignored local fetch). All green; extra regression coverage for
EMG/NIRS/EEG channel + events handling.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harvest loop, batch 2. Both green; no new codes and no divergences (valid example datasets don't trip error codes). Kept for modality-breadth regression coverage — ASL/perf and PET were previously uncovered by the parity suite. Gitignored local fetch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DWI volume/bval/bvec checks were "fixture-only": the generic check engine and the NIfTI/bval/bvec eval context already implement them, but no parity dataset exercised them (bids-examples ships empty placeholder .nii.gz, so nifti_header is null and the volume checks never run). - Add crates/bids-core/tests/dwi_checks.rs: synthetic DWI datasets (valid 4D NIfTI header + crafted bval/bvec) that assert each code fires with the correct location. CI-gated, no Deno/external data. Each case was confirmed 1:1 vs @bids/validator 2.4.1 on the identical tree. - Add crates/bids-core/tests/common/mod.rs: shared build_ctx/infer_datatype fixture builder (the older integration tests predate it; this avoids a fourth copy). - Promote VOLUME_COUNT_MISMATCH, BVAL_MULTIPLE_ROWS, BVEC_NUMBER_ROWS, DWI_MISSING_BVAL, DWI_MISSING_BVEC to RUST_CODES. ISSUE_COVERAGE.md: 53 -> 58 implemented, 125 -> 120 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same crafted-fixture method as the DWI family. All six were fixture-only: the generic engine, the magnitude/magnitude1/bval associations, and the NIfTI header context already implement them. Each case confirmed 1:1 vs @bids/validator 2.4.1 on the identical synthetic tree. - Add crates/bids-core/tests/fmap_checks.rs (uses tests/common builder). - Promote FIELDMAP_WITHOUT_MAGNITUDE_FILE, MISSING_MAGNITUDE1_FILE, ECHOTIME1_2_DIFFERENCE_UNREASONABLE, MAGNITUDE_FILE_WITH_TOO_MANY_DIMENSIONS, EPI_WITH_BVALS_NEEDS_SMALL_BVALS, TOTAL_READOUT_TIME_MUST_DEFINE to RUST_CODES. ISSUE_COVERAGE.md: 58 -> 64 implemented, 120 -> 114 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4 codes) Same method. All fixture-only — the parsed NIfTI header context already implements them. Each confirmed 1:1 vs @bids/validator 2.4.1. - Add crates/bids-core/tests/nifti_checks.rs. - Promote T1W_FILE_WITH_TOO_MANY_DIMENSIONS, BOLD_NOT_4D, NIFTI_DIMENSION, NIFTI_PIXDIM to RUST_CODES. ISSUE_COVERAGE.md: 64 -> 68 implemented, 114 -> 110 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Acting on the external review of PR #3 (crafted-fixture work): - tsv.rs: strip the UTF-8 BOM in parse_tsv_body_with_headers too, so BOM-prefixed .tsv.gz bodies match Deno (the earlier fix only covered parse_tsv). Added a regression test. - run.py: move machine-specific / known-red datasets (ds000003, ds002606, ds005016) into an opt-in OPT_IN_DATASETS dict. The no-argument sweep now runs only the expected-green set and is a clean pass/fail signal; opt-in datasets run when named. Fixed the pre-existing empty-set literal nit. - ISSUE_COVERAGE.md: SUBJECT_FOLDERS was in RUST_CODES but listed missing — moved to implemented; counts corrected to 69 implemented / 109 missing. Documented the dataset-parity vs crafted-fixture evidence distinction. - tests/common/mod.rs: host the shared minimal NIfTI header builder (was triplicated across dwi/fmap/nifti checks); note that build_ctx leaves dataset_modalities/subjects empty. - audit_response.md / CLAUDE.md updated; audit_temp.md removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A step-change toward Deno parity, plus a real correctness fix surfaced along the way.
Summary
Implemented-code coverage 52 → 58 (
tests/parity/ISSUE_COVERAGE.md), one parser bug fixed, and 7 modality datasets added to the local parity sweep.Correctness fix
parse_tsvnow strips a single leading BOM, matching Deno'sTextDecoder. Without it the first header read as"<name>", so the expected column (participant_id/onset/filename/name) was reported both MISSING and as an undefined extra column — 48 spurious issues on a BOM-prefixed dataset. Affects any TSV exported from Excel/Windows tooling. Unit-tested.New parity codes (verified 1:1 vs @bids/validator 2.4.1)
UNKNOWN_BIDS_VERSION(from ds114).rules.checks.dwi.*family:VOLUME_COUNT_MISMATCH,BVAL_MULTIPLE_ROWS,BVEC_NUMBER_ROWS,DWI_MISSING_BVAL,DWI_MISSING_BVEC. These were "fixture-only" — the generic check engine + NIfTI/bval/bvec context already implemented them, but bids-examples ships empty placeholder.nii.gzsonifti_headerwas null and the checks never ran.Durable tests
crates/bids-core/tests/dwi_checks.rs— synthetic DWI datasets (valid 4D NIfTI header + crafted bval/bvec) asserting each code fires at the right location. CI-gated, no Deno/external data.crates/bids-core/tests/common/mod.rs— shared fixture builder (avoids a 4th copy of the per-test builder).Parity-suite coverage (gitignored local fetch)
ds114 (DWI), emg_Multimodal, fnirs_tapping, eeg_rest_fmri, asl001, pet003 — first DWI/EMG/NIRS/ASL/PET coverage.
Method note
The fetch-only harvest (valid example datasets) mostly yields bug-fixes + coverage, not new codes — valid data doesn't trip error checks, and empty placeholder NIfTI keeps header-dependent checks dormant. Header-dependent codes are closed via crafted synthetic fixtures verified against Deno, then committed as CI-gated tests. See
CLAUDE.md"Open follow-ups" for the next families.Validation
cargo fmt,clippy -D warnings,cargo test --workspaceall green. Parity sweep green on all present datasets (pre-existing ds005016 gap unchanged).🤖 Generated with Claude Code