Close DWI parity family + fix TSV BOM bug (52→58 codes) by neurolabusc · Pull Request #3 · rordenlab/bids-validator-rs

neurolabusc · 2026-06-22T14:23:16Z

A step-change toward Deno parity, plus a real correctness fix surfaced along the way.

Summary

Implemented-code coverage 52 → 58 (tests/parity/ISSUE_COVERAGE.md), one parser bug fixed, and 7 modality datasets added to the local parity sweep.

Correctness fix

UTF-8 BOM in TSV parsing. parse_tsv now strips a single leading BOM, matching Deno's TextDecoder. Without it the first header read as "<name>", so the expected column (participant_id/onset/filename/name) was reported both MISSING and as an undefined extra column — 48 spurious issues on a BOM-prefixed dataset. Affects any TSV exported from Excel/Windows tooling. Unit-tested.

New parity codes (verified 1:1 vs @bids/validator 2.4.1)

UNKNOWN_BIDS_VERSION (from ds114).
DWI rules.checks.dwi.* family: VOLUME_COUNT_MISMATCH, BVAL_MULTIPLE_ROWS, BVEC_NUMBER_ROWS, DWI_MISSING_BVAL, DWI_MISSING_BVEC. These were "fixture-only" — the generic check engine + NIfTI/bval/bvec context already implemented them, but bids-examples ships empty placeholder .nii.gz so nifti_header was null and the checks never ran.

Durable tests

crates/bids-core/tests/dwi_checks.rs — synthetic DWI datasets (valid 4D NIfTI header + crafted bval/bvec) asserting each code fires at the right location. CI-gated, no Deno/external data.
crates/bids-core/tests/common/mod.rs — shared fixture builder (avoids a 4th copy of the per-test builder).

Parity-suite coverage (gitignored local fetch)

ds114 (DWI), emg_Multimodal, fnirs_tapping, eeg_rest_fmri, asl001, pet003 — first DWI/EMG/NIRS/ASL/PET coverage.

Method note

The fetch-only harvest (valid example datasets) mostly yields bug-fixes + coverage, not new codes — valid data doesn't trip error checks, and empty placeholder NIfTI keeps header-dependent checks dormant. Header-dependent codes are closed via crafted synthetic fixtures verified against Deno, then committed as CI-gated tests. See CLAUDE.md "Open follow-ups" for the next families.

Validation

cargo fmt, clippy -D warnings, cargo test --workspace all green. Parity sweep green on all present datasets (pre-existing ds005016 gap unchanged).

🤖 Generated with Claude Code

First iteration of the parity-gap loop, targeting DWI. - Add ds114 (raw DWI: inherited dwi.bval/dwi.bvec + per-subject dwi.nii.gz) to scripts/fetch_bench_data.py and tests/parity/run.py. Gitignored local fetch, same model as the electrophysiology datasets. - Promote UNKNOWN_BIDS_VERSION into the RUST_CODES contract: Rust already emits it correctly and it matches Deno 1:1 on ds114 (an unknown BIDSVersion in dataset_description.json). 52 -> 53 implemented; ISSUE_COVERAGE.md updated. Finding (recorded in CLAUDE.md): bids-examples ships empty placeholder .nii.gz, so NIfTI-header-dependent checks (volume counts, dims, *_NOT_MATCHING_NIFTI) stay dormant -- Deno emits NIFTI_HEADER_UNREADABLE (140x here) where Rust reads nothing. Closing those needs fixtures with valid tiny NIfTI headers (pet002 approach), tuned per check -- not bulk fetching. ds114 alone closed only UNKNOWN_BIDS_VERSION. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Harvest loop, batch 1. The new datasets surfaced a real parity bug rather than new codes (valid example datasets don't trip error codes): - parse_tsv now strips a single leading UTF-8 BOM, matching Deno's TextDecoder. Without it the first header read as "\u{feff}<name>", so the expected column (participant_id/onset/filename/name) was reported both MISSING and as an undefined extra column — 48 spurious issues on fnirs_tapping, now gone. Added a unit test. - Add emg_Multimodal, fnirs_tapping, eeg_rest_fmri to the parity suite (gitignored local fetch). All green; extra regression coverage for EMG/NIRS/EEG channel + events handling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Harvest loop, batch 2. Both green; no new codes and no divergences (valid example datasets don't trip error codes). Kept for modality-breadth regression coverage — ASL/perf and PET were previously uncovered by the parity suite. Gitignored local fetch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The DWI volume/bval/bvec checks were "fixture-only": the generic check engine and the NIfTI/bval/bvec eval context already implement them, but no parity dataset exercised them (bids-examples ships empty placeholder .nii.gz, so nifti_header is null and the volume checks never run). - Add crates/bids-core/tests/dwi_checks.rs: synthetic DWI datasets (valid 4D NIfTI header + crafted bval/bvec) that assert each code fires with the correct location. CI-gated, no Deno/external data. Each case was confirmed 1:1 vs @bids/validator 2.4.1 on the identical tree. - Add crates/bids-core/tests/common/mod.rs: shared build_ctx/infer_datatype fixture builder (the older integration tests predate it; this avoids a fourth copy). - Promote VOLUME_COUNT_MISMATCH, BVAL_MULTIPLE_ROWS, BVEC_NUMBER_ROWS, DWI_MISSING_BVAL, DWI_MISSING_BVEC to RUST_CODES. ISSUE_COVERAGE.md: 53 -> 58 implemented, 125 -> 120 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Same crafted-fixture method as the DWI family. All six were fixture-only: the generic engine, the magnitude/magnitude1/bval associations, and the NIfTI header context already implement them. Each case confirmed 1:1 vs @bids/validator 2.4.1 on the identical synthetic tree. - Add crates/bids-core/tests/fmap_checks.rs (uses tests/common builder). - Promote FIELDMAP_WITHOUT_MAGNITUDE_FILE, MISSING_MAGNITUDE1_FILE, ECHOTIME1_2_DIFFERENCE_UNREASONABLE, MAGNITUDE_FILE_WITH_TOO_MANY_DIMENSIONS, EPI_WITH_BVALS_NEEDS_SMALL_BVALS, TOTAL_READOUT_TIME_MUST_DEFINE to RUST_CODES. ISSUE_COVERAGE.md: 58 -> 64 implemented, 120 -> 114 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…4 codes) Same method. All fixture-only — the parsed NIfTI header context already implements them. Each confirmed 1:1 vs @bids/validator 2.4.1. - Add crates/bids-core/tests/nifti_checks.rs. - Promote T1W_FILE_WITH_TOO_MANY_DIMENSIONS, BOLD_NOT_4D, NIFTI_DIMENSION, NIFTI_PIXDIM to RUST_CODES. ISSUE_COVERAGE.md: 64 -> 68 implemented, 114 -> 110 missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Acting on the external review of PR #3 (crafted-fixture work): - tsv.rs: strip the UTF-8 BOM in parse_tsv_body_with_headers too, so BOM-prefixed .tsv.gz bodies match Deno (the earlier fix only covered parse_tsv). Added a regression test. - run.py: move machine-specific / known-red datasets (ds000003, ds002606, ds005016) into an opt-in OPT_IN_DATASETS dict. The no-argument sweep now runs only the expected-green set and is a clean pass/fail signal; opt-in datasets run when named. Fixed the pre-existing empty-set literal nit. - ISSUE_COVERAGE.md: SUBJECT_FOLDERS was in RUST_CODES but listed missing — moved to implemented; counts corrected to 69 implemented / 109 missing. Documented the dataset-parity vs crafted-fixture evidence distinction. - tests/common/mod.rs: host the shared minimal NIfTI header builder (was triplicated across dwi/fmap/nifti checks); note that build_ctx leaves dataset_modalities/subjects empty. - audit_response.md / CLAUDE.md updated; audit_temp.md removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

neurolabusc and others added 7 commits June 22, 2026 09:58

neurolabusc merged commit 30458fc into main Jun 22, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Close DWI parity family + fix TSV BOM bug (52→58 codes)#3

Close DWI parity family + fix TSV BOM bug (52→58 codes)#3
neurolabusc merged 7 commits into
mainfrom
crafted-fixtures-dwi-parity

neurolabusc commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

neurolabusc commented Jun 22, 2026

Summary

Correctness fix

New parity codes (verified 1:1 vs @bids/validator 2.4.1)

Durable tests

Parity-suite coverage (gitignored local fetch)

Method note

Validation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant