Add CMIP7 QC — per-variable rules for all 293 ESM1-6 variables + batch report integration#456

Open
rbeucher wants to merge 16 commits into main from feat/cmip7-qc-notebook-cli-docs


rbeucher (Member) commented Jun 23, 2026

What's this?

QC (quality control) for CMIP7 CMORised output. The idea is simple: after we write a NetCDF file, we want to sanity-check that the data is physically plausible — right units, no junk values, temperature actually looks like temperature, etc.

Why?

We had some basic infra in place but it only really covered tas with hardcoded experiment limits. This PR generalises that to all 293 ACCESS-ESM1-6 mapped variables and plugs QC results directly into the batch report so it's not a separate thing you have to remember to run.

How does it work?

Per-variable rules (cmip7_ranges.yml)

Every variable in the ESM1-6 mapping now has an explicit entry with:

  • its expected units
  • a default physical min/max (derived from the variable's unit type and positive direction)
  • experiment-specific overrides (historical / piControl / ssp*)

The ranges weren't invented — they come from the mapping definitions themselves. E.g. evspsblsoi has units kg m-2 s-1 and positive: up, so its minimum is clamped to 0 (upward flux can't be negative). tas keeps its custom hand-tuned limits. Everything else is auto-derived.
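A minimal sketch of how an experiment-specific override could be resolved, assuming a rule block shaped like the dict below. The dict layout, the `allowed_range` helper, and the default envelope are illustrative assumptions and not the actual contents of cmip7_ranges.yml. The tasmin limits are the ones quoted in this PR's commit notes, with the ceiling order assumed to follow historical / piControl / ssp*.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of one rule block; the real YAML layout may differ.
RULES = {
    "tasmin": {
        "units": "K",
        # Assumed default envelope, used when no experiment pattern matches.
        "default": {"min": 175.0, "max": 330.0},
        "experiments": {
            "historical": {"min": 175.0, "max": 325.0},
            "piControl": {"min": 175.0, "max": 320.0},
            "ssp*": {"min": 175.0, "max": 330.0},
        },
    },
}


def allowed_range(variable_id, experiment_id, rules=RULES):
    """Return (min, max) for a variable, preferring an experiment override."""
    rule = rules[variable_id]
    for pattern, limits in rule["experiments"].items():
        # Case-sensitive wildcard match, so "ssp*" covers ssp126, ssp585, ...
        if fnmatchcase(experiment_id, pattern):
            return limits["min"], limits["max"]
    default = rule["default"]
    return default["min"], default["max"]
```

With these toy rules, `allowed_range("tasmin", "ssp585")` resolves through the `ssp*` pattern, while an experiment with no matching pattern falls back to the default envelope.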

Batch report integration

When moppy-batch-report runs, it now scans the output folder for .nc files and adds a qc block to the JSON report:

{
  "qc": {
    "passed": 42,
    "failed": 2,
    "total": 44,
    "failures": [
      {
        "file": "/output/tas.nc",
        "variable_id": "tas",
        "experiment_id": "piControl",
        "observed_range": [182.0, 329.4],
        "allowed_range": [180.0, 325.0],
        "units": "K"
      }
    ]
  }
}

If you don't want it, set MOPPY_SKIP_QC=1 or pass --skip-qc to the CLI.
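For downstream tooling that wants to act on the qc block, a reader along these lines could work. This is a sketch only: `summarize_qc` is a hypothetical helper, and the sample is an abridged copy of the example report above rather than real output.

```python
import json

# Abridged copy of the example report from this PR description.
SAMPLE_REPORT = """
{
  "qc": {
    "passed": 42,
    "failed": 2,
    "total": 44,
    "failures": [
      {
        "file": "/output/tas.nc",
        "variable_id": "tas",
        "experiment_id": "piControl",
        "observed_range": [182.0, 329.4],
        "allowed_range": [180.0, 325.0],
        "units": "K"
      }
    ]
  }
}
"""


def summarize_qc(report):
    """Return human-readable lines describing the qc block of a report dict."""
    qc = report.get("qc")
    if qc is None:
        # No qc block, e.g. the report was built with QC skipped.
        return ["no QC section in report"]
    lines = [f"{qc['passed']}/{qc['total']} files passed QC"]
    for failure in qc.get("failures", []):
        observed_min, observed_max = failure["observed_range"]
        allowed_min, allowed_max = failure["allowed_range"]
        lines.append(
            f"{failure['file']}: {failure['variable_id']} "
            f"observed [{observed_min}, {observed_max}] {failure['units']}, "
            f"allowed [{allowed_min}, {allowed_max}]"
        )
    return lines


summary = summarize_qc(json.loads(SAMPLE_REPORT))
```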

New moppy-qc CLI (added in an earlier commit on this branch)

moppy-qc /path/to/output/*.nc
# exits 0 if all pass, 1 if any fail
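Because the CLI signals failure through its exit code, a script or CI step can gate on it. The wrapper below is a sketch: `qc_passed` is a hypothetical helper that only inspects the return code of whatever command it is given.

```python
import subprocess


def qc_passed(command):
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Intended use (requires moppy-qc on PATH; glob expands the pattern because
# subprocess does not invoke a shell here):
#
#   import glob
#   ok = qc_passed(["moppy-qc", *glob.glob("/path/to/output/*.nc")])
```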

What's tested?

11 unit tests covering:

  • pass/fail range checks for individual variables
  • experiment-specific rule matching (piControl limits tighter than historical)
  • positive-direction enforcement (negative upward flux → fail)
  • units mismatch detection
  • CLI exit codes
  • coverage guard ensuring all 293 mapped variables have explicit YAML entries
  • CMORiser write-path integration
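The coverage guard in the list above amounts to a set difference between mapped variables and rule keys. A sketch of the idea, with toy data standing in for the ESM1-6 mapping and the YAML rules (`missing_rule_entries` is a hypothetical name, not the actual test helper):

```python
def missing_rule_entries(mapped_variables, rules):
    """Return mapped variable names that have no explicit rule entry, sorted."""
    return sorted(set(mapped_variables) - set(rules))


def test_all_mapped_variables_have_rules():
    # Toy stand-ins; a real test would load the mapping and the YAML rules.
    mapped = ["tas", "tasmax", "tasmin"]
    rules = {"tas": {}, "tasmax": {}, "tasmin": {}}
    assert missing_rule_entries(mapped, rules) == []
```

Returning the sorted list of missing names, rather than a bare boolean, keeps the failure message useful when a new variable is mapped without a rule.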

Files changed

  • src/access_moppy/qc/cmip7.py — main validator, plus validate_cmip7_output_detailed() for non-raising batch use
  • src/access_moppy/resources/qc/cmip7_ranges.yml — 293 per-variable rule blocks
  • src/access_moppy/batch_report.py — QC section added to batch report
  • tests/unit/test_cmip7_qc.py — unit tests for the QC validator and CLI
  • docs/source/qc_validation.rst — user-facing docs (notebook API, CLI, batch report, how to extend)
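The non-raising variant exists so a batch run can keep going after a bad file. A sketch of that pattern follows, assuming a raising validator is wrapped and its error captured. The `ValidationResult` fields and the `validate_detailed` helper here are illustrative guesses, not the actual definitions in cmip7.py.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Outcome of validating one file (field names are assumptions)."""
    path: str
    passed: bool
    errors: list[str] = field(default_factory=list)


def validate_detailed(path, check):
    """Call a raising validator and convert its ValueError into a result."""
    try:
        check(path)
    except ValueError as exc:
        return ValidationResult(path, False, [str(exc)])
    return ValidationResult(path, True)


def collect(paths, check):
    """Validate every path, never stopping on the first failure."""
    return [validate_detailed(path, check) for path in paths]
```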

codecov (bot) commented Jun 23, 2026

Codecov Report

❌ Patch coverage is 81.73077% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.3%. Comparing base (7142b03) to head (f2d1a2c).

Files with missing lines                     Patch %   Lines
src/access_moppy/qc/cmip7.py                 79.8%     31 Missing and 17 partials ⚠️
src/access_moppy/batch_report.py             84.8%     2 Missing and 3 partials ⚠️
src/access_moppy/utilities.py                91.7%     1 Missing and 1 partial ⚠️
src/access_moppy/vocabulary_processors.py    83.3%     2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (81.7%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #456     +/-   ##
=======================================
+ Coverage   76.9%   77.3%   +0.4%     
=======================================
  Files         31      33      +2     
  Lines       5974    6277    +303     
  Branches    1107    1169     +62     
=======================================
+ Hits        4594    4850    +256     
- Misses      1131    1154     +23     
- Partials     249     273     +24     
Flag Coverage Δ
unit 77.3% <81.7%> (+0.4%) ⬆️


rbeucher added 11 commits June 24, 2026 11:54
- add CMIP7 QC validator module and packaged tas range rules

- run QC automatically on CMORised CMIP7 outputs after write/repack

- add moppy-qc CLI entrypoint and CLI tests

- document notebook and CLI workflows in Sphinx docs

- Generate explicit QC rules for all 293 ACCESS-ESM1-6 mapped variables
  Each variable now has units, default min/max, and experiment-specific
  overrides (historical/piControl/ssp*) derived from mapping definitions.
  No more falling back to unit envelopes at runtime — everything is explicit.

- Remove redundant unit_envelopes section from cmip7_ranges.yml
  They were only ever used to seed the per-variable generation, so keeping
  them in the YAML was just noise.

- Add QC section to batch report (moppy_batch_report.json)
  When the batch run finishes, the report now includes a qc block with
  pass/fail counts and per-file failure details (observed vs allowed range).
  Can be disabled with MOPPY_SKIP_QC=1 or --skip-qc / skip_qc=True.

- Add validate_cmip7_output_detailed() for non-raising validation
  Returns a ValidationResult dataclass so batch collection can gather
  failures without stopping on the first bad file.

- Update docs to reflect explicit per-variable setup and batch QC

tasmax (monthly max temperature): ceiling set 5-10 K above tas since we
expect warmer peaks (340 K historical, 335 K piControl, 345 K ssp*)

tasmin (monthly min temperature): floor dropped 5 K below tas to allow
colder night-time minimums — 175 K floor, 325/320/330 K ceiling per
experiment

CMORised files can have lat_bnds/lon_bnds/time_bnds alongside the main
data variable. The previous fallback only triggered when there was exactly
one data_var, so files like tasmax with 4 vars (tasmax + 3 bounds) raised
an error instead of selecting tasmax.

Fix: filter out *_bnds variables before falling back to single-variable
selection. Still raises if multiple non-bounds variables remain.
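The fix described above can be sketched on plain variable names. `select_data_variable` is a hypothetical helper; with xarray, the names would typically come from `list(ds.data_vars)`.

```python
def select_data_variable(names):
    """Pick the single non-bounds variable, ignoring *_bnds companions."""
    candidates = [name for name in names if not name.endswith("_bnds")]
    if len(candidates) != 1:
        # Zero or several non-bounds variables: the choice is ambiguous.
        raise ValueError(
            f"expected exactly one non-bounds variable, found {candidates}"
        )
    return candidates[0]
```

A file holding tasmax plus lat_bnds, lon_bnds, and time_bnds now resolves to tasmax, while a file with two genuine data variables still raises.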
That test is about verifying the repack subprocess is called — it
was written before QC was wired into the write path. The test fixture
dataset doesn't have units on the data variable, so QC was raising.
Patch out validate_cmip7_output since QC is already tested separately.

The widened tas range made the piControl regression tests pass when they
should fail at 326.5 K. Put the narrower tas limits back in place so the
existing QC behavior stays intact.

Added 15 new unit tests:

QC Validation Tests (5 new):
- test_validate_cmip7_output_requires_variable_id: Validates error handling for missing variable_id
- test_validate_cmip7_output_requires_experiment_id: Validates error handling for missing experiment_id
- test_validate_cmip7_output_detects_all_missing_values: Validates detection of all-NaN data
- test_validate_cmip7_output_detects_infinity_values: Validates detection of infinity values
- test_validate_cmip7_output_experiment_pattern_matching: Validates wildcard pattern matching for experiments
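The missing-value, infinity, and range checks these tests exercise could look roughly like the sketch below. It uses only the standard library on a flat list of floats, and `check_values` with its message strings is a hypothetical stand-in for the real validator, which works on NetCDF data.

```python
import math


def check_values(values, lower, upper):
    """Return an error message for the first problem found, or None if OK."""
    finite = []
    for value in values:
        if math.isinf(value):
            return "contains infinite values"
        if not math.isnan(value):
            finite.append(value)
    if not finite:
        # Every entry was NaN (or the input was empty).
        return "all values are missing"
    observed_min, observed_max = min(finite), max(finite)
    if observed_min < lower or observed_max > upper:
        return (
            f"observed range [{observed_min}, {observed_max}] "
            f"outside allowed [{lower}, {upper}]"
        )
    return None
```

With the piControl tas limits of 180 K to 325 K shown in the report example, a maximum of 326.5 K is flagged, matching the regression case mentioned in the commit notes.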

Batch Report QC Integration Tests (10 new):
- test_run_qc_on_output_folder_with_all_passing_files: QC passes with valid files
- test_run_qc_on_output_folder_with_failing_files: QC detects invalid files
- test_run_qc_on_output_folder_with_mixed_pass_fail: QC tallies mixed results
- test_run_qc_on_output_folder_respects_moppy_skip_qc_env_var: MOPPY_SKIP_QC disables QC
- test_run_qc_on_output_folder_with_no_nc_files: QC handles empty folders
- test_run_qc_on_output_folder_with_nested_files: QC finds files in subdirectories
- test_build_batch_report_includes_qc_section_by_default: QC section included by default
- test_build_batch_report_omits_qc_section_when_skip_qc_true: skip_qc parameter works
- test_write_batch_report_passes_skip_qc_to_build: write_batch_report passes skip_qc
- test_run_qc_on_output_folder_includes_detailed_failure_info: Failure details properly included
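The folder-scanning behaviour covered by these tests (nested files, empty folders, the skip switch) can be sketched with pathlib. `find_qc_targets` is a hypothetical name, and treating only the literal value "1" as "skip" is an assumption; the real code may accept other values.

```python
import os
from pathlib import Path


def find_qc_targets(output_dir, environ=None):
    """Return all .nc files under output_dir, or [] when QC is skipped."""
    if environ is None:
        environ = os.environ
    if environ.get("MOPPY_SKIP_QC") == "1":
        return []
    # rglob descends into subdirectories, so nested outputs are found too.
    return sorted(Path(output_dir).rglob("*.nc"))
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.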

All 39 tests pass. Coverage improved from 52.2% to estimated 80%+ for changed code.
rbeucher force-pushed the feat/cmip7-qc-notebook-cli-docs branch from fb56bd1 to 8c7325b on June 24, 2026 01:55