Open
21 commits
9e7c9a3
Add CMIP7 output QC docs and CLI usage
rbeucher Jun 23, 2026
b5af570
Add per-variable CMIP7 QC rules and batch report integration
rbeucher Jun 23, 2026
74a31ce
Tune QC limits for tasmin and tasmax
rbeucher Jun 23, 2026
eaf98b9
Fix variable selection when file contains *_bnds variables
rbeucher Jun 23, 2026
4a7acae
Add a blank line before the build_batch_report function for improved …
rbeucher Jun 23, 2026
cd83154
Update rangess
rbeucher Jun 23, 2026
fd3fd22
pre-commit
rbeucher Jun 23, 2026
c754da9
Fix test_write_repacks_cmip7_output failing after QC integration
rbeucher Jun 23, 2026
674f448
Restore CMIP7 tas piControl QC ceiling
rbeucher Jun 23, 2026
c4f1bea
Add comprehensive test coverage for CMIP7 QC validation
rbeucher Jun 23, 2026
8c7325b
Refactor test cases in batch report and CMIP7 QC to remove unnecessar…
rbeucher Jun 23, 2026
d890345
Enhance time bounds calculation to support single monthly midpoint la…
rbeucher Jun 24, 2026
0e0b803
Add tests for CLI skip QC functionality and enhance validation for CM…
rbeucher Jun 24, 2026
747760a
Refactor QC validation to remove positive sign enforcement and update…
rbeucher Jun 24, 2026
1f48411
Add functions to handle missing-value sentinels and update QC validation
rbeucher Jun 24, 2026
f2d1a2c
Add unit tests for standardizing missing values in vocabulary processor
rbeucher Jun 24, 2026
f0d5f4b
Add range validation function and corresponding unit test for tiny ne…
rbeucher Jun 24, 2026
8d95129
Update validation function docstrings for clarity on ACCESS mapped va…
rbeucher Jun 24, 2026
39651b0
Add unit tests for range resolution and experiment selection in CMIP7 QC
rbeucher Jun 25, 2026
81f4ade
Enhance CMORiser and vocabulary processor: add support for ACCESS-ESM…
rbeucher Jun 26, 2026
8972eef
Switch on validation for Ocean variables
rbeucher Jun 26, 2026
23 changes: 23 additions & 0 deletions docs/source/getting_started.rst
Expand Up @@ -243,6 +243,29 @@ After writing the file, we recommend validating it using [PrePARE](https://githu

cmoriser.write()

Running output QC checks
------------------------

ACCESS-MOPPy includes CMIP7 output QC checks (currently including physical
range checks for ``tas``). You can run QC from notebooks or the command line.

Notebook/API usage:

.. code-block:: python

    from access_moppy.qc import validate_cmip7_output

    output_file = "/path/to/CMIP7/output.nc"
    validate_cmip7_output(output_file)

CLI usage:

.. code-block:: bash

    moppy-qc /path/to/output.nc

See :doc:`qc_validation` for complete examples and rule configuration details.

CMIP7 Support with Full Branded Names
======================================

1 change: 1 addition & 0 deletions docs/source/index.rst
Expand Up @@ -55,6 +55,7 @@ While retaining the core concepts of "custom" and "cmip" modes, ACCESS-MOPPy uni
esmvaltool_integration
CMORise_ILAMB_workflow
mapping_reference
qc_validation
compliance_testing
testing_cmorisation
----
2 changes: 1 addition & 1 deletion docs/source/mapping_reference.rst
Expand Up @@ -129,7 +129,7 @@ required fields:
- Expected physical units of the CMIP output variable (e.g. ``"W m-2"``, ``"kg m-2 s-1"``).
* - ``positive``
- Yes
- Sign convention: ``"up"``, ``"down"``, or ``null`` if not applicable.
- Sign convention metadata: ``"up"``, ``"down"``, or ``null`` if not applicable.
* - ``model_variables``
- Yes
- List of raw model variable names that must be loaded from the input files.
164 changes: 164 additions & 0 deletions docs/source/qc_validation.rst
@@ -0,0 +1,164 @@
CMIP7 QC Validation
===================

This page describes how to run ACCESS-MOPPy output quality-control checks on
CMORised files.

Scope
-----

- QC is run on the *CMORised output file*, not the raw model input.
- For ``source_id=ACCESS-ESM1-6``, QC covers all variables present in
``ACCESS-ESM1-6_mappings.json`` with generic checks (non-missing values,
finite values, and units checks where defined).
- Physical ranges for all 293 ACCESS-ESM1-6 mapped variables are defined
explicitly in the QC configuration with defaults and experiment-specific
overrides (historical, piControl, ssp*).
- Each variable's physical range is derived from its definition (mapping units)
and stored as a per-variable rule entry.
- Rules are loaded from: ``access_moppy/resources/qc/cmip7_ranges.yml``.
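
The generic checks listed above (non-missing and finite values) can be sketched in plain Python. This is only an illustration under stated assumptions, not the ACCESS-MOPPy implementation: the function name ``generic_checks``, the ``FILL_VALUE`` sentinel of ``1.0e20``, and the wording of the problem messages are all hypothetical.

```python
import math

# Assumed missing-value sentinel, chosen for illustration only.
FILL_VALUE = 1.0e20


def generic_checks(values, fill_value=FILL_VALUE):
    """Return a list of problems found in a flat sequence of floats.

    Hypothetical helper: the real QC code may treat missing values
    and non-finite values differently.
    """
    problems = []
    # Values equal to the sentinel are treated as missing.
    valid = [v for v in values if v != fill_value]
    if not valid:
        problems.append("all values are missing")
    # NaN and +/-inf among the remaining values count as non-finite.
    if any(not math.isfinite(v) for v in valid):
        problems.append("non-finite values present")
    return problems
```

An empty list means the sequence passed both checks; units checks are left out because they depend on the mapping file.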

Running QC in Notebooks
-----------------------

Use the Python API to validate a CMORised file after ``cmoriser.write()``.

.. code-block:: python

    from access_moppy.qc import validate_cmip7_output

    # Write CMORised output first
    cmoriser.run()
    cmoriser.write()

    # Validate the written file
    output_file = "/path/to/CMIP7/output.nc"
    validate_cmip7_output(output_file)

If a check fails, ``validate_cmip7_output`` raises ``ValueError`` with details
about the variable, experiment, observed range, and allowed range.
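
To make the failure mode concrete, here is a minimal sketch of a physical range check that raises ``ValueError`` with a message shaped like the one ACCESS-MOPPy reports. The function ``check_range`` and its signature are assumptions for illustration; the library's own check may be structured differently.

```python
def check_range(values, allowed_min, allowed_max, units, variable_id, experiment_id):
    """Raise ValueError if any value lies outside [allowed_min, allowed_max].

    Hypothetical stand-in for the range step of the QC, not the real API.
    """
    observed_min, observed_max = min(values), max(values)
    if observed_min < allowed_min or observed_max > allowed_max:
        raise ValueError(
            f"CMIP7 QC failed for {variable_id} in experiment {experiment_id}: "
            f"observed range {observed_min:.3f}..{observed_max:.3f} {units} "
            f"is outside allowed range {allowed_min:.3f}..{allowed_max:.3f} {units}."
        )


# Callers can catch the error and decide whether to stop or continue.
try:
    check_range([182.0, 329.4], 180.0, 325.0, "K", "tas", "piControl")
except ValueError as exc:
    print(f"QC failed: {exc}")
```

The same ``try``/``except ValueError`` pattern applies when calling ``validate_cmip7_output`` on a written file.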

Running QC from the CLI
-----------------------

ACCESS-MOPPy provides a CLI command:

.. code-block:: bash

    moppy-qc /path/to/file1.nc /path/to/file2.nc

Exit status:

- ``0``: all files passed
- ``1``: one or more files failed

Example output:

.. code-block:: text

    PASS /path/to/file1.nc
    FAIL /path/to/file2.nc: CMIP7 QC failed for tas in experiment piControl using rule piControl: observed range 182.000..329.400 K is outside allowed range 180.000..325.000 K.
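
Because the exit status is ``0`` only when every file passes, the CLI is easy to call from a Python workflow script. The sketch below assumes ``moppy-qc`` is on ``PATH``; the wrapper ``run_qc`` and its ``command`` parameter are hypothetical and exist only so the command can be swapped out.

```python
import subprocess
import sys


def run_qc(files, command=("moppy-qc",)):
    """Run the QC command on the given files; return True if it exits with 0.

    Hypothetical wrapper: `command` defaults to the `moppy-qc` entry point.
    """
    completed = subprocess.run(
        [*command, *map(str, files)],
        capture_output=True,
        text=True,
    )
    # Pass the PASS/FAIL lines through so they remain visible to the user.
    sys.stdout.write(completed.stdout)
    return completed.returncode == 0
```

A caller might use ``if not run_qc(paths): sys.exit(1)`` to stop a pipeline on the first failing batch.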

Automatic QC during CMORisation
-------------------------------

For CMIP7 runs, ACCESS-MOPPy validates the output automatically in the write
path, after the file has been written and repacked. Calling
``cmoriser.write()`` for CMIP7 output therefore already runs QC.

Batch Report QC Summary
-----------------------

When running a batch CMORisation, the batch report (``moppy_batch_report.json``)
automatically includes a QC section summarizing validation results for all
CMORised output files:

.. code-block:: json

    {
      "qc": {
        "passed": 42,
        "failed": 2,
        "total": 44,
        "failures": [
          {
            "file": "/output/path/tas.nc",
            "variable_id": "tas",
            "experiment_id": "piControl",
            "error": "Observed range 182.000..329.400 K is outside allowed 180.000..325.000 K.",
            "observed_range": [182.0, 329.4],
            "allowed_range": [180.0, 325.0],
            "units": "K"
          }
        ]
      }
    }
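
A report with this shape can be summarised with a few lines of Python. The sketch below is illustrative only: ``summarise_qc`` is a hypothetical helper, and it relies on just the keys shown in the example above (``passed``, ``total``, ``failures``, ``variable_id``, ``experiment_id``, ``file``).

```python
EXAMPLE_REPORT = {
    "qc": {
        "passed": 42,
        "failed": 2,
        "total": 44,
        "failures": [
            {
                "file": "/output/path/tas.nc",
                "variable_id": "tas",
                "experiment_id": "piControl",
            }
        ],
    }
}


def summarise_qc(report):
    """Return human-readable summary lines for the report's QC section."""
    qc = report.get("qc")
    if qc is None:
        # The section is omitted when QC is skipped or cannot be run.
        return ["QC section absent"]
    lines = [f"{qc['passed']}/{qc['total']} files passed QC"]
    for failure in qc.get("failures", []):
        lines.append(
            f"{failure['variable_id']} ({failure['experiment_id']}): {failure['file']}"
        )
    return lines


# With a real report, load it first, for example:
#   report = json.loads(Path("moppy_batch_report.json").read_text())
for line in summarise_qc(EXAMPLE_REPORT):
    print(line)
```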

To disable QC collection during batch report generation, use one of:

.. code-block:: bash

    # Environment variable
    export MOPPY_SKIP_QC=1
    moppy-batch-report --db cmor_tasks.db

    # CLI flag
    moppy-batch-report --db cmor_tasks.db --skip-qc

Or programmatically:

.. code-block:: python

    from access_moppy.batch_report import write_batch_report
    write_batch_report(db_path, skip_qc=True)

Extending rules
---------------

To add experiment-specific thresholds for a variable, or to override ranges
for newly added variables, edit:

.. code-block:: text

    src/access_moppy/resources/qc/cmip7_ranges.yml

Under the ``variables`` section, each variable has a ``default`` entry and an
optional ``experiments`` map for experiment-specific min/max values. For example:

.. code-block:: yaml

    variables:
      tas:
        units: K
        default:
          min: 180.0
          max: 330.0
        experiments:
          historical:
            min: 180.0
            max: 330.0
          piControl:
            min: 180.0
            max: 325.0

Rule structure example:

.. code-block:: yaml

    variables:
      tas:
        units: K
        default:
          min: 180.0
          max: 330.0
        experiments:
          historical:
            min: 180.0
            max: 330.0
          piControl:
            min: 180.0
            max: 325.0
          ssp*:
            min: 180.0
            max: 335.0
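
One plausible way to resolve the range for a given experiment from a rule like this is sketched below. It assumes an exact experiment name takes precedence, then a shell-style wildcard such as ``ssp*``, then the ``default`` entry. That precedence, the function ``resolve_range``, and the use of ``fnmatch`` are assumptions for illustration; the actual lookup in ACCESS-MOPPy may differ.

```python
from fnmatch import fnmatchcase

# The tas rule from the YAML above, written as a Python dict.
TAS_RULE = {
    "units": "K",
    "default": {"min": 180.0, "max": 330.0},
    "experiments": {
        "historical": {"min": 180.0, "max": 330.0},
        "piControl": {"min": 180.0, "max": 325.0},
        "ssp*": {"min": 180.0, "max": 335.0},
    },
}


def resolve_range(rule, experiment_id):
    """Return (min, max) for an experiment: exact match, wildcard, then default."""
    experiments = rule.get("experiments", {})
    if experiment_id in experiments:
        entry = experiments[experiment_id]
    else:
        # Fall back to the first wildcard pattern that matches, else the default.
        entry = next(
            (
                limits
                for pattern, limits in experiments.items()
                if fnmatchcase(experiment_id, pattern)
            ),
            rule["default"],
        )
    return entry["min"], entry["max"]


print(resolve_range(TAS_RULE, "ssp585"))
```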
1 change: 1 addition & 0 deletions pyproject.toml
Expand Up @@ -91,6 +91,7 @@ moppy-cmorise = "access_moppy.batch_cmoriser:main"
moppy-dashboard = "access_moppy.dashboard.cmor_dashboard:main"
moppy-tui = "access_moppy.dashboard.cli_dashboard:main"
moppy-batch-report = "access_moppy.batch_report:main"
moppy-qc = "access_moppy.qc.cmip7:main"
moppy-example-config = "access_moppy.examples.show_config:main"
moppy-calc-ab-coeffts = "access_moppy.legacy_utilities.calc_hybrid_height_coeffs:main"
moppy-esmval-prepare = "access_moppy.esmval.cli_commands:main_prepare"
4 changes: 4 additions & 0 deletions src/access_moppy/base.py
Expand Up @@ -14,6 +14,7 @@
import xarray as xr
from cftime import date2num

from access_moppy.qc import validate_cmip7_output
from access_moppy.utilities import (
FrequencyMismatchError,
IncompatibleFrequencyError,
Expand Down Expand Up @@ -1226,6 +1227,9 @@ def estimate_data_size(ds):

self._repack_cmip7_output(path)

if getattr(self.vocab, "mip_era", None) == "CMIP7":
validate_cmip7_output(path)

logger.info("CMORised output written to %s", path)
logger.debug("Optimized layout: metadata -> data chunks")
if self.enable_compression:
83 changes: 80 additions & 3 deletions src/access_moppy/batch_report.py
Expand Up @@ -213,6 +213,62 @@ def _pbs_report(
return _compact_dict(pbs)


def _run_qc_on_output_folder(output_folder: Path) -> dict[str, Any] | None:
    """Run QC on CMORised netCDF files found in output folder and return results.

    Returns None if QC cannot be run (import unavailable, no files found, or disabled).
    Set environment variable MOPPY_SKIP_QC=1 to disable QC collection.
    """
    import os

    if os.environ.get("MOPPY_SKIP_QC", "").lower() in ("1", "true", "yes"):
        return None

    try:
        from access_moppy.qc.cmip7 import validate_cmip7_output_detailed
    except ImportError:
        return None

    nc_files = sorted(output_folder.glob("**/*.nc"))
    if not nc_files:
        return None

    results = {
        "passed": 0,
        "failed": 0,
        "total": len(nc_files),
        "failures": [],
    }

    for nc_file in nc_files:
        result = validate_cmip7_output_detailed(nc_file)
        if result.passed:
            results["passed"] += 1
        else:
            results["failed"] += 1
            failure = {
                "file": str(nc_file),
                "variable_id": result.variable_id,
                "experiment_id": result.experiment_id,
                "error": result.error,
            }
            if result.observed_min is not None:
                failure["observed_range"] = [
                    float(result.observed_min),
                    float(result.observed_max),
                ]
            if result.allowed_min is not None:
                failure["allowed_range"] = [
                    float(result.allowed_min),
                    float(result.allowed_max),
                ]
            if result.units:
                failure["units"] = result.units
            results["failures"].append(failure)

    return results if results["total"] > 0 else None


def build_batch_report(
db_path: str | Path,
*,
Expand All @@ -223,12 +279,16 @@ def build_batch_report(
created_at: str | None = None,
completed_at: str | None = None,
stderr_tail_lines: int = 20,
skip_qc: bool = False,
) -> dict[str, Any]:
"""Build a JSON-serialisable batch coordination report.

SQLite remains the source of truth; this report is a derived snapshot for
after-the-fact completion checks, provenance capture, and dashboard/database
ingestion.

Args:
skip_qc: If True, skip QC data collection. Can also be set via MOPPY_SKIP_QC env var.
"""
db_path = Path(db_path)
output_path = Path(output_folder) if output_folder is not None else db_path.parent
Expand Down Expand Up @@ -281,7 +341,9 @@ def build_batch_report(
monitor = _read_monitor_sidecar(output_path)
monitor.update(_monitor_log_paths(script_path))

return {
qc_results = None if skip_qc else _run_qc_on_output_folder(output_path)

report_dict = {
"schema_version": SCHEMA_VERSION,
"created_at": now,
"completed_at": final_completed_at,
Expand All @@ -298,22 +360,31 @@ def build_batch_report(
"tasks": tasks,
"failures": failures,
}
if qc_results is not None:
report_dict["qc"] = qc_results

return report_dict


def write_batch_report(
db_path: str | Path,
output_path: str | Path | None = None,
skip_qc: bool = False,
**kwargs: Any,
) -> Path:
"""Write a durable batch report and return the report path."""
"""Write a durable batch report and return the report path.

Args:
skip_qc: If True, skip QC data collection. Can also be set via MOPPY_SKIP_QC env var.
"""
db_path = Path(db_path)
report_path = (
Path(output_path)
if output_path is not None
else db_path.parent / REPORT_FILENAME
)
report_path.parent.mkdir(parents=True, exist_ok=True)
report = build_batch_report(db_path, **kwargs)
report = build_batch_report(db_path, skip_qc=skip_qc, **kwargs)
report_path.write_text(
json.dumps(report, indent=2, sort_keys=True) + "\n", encoding="utf-8"
)
Expand All @@ -340,6 +411,11 @@ def _build_parser() -> argparse.ArgumentParser:
default=20,
help="Number of stderr tail lines to include for failed tasks (default: 20).",
)
parser.add_argument(
"--skip-qc",
action="store_true",
help="Skip QC data collection (can also set MOPPY_SKIP_QC=1).",
)
return parser


Expand All @@ -355,6 +431,7 @@ def main(argv: list[str] | None = None) -> int:
report_path = write_batch_report(
args.db,
args.output,
skip_qc=args.skip_qc,
config=config,
config_path=args.config,
script_dir=args.script_dir,
7 changes: 6 additions & 1 deletion src/access_moppy/ocean.py
Expand Up @@ -378,7 +378,12 @@ def infer_grid_type(self):
def _get_dim_rename(self):
"""Get the dimension renaming mapping for the grid type."""

supported_sources = ["ACCESS-OM2", "ACCESS-CM", "ACCESS-ESM1-5"]
supported_sources = [
"ACCESS-OM2",
"ACCESS-CM",
"ACCESS-ESM1-5",
"ACCESS-ESM1-6",
]
if self.vocab.source_id in supported_sources:
return {
"xt_ocean": "i",