
feat: native xarray support via ds.canyonb accessor (v0.3.0) #7

Merged: RaphaelBajon merged 1 commit into main from feature/xarray-accessor on Mar 4, 2026

@RaphaelBajon (Owner)

Summary

This PR adds native xarray support to canyonbpy through a Dataset accessor registered as ds.canyonb. Users can now run CANYON-B predictions directly on an xr.Dataset without extracting variables by hand, and get the results back as an xr.Dataset with the same dimensions and coordinates, ready to be merged into the source dataset.

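```python
import xarray as xr
import canyonbpy  # ds.canyonb is now available on any xr.Dataset

# ds is an existing xr.Dataset holding the CANYON-B input variables
results = ds.canyonb.predict(param=["pH", "NO3"])
ds_enriched = xr.merge([ds, results])
```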

Motivation

Working with ocean model output or Argo float data almost always means xr.Dataset objects. Previously, users had to manually extract each variable into a numpy array, build a dictionary, call canyonb(), then figure out how to reassign coordinates on the outputs. This PR removes all of that friction.

The accessor pattern (used by argopy, cf_xarray, xoak, etc.) is the idiomatic xarray-native approach: it lives on the object, is auto-discoverable, and groups related functionality under a single namespace without polluting the top-level package API.
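For readers new to the pattern, here is a minimal sketch of how an accessor is registered. It uses a hypothetical accessor named canyonb_demo (not part of canyonbpy) with a made-up missing_inputs() method. The decorator runs when the defining module is imported, which is the same mechanism that lets a plain import of canyonbpy expose ds.canyonb.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name "canyonb_demo", chosen so it cannot clash with
# the real ds.canyonb accessor that canyonbpy registers.
@xr.register_dataset_accessor("canyonb_demo")
class DemoAccessor:
    def __init__(self, xarray_obj: xr.Dataset):
        # xarray passes in the Dataset the accessor is attached to.
        self._obj = xarray_obj

    def missing_inputs(self, required=("pressure", "temperature", "salinity")):
        """Return the required variable names that are absent from the Dataset."""
        return [name for name in required if name not in self._obj.data_vars]


ds = xr.Dataset(
    {
        "temperature": ("depth", np.array([20.0, 15.0, 4.0])),
        "salinity": ("depth", np.array([35.0, 35.1, 34.9])),
    },
    coords={"depth": [0, 100, 1000]},
)

print(ds.canyonb_demo.missing_inputs())  # ['pressure']
```

Keeping state on self._obj and exposing methods on the accessor class is what keeps all the functionality under one namespace instead of adding top-level functions.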


Changes

New files

| File | Description |
|---|---|
| canyonbpy/accessor.py | CanyonBAccessor, registered as ds.canyonb via @xr.register_dataset_accessor |
| canyonbpy/tests/test_accessor.py | 13 tests covering registration, predict(), and converter() |

Modified files

| File | Description |
|---|---|
| canyonbpy/preprocessing.py | Implements DatasetToNumpy (replaces the empty stub) |
| canyonbpy/__init__.py | Imports the accessor module to trigger registration on import; bumps version to 0.3.0 |
| docs/user-guide/advanced-features.md | Replaces the "not yet implemented" warning with a full xarray section |
| docs/version-history.md | Adds the v0.3.0 entry |

API

ds.canyonb.predict() — main entry point

Returns an xr.Dataset with the same dimensions and coordinates as the source.

```python
import xarray as xr
import canyonbpy  # registers ds.canyonb

# ds is an existing xr.Dataset.
# Default variable names: time, latitude, longitude, pressure, temperature, salinity, doxy
results = ds.canyonb.predict()

# Select a subset of parameters
results = ds.canyonb.predict(param=["pH", "AT", "NO3"])

# Custom measurement errors
results = ds.canyonb.predict(epres=1.0, etemp=0.01, epsal=0.01)

# Merge back into the source dataset
ds_enriched = xr.merge([ds, results])
```
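Why the result merges cleanly: each output is laid out on the same dimensions and coordinates as the inputs. A minimal sketch of that packing step, assuming outputs arrive as flat 1-D arrays in the same order as the flattened inputs. The helper pack_like and the "pH" values are hypothetical placeholders, not canyonbpy code or real CANYON-B predictions.

```python
import numpy as np
import xarray as xr

# Synthetic source dataset: 2 profiles x 3 depth levels.
ds = xr.Dataset(
    {"temperature": (("n_prof", "n_depth"),
                     np.array([[20.0, 10.0, 4.0], [18.0, 9.0, 3.5]]))},
    coords={"n_prof": [0, 1], "n_depth": [0, 1, 2]},
)


def pack_like(ds, ref_var, flat_outputs):
    """Hypothetical helper: reshape flat 1-D outputs to the reference
    variable's shape and wrap them with its dims and coords."""
    ref = ds[ref_var]
    data_vars = {
        name: (ref.dims, np.asarray(values).reshape(ref.shape))
        for name, values in flat_outputs.items()
    }
    return xr.Dataset(data_vars, coords=ref.coords)


# Placeholder numbers standing in for canyonb() output.
fake = {"pH": np.linspace(8.1, 7.6, 6)}
results = pack_like(ds, "temperature", fake)

# Same dims and coords, so merge aligns without conflicts.
ds_enriched = xr.merge([ds, results])
print(ds_enriched)
```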

Custom variable names via var_map

Only keys that differ from the defaults need to be supplied. Argo BGC delayed-mode example:

```python
var_map = {
    "temp": "TEMP_ADJUSTED",
    "psal": "PSAL_ADJUSTED",
    "doxy": "DOXY_ADJUSTED",
    "pres": "PRES_ADJUSTED",
    "lat":  "LATITUDE",
    "lon":  "LONGITUDE",
}
results = ds.canyonb.predict(var_map=var_map, param=["pH", "NO3"])
```

Default mapping:

| canyonb argument | Default dataset variable |
|---|---|
| gtime | time |
| lat | latitude |
| lon | longitude |
| pres | pressure |
| temp | temperature |
| psal | salinity |
| doxy | doxy |
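"Only keys that differ from the defaults" suggests the user's var_map is overlaid on the default mapping. A sketch of that resolution, assuming a plain dict overlay; resolve_var_map is a hypothetical helper, and rejecting unknown keys is an assumed design choice rather than confirmed canyonbpy behavior.

```python
# Default mapping from canyonb() argument names to dataset variable names,
# taken from the default mapping table.
DEFAULT_VAR_MAP = {
    "gtime": "time",
    "lat": "latitude",
    "lon": "longitude",
    "pres": "pressure",
    "temp": "temperature",
    "psal": "salinity",
    "doxy": "doxy",
}


def resolve_var_map(var_map=None):
    """Hypothetical helper: overlay user overrides on the defaults.

    Assumption: unknown keys are treated as errors so typos fail loudly."""
    overrides = var_map or {}
    unknown = set(overrides) - set(DEFAULT_VAR_MAP)
    if unknown:
        raise KeyError(f"Unknown canyonb argument(s): {sorted(unknown)}")
    return {**DEFAULT_VAR_MAP, **overrides}


resolved = resolve_var_map({"temp": "TEMP_ADJUSTED", "lat": "LATITUDE"})
print(resolved["temp"], resolved["psal"])  # TEMP_ADJUSTED salinity
```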

ds.canyonb.converter() — low-level access

Returns the underlying DatasetToNumpy instance for cases where you need to inspect or modify numpy arrays before running the neural network.

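```python
from canyonbpy import canyonb

conv   = ds.canyonb.converter()
inputs = conv.to_dict()         # dict[str, np.ndarray]
shape  = conv.original_shape()  # e.g. (n_prof, n_depth)

# Run the neural network on the (optionally modified) flat arrays,
# then restore the original grid shape
results = canyonb(**inputs, param=["pH"])
ph_grid = results["pH"].reshape(shape)
```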
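The converter has to turn gridded variables of possibly different dimensionality (e.g. latitude(n_prof) next to temperature(n_prof, n_depth)) into equal-length 1-D arrays. A rough sketch of that step, assuming it broadcasts the mapped variables against each other and flattens them in C order. dataset_to_flat_dict is a hypothetical name and not the DatasetToNumpy implementation.

```python
import numpy as np
import xarray as xr


def dataset_to_flat_dict(ds, var_map):
    """Hypothetical sketch: broadcast the mapped variables to a common
    shape, then flatten each one to 1-D for canyonb()."""
    names = list(var_map)
    # xr.broadcast gives every array the same dims in the same order
    # (order of first appearance across the arguments).
    arrays = xr.broadcast(*(ds[var_map[k]] for k in names))
    shape = arrays[0].shape
    flat = {k: a.values.ravel() for k, a in zip(names, arrays)}
    return flat, shape, arrays[0].dims


ds = xr.Dataset(
    {
        "latitude": ("n_prof", np.array([10.0, 20.0])),
        "temperature": (("n_prof", "n_depth"),
                        np.array([[20.0, 10.0, 4.0], [18.0, 9.0, 3.5]])),
    }
)
flat, shape, dims = dataset_to_flat_dict(ds, {"lat": "latitude", "temp": "temperature"})
print(shape, dims)  # (2, 3) ('n_prof', 'n_depth')
print(flat["lat"])  # [10. 10. 10. 20. 20. 20.]
```

Because every input is flattened the same way, any flat output can be restored with a single reshape(shape), which is the round trip the converter() example relies on.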

Implementation notes

  • Scalar _cim outputs: for the carbonate parameters (AT, CT, pH, pCO2), canyonb returns the _cim term (measurement uncertainty) as a scalar, because cvalcimeas = inputsigma[i]**2 is a fixed constant rather than a per-point value. A size-1 array cannot be reshaped to a multi-point original_shape, so _pack_results broadcasts scalar/size-1 arrays to original_shape with np.full instead of reshaping them.
  • No circular imports: accessor.py imports canyonb from .core inside the predict() method body, avoiding any circular dependency at module load time.
  • No breaking changes: the existing canyonb() numpy API is untouched.
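The scalar _cim rule can be sketched for a single output as follows. pack_one is a hypothetical stand-in for the per-variable logic described for _pack_results, not its actual code.

```python
import numpy as np


def pack_one(values, original_shape):
    """Hypothetical sketch: broadcast scalar or size-1 outputs with np.full,
    reshape everything else to the original grid."""
    arr = np.asarray(values)
    if arr.size == 1:
        # e.g. a carbonate _cim term derived from a fixed inputsigma;
        # arr.reshape((2, 3)) would raise ValueError here.
        return np.full(original_shape, arr.item())
    return arr.reshape(original_shape)


shape = (2, 3)
print(pack_one(np.float64(0.017), shape).shape)  # (2, 3)
print(pack_one(np.arange(6.0), shape)[1, 0])     # 3.0
```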

Tests

```
pytest canyonbpy/tests/test_accessor.py -v
tests/test_accessor.py::TestAccessorRegistration::test_accessor_is_available   PASSED
tests/test_accessor.py::TestAccessorRegistration::test_accessor_type           PASSED
tests/test_accessor.py::TestPredict::test_returns_dataset                      PASSED
tests/test_accessor.py::TestPredict::test_expected_variables_present           PASSED
tests/test_accessor.py::TestPredict::test_unrequested_params_absent            PASSED
tests/test_accessor.py::TestPredict::test_output_shape_preserved               PASSED
tests/test_accessor.py::TestPredict::test_output_dims_match_input              PASSED
tests/test_accessor.py::TestPredict::test_result_is_mergeable                  PASSED
tests/test_accessor.py::TestPredict::test_custom_var_map_argo                  PASSED
tests/test_accessor.py::TestPredict::test_results_consistent_with_canyonb     PASSED
tests/test_accessor.py::TestPredict::test_custom_errors                        PASSED
tests/test_accessor.py::TestConverter::test_returns_dataset_to_numpy           PASSED
tests/test_accessor.py::TestConverter::test_converter_with_var_map             PASSED
```

Checklist

  • CanyonBAccessor implemented in canyonbpy/accessor.py
  • DatasetToNumpy implemented in canyonbpy/preprocessing.py (stub → full implementation)
  • Accessor auto-registered on import canyonbpy (no extra import needed for the user)
  • Scalar _cim outputs handled correctly in _pack_results
  • All 13 new tests passing
  • Existing test suite unaffected
  • Documentation updated (advanced-features.md, version-history.md)
  • __version__ bumped to 0.3.0
  • No breaking changes to the existing canyonb() API

RaphaelBajon merged commit e483dfc into main on Mar 4, 2026 (4 checks passed).