fix: mask CMIP fill values (1e20) before monthly max/min aggregation #457
Closed
rbeucher wants to merge 2 commits into
When data is loaded with decode_cf=False (as done throughout the pipeline to preserve lazy computation), fill values such as 1e20 are NOT automatically converted to NaN. This caused calculate_monthly_maximum and calculate_monthly_minimum to return 1e20 as the aggregated result when any fill value was present in the input data (e.g. tasmax/tasmin).

Fix: detect _FillValue / missing_value from attrs and encoding, and mask those values via .where() before calling .resample().max()/.min().

Add regression tests for both attrs-based and encoding-based fill values.
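The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not code from the PR: the synthetic January series, the "MS" resample frequency, and the variable names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily tasmax series for January 2000; the values are made up.
time = pd.date_range("2000-01-01", periods=31, freq="D")
values = np.full(31, 300.0)
values[10] = 1e20  # a CMIP-style fill value that was never decoded to NaN

da = xr.DataArray(
    values,
    dims="time",
    coords={"time": time},
    attrs={"_FillValue": 1e20},  # what decode_cf=False leaves in attrs
    name="tasmax",
)

# Without masking, the fill value is the largest number in the month.
unmasked_max = da.resample(time="MS").max()

# Masking first turns the fill value into NaN, which max() skips by default.
masked_max = da.where(da != da.attrs["_FillValue"]).resample(time="MS").max()

print(float(unmasked_max[0]), float(masked_max[0]))
```

The unmasked aggregation reports the fill value as the monthly maximum, while the masked one reports the real data maximum.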
Codecov Report
❌ Your patch check has failed because the patch coverage (77.8%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #457    +/-   ##
=====================================
  Coverage   76.7%   76.7%
=====================================
  Files         31      31
  Lines       5927    5946    +19
  Branches    1094    1097     +3
=====================================
+ Hits        4546    4562    +16
- Misses      1131    1135     +4
+ Partials     250     249     -1
Problem
When data is loaded with decode_cf=False (as done throughout the pipeline to preserve lazy/Dask computation), fill values such as 1e20 are not automatically converted to NaN. This caused calculate_monthly_maximum and calculate_monthly_minimum to return 1e20 as the aggregated result when any fill value was present in the input data, most notably for tasmax and tasmin.

Fix
In both calculate_monthly_minimum and calculate_monthly_maximum in calc_utils.py:
- Detect _FillValue / missing_value from both attrs and encoding
- Mask those values with .where(da != fill_val) before calling .resample().max() / .min()
- Laziness is preserved, since xarray.where is a lazy operation

Tests
Added two regression tests to test_derivations_calc_utils.py:
- test_masks_fill_values_in_attrs: verifies 1e20 in attrs is masked and does not appear in results
- test_masks_fill_values_in_encoding: verifies fill values in encoding are also handled

All 58 existing tests continue to pass.
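The detect-then-mask approach described above might look roughly like the sketch below. The PR's actual code in calc_utils.py is not shown here, so `mask_fill_values` and `monthly_maximum` are hypothetical names, and the "MS" resample frequency and scalar fill values are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr


def mask_fill_values(da):
    """Replace declared fill values with NaN (hypothetical helper)."""
    masked = da
    # Look in both places where undecoded data may record its fill value.
    for source in (da.attrs, da.encoding):
        for key in ("_FillValue", "missing_value"):
            fill_val = source.get(key)
            if fill_val is not None:
                # .where() keeps values where the condition holds, NaN elsewhere.
                masked = masked.where(masked != fill_val)
    return masked


def monthly_maximum(da):
    """Monthly maximum that ignores fill values (illustrative stand-in)."""
    return mask_fill_values(da).resample(time="MS").max()


# Encoding-based case, in the spirit of the second regression test.
time = pd.date_range("2000-01-01", periods=60, freq="D")
values = np.linspace(280.0, 290.0, 60)
values[5] = 1e20  # fill value inside January

da = xr.DataArray(values, dims="time", coords={"time": time}, name="tasmax")
da.encoding = {"missing_value": 1e20}  # fill value recorded only in encoding

result = monthly_maximum(da)
```

Because the mask is applied with `.where()` rather than by loading and editing the values, the same helper should remain lazy on Dask-backed arrays. That is the property the PR relies on, although this sketch only exercises an in-memory array.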