Fix Metview regrid memory leak #550
Draft
j9sh264 wants to merge 3 commits into
j9sh264 commented Jun 25, 2026

Fieldset = t.Any


def get_safe_base_date(fs: Fieldset) -> t.Union[datetime.datetime, t.List[datetime.datetime], None]:

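According to this signature, get_safe_base_date returns one datetime, a list of datetimes, or None, so a caller that builds year-wise directories has to handle all three cases. Below is a minimal sketch, assuming a hypothetical helper year_for_output_dir that requires every field to share one year. The PR does not say how mixed years or empty fieldsets are handled, so that policy is an assumption.

```python
import datetime
import typing as t

# Mirrors the return type of get_safe_base_date in the PR.
BaseDate = t.Union[datetime.datetime, t.List[datetime.datetime], None]


def year_for_output_dir(base: BaseDate) -> int:
    """Hypothetical helper: choose the year for a year-wise output directory.

    Accepts every shape get_safe_base_date may return. Raises ValueError when
    there is no date at all or when the fields span more than one year
    (an assumed policy, not taken from the PR).
    """
    if not base:  # None or an empty list
        raise ValueError("no base date available")
    # Normalise the single-datetime case to a one-element list.
    dates = base if isinstance(base, list) else [base]
    years = {d.year for d in dates}
    if len(years) != 1:
        raise ValueError(f"fields span several years: {sorted(years)}")
    return years.pop()
```

Raising on mixed years keeps a multi-year file from being filed silently under the wrong year. Picking the earliest year would be an equally valid choice.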
j9sh264 commented Jun 25, 2026

    return result[0] if len(result) == 1 else result


def memory_usage_mb():
Collaborator
Author
Added for debugging purposes for now. Will remove it.
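The diff does not show the body of memory_usage_mb(). Below is one possible implementation. It is an illustrative sketch for Unix only, because the stdlib resource module does not exist on Windows. It assumes the worker runs Linux, which is the case for Dataflow workers.

```python
import os
import resource
import sys


def memory_usage_mb() -> float:
    """Resident memory of the current process in MB (illustrative sketch).

    On Linux, such as a Dataflow worker, this reads the *current* resident set
    size from /proc/self/statm. Where that file is missing (e.g. macOS), it
    falls back to the *peak* RSS from getrusage, which never goes down.
    """
    try:
        with open("/proc/self/statm") as f:
            # Second field of statm is the resident set size, in pages.
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)
    except (OSError, ValueError, IndexError):
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
        return peak / (1024 * 1024 if sys.platform == "darwin" else 1024)
```

Current RSS, unlike peak RSS, also shows memory being given back, which matters when comparing the "Initial" and "Final" readings around a single apply() call.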
j9sh264 commented Jun 25, 2026

    return len(matches[0].metadata_list) > 0

def apply(self, uri: str) -> None:
    print(f"Initial Memory: {memory_usage_mb():.2f} MB")
Collaborator
Author
Added for debugging purposes for now. Will remove it.
j9sh264 commented Jun 25, 2026

except Exception as e:
    logger.error(f'Regrid failed for {uri!r}. Error: {str(e)}')

print(f"Final Memory: {memory_usage_mb():.2f} MB")
Collaborator
Author
Added for debugging purposes for now. Will remove it.
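The Initial/Final prints show that memory grows, but they do not show where the growth comes from. One way to separate a Python-level leak from a native one is to compare tracemalloc's traced memory with process RSS over many repeated calls. The sketch below does this with a hypothetical helper, profile_repeated_calls, which is not part of the PR. It gives only a coarse signal: peak RSS never decreases, and allocator caching can hide or exaggerate growth.

```python
import resource
import sys
import tracemalloc
import typing as t


def _peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak / (1024 * 1024 if sys.platform == "darwin" else 1024)


def profile_repeated_calls(fn: t.Callable[[], object], calls: int = 100) -> t.Dict[str, float]:
    """Call fn `calls` times and report memory growth in MB.

    python_heap_mb: growth of memory traced by tracemalloc, i.e. allocations
        made through Python's allocators that are still alive.
    peak_rss_mb: growth of the process's peak RSS, which also includes
        native allocations made by C extensions.
    Large RSS growth with little heap growth suggests a native-level leak.
    """
    tracemalloc.start()
    try:
        heap_before, _ = tracemalloc.get_traced_memory()
        rss_before = _peak_rss_mb()
        for _ in range(calls):
            fn()
        heap_after, _ = tracemalloc.get_traced_memory()
        rss_after = _peak_rss_mb()
    finally:
        tracemalloc.stop()
    return {
        "python_heap_mb": (heap_after - heap_before) / (1024 * 1024),
        "peak_rss_mb": rss_after - rss_before,
    }
```

In this pipeline, fn might wrap a single fs.base_date() call in one run and a get_safe_base_date(fs) call in another, both on the same GRIB file, so the two readings can be compared.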
Problem
While the Dataflow regrid pipeline was running, we observed memory accumulating steadily on the worker nodes, which eventually led to MemoryLimitExceeded errors. We isolated the leak to the fs.base_date() method in the metview-python package. When the pipeline extracts the year for year-wise output directories, fs.base_date() allocates C-level memory structures (through the underlying Metview/ecCodes bindings) that are not properly released and are not tracked by Python's garbage collector.
Solution
This PR bypasses the leaking base_date() implementation entirely by reading the relevant GRIB keys directly as primitive Python values.
Key Changes:
- get_safe_base_date() helper: replaces the native fs.base_date() call. It uses fs.grib_get(["dataDate", "dataTime"]) to fetch the raw date and time strings directly from the ecCodes engine.
- Because grib_get returns standard Python primitives (strings and lists), Python's own memory management frees them as soon as they go out of scope.
- The raw values are then converted into datetime objects with Metview's built-in utils.date_from_ecc_keys(d, t).
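Below is a sketch of the helper described above. It assumes that fs.grib_get(["dataDate", "dataTime"]) yields one [dataDate, dataTime] pair of strings per field. metview-python is not imported here, so date_from_keys is a hypothetical stand-in for utils.date_from_ecc_keys. Its handling of dataTime (HHMM without leading zeros, e.g. "0" or "600") is an assumption and is not copied from Metview. The final line mirrors the PR's return result[0] if len(result) == 1 else result.

```python
import datetime
import typing as t

Fieldset = t.Any  # as in the PR: any object exposing grib_get()


def date_from_keys(data_date: t.Union[str, int], data_time: t.Union[str, int]) -> datetime.datetime:
    """Hypothetical stand-in for metview's utils.date_from_ecc_keys (behaviour assumed).

    dataDate is YYYYMMDD; dataTime is HHMM without leading zeros.
    """
    day = datetime.datetime.strptime(str(data_date), "%Y%m%d")
    hhmm = int(data_time)
    return day.replace(hour=hhmm // 100, minute=hhmm % 100)


def get_safe_base_date(fs: Fieldset) -> t.Union[datetime.datetime, t.List[datetime.datetime], None]:
    # Only plain Python strings/lists come back from grib_get, so nothing
    # native is created here beyond what grib_get itself does.
    rows = fs.grib_get(["dataDate", "dataTime"])
    if not rows:  # empty fieldset
        return None
    result = [date_from_keys(d, tm) for d, tm in rows]
    return result[0] if len(result) == 1 else result
```

Any object with a grib_get method can be passed as fs, so the helper can be unit-tested with a small fake instead of a GRIB file.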