Allow NUMERICALEARTH_DATA_DIRECTORY to override the Scratch.jl cache #367
Open
glwagner wants to merge 2 commits into
Add `DataWrangling.download_cache(key)`, a single helper that returns the directory used to cache downloaded data. By default it resolves to a Scratch.jl space (unchanged behavior, same package UUID, so existing caches stay put). When the `NUMERICALEARTH_DATA_DIRECTORY` environment variable is set, data is instead cached under a per-key subdirectory of it.

Every dataset module (and the Bathymetry cache) now resolves its cache through this helper in `__init__` instead of calling `@get_scratch!` directly, removing the duplicated Scratch import from each submodule.

This is useful on HPC systems where the Julia depot lives on a small or quota-limited filesystem, or to share one cache of large datasets across depots and users. Scratch.jl supports neither on its own: its only knob, `JULIA_DEPOT_PATH`, relocates everything.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
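A minimal sketch of the resolution order described in this commit message, in plain Julia. It is not the PR's code: `scratch_fallback` is a hypothetical stand-in for Scratch.jl's `@get_scratch!(key)` so the example runs without Scratch installed, and treating an *unset* variable (rather than, say, an empty string) as the trigger for the fallback is an assumption.

```julia
# Sketch only: mirrors the behavior described for `DataWrangling.download_cache(key)`.
const DATA_DIRECTORY_ENV = "NUMERICALEARTH_DATA_DIRECTORY"

# Hypothetical stand-in for `@get_scratch!(key)`: a per-key directory under tempdir().
function scratch_fallback(key::AbstractString)
    dir = joinpath(tempdir(), "scratch_sketch", key)
    mkpath(dir)
    return dir
end

"""
    download_cache(key; fallback = scratch_fallback)

Return the directory used to cache downloads for `key`: a per-key subdirectory of
`ENV["NUMERICALEARTH_DATA_DIRECTORY"]` when that variable is set, otherwise `fallback(key)`.
"""
function download_cache(key::AbstractString; fallback = scratch_fallback)
    root = get(ENV, DATA_DIRECTORY_ENV, nothing)
    root === nothing && return fallback(key)   # variable unset: use the scratch space
    dir = joinpath(root, key)                   # keys such as "ECCO/v4" become nested subdirectories
    mkpath(dir)
    return dir
end

# Usage: redirect the cache for the duration of a block.
withenv(DATA_DIRECTORY_ENV => mktempdir()) do
    println(download_cache("ECCO/v4"))
end
```

Taking the fallback as a keyword argument keeps the sketch testable without touching a real scratch space; the PR's helper calls `@get_scratch!` directly.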
Codecov Report❌ Patch coverage is
Summary
Adds an opt-in `NUMERICALEARTH_DATA_DIRECTORY` environment variable that lets users redirect where downloaded datasets are cached, as an alternative to the default Scratch.jl space.

Scratch.jl ties its scratch space to the active Julia depot and offers no per-package env-var override. The only knob is `JULIA_DEPOT_PATH`, which relocates everything (installed packages, precompile caches, all scratchspaces). This is too blunt for the common cases:

- HPC systems where the Julia depot lives on a small or quota-limited filesystem (such as `$HOME`) but a large scratch filesystem is available for data;
- sharing one cache of large datasets across depots and users.

What changed
- New helper `DataWrangling.download_cache(key)`:
  - variable unset → `@get_scratch!(key)` (unchanged behavior);
  - `NUMERICALEARTH_DATA_DIRECTORY` is set → `mkpath(joinpath(ENV[...], key))`.
- Every dataset module (and the Bathymetry cache) resolves its cache through this helper in `__init__` instead of calling `@get_scratch!` directly, which also removes the duplicated `Scratch` import from each submodule.
- Per-dataset subdirectory keys are kept under the override (e.g. `ECCO/v4`, `WOA/annual`), and the per-`Metadata` `dir` keyword still overrides everything for individual datasets.

Because the helper lives in `DataWrangling` and all submodules share `NumericalEarth` as their module root, `@get_scratch!` still resolves to the same package UUID, so existing default caches keep their current on-disk location (verified: `~/.julia/scratchspaces/904d977b-.../JRA55`).

Verification
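The UUID argument above can be spot-checked with Base reflection. In this sketch the `LinearAlgebra` standard library and its `BLAS` submodule stand in for `NumericalEarth` and a dataset submodule; that `@get_scratch!` keys the scratch space on the calling module's `Base.PkgId` UUID is our reading of Scratch.jl, not something this PR states.

```julia
using LinearAlgebra  # standard library; stands in for NumericalEarth here

parent_pkg = Base.PkgId(LinearAlgebra)
child_pkg  = Base.PkgId(LinearAlgebra.BLAS)   # a submodule, like a dataset module

# A submodule reports its package's root module and package UUID, so a
# scratch-space lookup keyed on that UUID resolves the same from either module.
println(Base.moduleroot(LinearAlgebra.BLAS))
println(parent_pkg.uuid)
println(child_pkg.uuid == parent_pkg.uuid)
```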
- With `NUMERICALEARTH_DATA_DIRECTORY` set, `default_download_directory(...)` lands under the configured directory (incl. the `ECCO/v4` subdir). ✅
- ExplicitImports checks (`check_no_implicit_imports`, `check_no_stale_explicit_imports`, `check_all_explicit_imports_via_owners`, `check_all_qualified_accesses_via_owners`, `check_no_self_qualified_accesses`) return `nothing`. ✅
- New tests in `test/test_metadata.jl` cover both the fallback and the env-var override. ✅

Docs
Added a "Where data is cached" section to `docs/src/Metadata/metadata_overview.md`.

cc @monsieuralok
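Because each dataset module resolves its cache in `__init__`, the variable presumably needs to be set before `using NumericalEarth` (for example in a job script), not afterwards. The toy module below is hypothetical and not part of NumericalEarth; it reproduces that load-time pattern with Base Julia only, using a placeholder path in place of the Scratch.jl fallback.

```julia
data_root = mktempdir()
ENV["NUMERICALEARTH_DATA_DIRECTORY"] = data_root   # set *before* the module loads

# Hypothetical stand-in for a dataset submodule that resolves its cache once, at load time.
module ToyDataset

const cache_dir = Ref{String}("")

function __init__()
    root = get(ENV, "NUMERICALEARTH_DATA_DIRECTORY", nothing)
    # The tempdir() path is a placeholder for the Scratch.jl space.
    parent_dir = root === nothing ? joinpath(tempdir(), "scratch_sketch") : root
    cache_dir[] = joinpath(parent_dir, "ToyDataset")
    return nothing
end

end # module

println(ToyDataset.cache_dir[])   # resolved under data_root

# Changing the variable after loading does not affect the already-resolved path.
ENV["NUMERICALEARTH_DATA_DIRECTORY"] = mktempdir()
println(ToyDataset.cache_dir[])   # same path as before
```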
🤖 Generated with Claude Code