ESMcat is a Python package for working with large climate model datasets. It provides a catalogue system for indexing and filtering CMIP6 datasets.
Install directly from GitHub:
pip install git+https://github.com/scotthosking/ESMcat.git
Or clone and install in editable mode for development:
git clone https://github.com/scotthosking/ESMcat.git
cd ESMcat
pip install -e .
ESMcat uses a config file at ~/.esmcat/config.json to determine which machine you are on, and therefore which dataset paths and catalogue files to use.
Set your machine on first use:
import esmcat as ecat
ecat.set_config('jasmin')
This writes {"machine": "jasmin"} to ~/.esmcat/config.json. ESMcat will then load datasets_jasmin.json from the package for dataset root paths and directory structures.
To check your current config:
ecat.get_config()
To add support for a new machine, create a datasets_{machine}.json file in the esmcat/ package directory following the same structure as datasets_jasmin.json.
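The relationship between the config file and the per-machine dataset file can be sketched with the standard library alone. This is an illustration of the convention described above, not ESMcat's own implementation; `read_machine` and `datasets_filename` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_machine(config_path):
    """Return the "machine" value from a config file, or None if the file is missing.

    Hypothetical helper: it only mirrors the {"machine": ...} layout
    described above and is not part of the esmcat API.
    """
    path = Path(config_path).expanduser()
    if not path.exists():
        return None
    with path.open() as f:
        return json.load(f).get("machine")


def datasets_filename(machine):
    """Build the datasets_{machine}.json filename for a given machine name."""
    return f"datasets_{machine}.json"
```

After the `ecat.set_config('jasmin')` call shown earlier, `datasets_filename(read_machine('~/.esmcat/config.json'))` would give `datasets_jasmin.json`. In normal use, `ecat.get_config()` is the supported way to inspect the setting.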
ESMcat uses pre-built catalogue files (Parquet format) stored in ~/.esmcat/. These index the files available on your system for each dataset.
Bundled catalogue files are included in the package under esmcat/catalogues/ and are copied to ~/.esmcat/ automatically on first use. Currently bundled:
| Dataset | Coverage |
|---|---|
| cmip6 | CMIP and ScenarioMIP activities |
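The "copied on first use" behaviour can be pictured as a copy-if-missing step. The sketch below is an assumption about how such a step might look, not the code ESMcat actually runs; the function name, its parameters, and the `*.parquet` pattern are illustrative.

```python
import shutil
from pathlib import Path


def copy_missing_catalogues(bundled_dir, user_dir, pattern="*.parquet"):
    """Copy catalogue files from bundled_dir to user_dir unless already present.

    Returns the names of the files that were copied. Existing files in
    user_dir are left untouched, so a locally rebuilt catalogue is not
    overwritten by the bundled copy.
    """
    bundled_dir = Path(bundled_dir)
    user_dir = Path(user_dir)
    user_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(bundled_dir.glob(pattern)):
        dest = user_dir / src.name
        if not dest.exists():
            shutil.copy2(src, dest)
            copied.append(dest.name)
    return copied
```

Calling the function a second time with the same directories copies nothing, which is the property that makes it safe to run on every import.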
To rebuild a catalogue from scratch (e.g. after new data has been added to the archive):
ecat.catalogue(dataset='cmip6', refresh=True)
Existing CSV catalogues are automatically migrated to Parquet on first use.
import esmcat as ecat
df = ecat.catalogue(dataset='cmip6',
                    Experiment='historical',
                    Var=['tas', 'pr'],
                    CMOR='Amon')
print(df.head())
Use CMOR to select frequency and realm (e.g. Amon for monthly atmosphere, day for daily).
| Column | Description | Example values |
|---|---|---|
| MIP | CMIP6 activity | CMIP, ScenarioMIP |
| Centre | Modelling centre | MOHC, CNRM-CERFACS |
| Model | Model name | HadGEM3-GC31-LL, CNRM-ESM2-1 |
| Experiment | Experiment ID | historical, ssp245, ssp585 |
| RunID | Ensemble member | r1i1p1f1, r2i1p1f2 |
| CMOR | CMOR table (encodes frequency and realm) | Amon, Omon, day, fx |
| Var | Variable name | tas, pr, tos |
| Grid | Grid label | gn (native), gr (regridded) |
| Version | Data version | v20190621 |
| StartDate | Start date of files (YYYYMMDD) | 19500101 |
| EndDate | End date of files (YYYYMMDD) | 21001231 |
| Path | Relative path to data directory | |
| DataFiles | Semicolon-separated list of filenames | |
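Because the catalogue is returned as a table with the columns listed above, keyword filters such as `Experiment='historical'` or `Var=['tas', 'pr']` can be thought of as per-column membership tests. The following is a minimal pandas sketch of that idea on a toy table; `filter_catalogue` is a hypothetical function and may differ from what `ecat.catalogue` does internally.

```python
import pandas as pd


def filter_catalogue(df, **filters):
    """Keep rows where every named column matches one of the requested values.

    Each filter value may be a single string or a list of strings.
    """
    mask = pd.Series(True, index=df.index)
    for column, wanted in filters.items():
        values = [wanted] if isinstance(wanted, str) else list(wanted)
        mask &= df[column].isin(values)
    return df[mask]


# Toy catalogue using a few of the column names from the table above.
toy = pd.DataFrame({
    "Experiment": ["historical", "historical", "ssp245"],
    "Var": ["tas", "tos", "pr"],
    "CMOR": ["Amon", "Omon", "Amon"],
})

subset = filter_catalogue(toy, Experiment="historical", Var=["tas", "pr"], CMOR="Amon")
print(subset)
```

Only the first toy row satisfies all three filters: the second fails on `Var` and `CMOR`, and the third fails on `Experiment`.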
Pass a single-row catalogue entry to ecat.open_dataset() to load it as an Xarray Dataset. Multiple files (e.g. a variable split across decades) are combined automatically via xarray.open_mfdataset.
import esmcat as ecat
catlg = ecat.catalogue(dataset='cmip6',
                       Experiment='historical',
                       Var='tas',
                       CMOR='Amon',
                       Model='HadGEM3-GC31-LL',
                       RunID='r1i1p1f3')
ds = ecat.open_dataset(catlg.iloc[0])
print(ds)
To loop over every entry in a catalogue (e.g. one built with several variables):
for _, row in catlg.iterrows():
    ds = ecat.open_dataset(row)
ecat.open_dataset requires Xarray and access to the underlying data files.
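To make the role of the Path and DataFiles columns concrete, the sketch below expands one catalogue row into a list of full file paths. It assumes that the dataset root comes from the machine's dataset configuration; `resolve_data_files` is a hypothetical helper, and the root and filenames in the usage note are made up for illustration.

```python
from pathlib import Path


def resolve_data_files(root, rel_path, data_files):
    """Turn a semicolon-separated DataFiles string into full paths.

    root      : dataset root directory (assumed to come from the machine config)
    rel_path  : the row's Path value, relative to root
    data_files: the row's DataFiles value, e.g. "a.nc;b.nc"
    """
    names = [name.strip() for name in data_files.split(";") if name.strip()]
    return [Path(root) / rel_path / name for name in names]
```

For instance, `resolve_data_files("/data/cmip6", "CMIP/MOHC", "a.nc;b.nc")` yields two paths under `/data/cmip6/CMIP/MOHC`. A list like this is the kind of input `xarray.open_mfdataset` accepts when a variable is split across several files.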
df = ecat.catalogue(dataset='cmip6', read_everything=True)
Dataset configurations (root paths, directory structures, filename structures) are defined in datasets_{machine}.json. To add support for a new dataset on an existing machine, add an entry to the relevant JSON file following the same structure as the existing ones.
Thanks go to these wonderful people (emoji key):
Scott Hosking 💻
TomBracegirdle 💻
Tony Phillips 💻
Charles H. Simpson 💻
This project follows the all-contributors specification. Contributions of any kind welcome!