ESMcat

ESMcat is a Python package for working with large climate model datasets. It provides a catalogue system for indexing and filtering CMIP6 datasets.

Requirements

Python 3.7+
pandas
NumPy
pyarrow
Xarray

Installation

Install directly from GitHub:

pip install git+https://github.com/scotthosking/ESMcat.git

Or clone and install in editable mode for development:

git clone https://github.com/scotthosking/ESMcat.git
cd ESMcat
pip install -e .

Machine configuration

ESMcat uses a config file at ~/.esmcat/config.json to know which machine you are on, and therefore which dataset paths and catalogue files to use.

Set your machine on first use:

import esmcat as ecat
ecat.set_config('jasmin')

This writes {"machine": "jasmin"} to ~/.esmcat/config.json. ESMcat will then load datasets_jasmin.json from the package for dataset root paths and directory structures.
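
As a rough illustration of the file format only (this is not ESMcat's own code), the sketch below writes and reads a config of this shape with the standard library. The helper names write_config and read_config are hypothetical, and a temporary directory stands in for ~/.esmcat so that nothing in your home directory is touched.

```python
import json
import tempfile
from pathlib import Path


def write_config(config_dir, machine):
    """Write {"machine": <machine>} to config.json inside config_dir."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps({"machine": machine}))
    return config_path


def read_config(config_dir):
    """Return the parsed config dictionary from config_dir/config.json."""
    return json.loads((Path(config_dir) / "config.json").read_text())


# A temporary directory stands in for ~/.esmcat in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    write_config(tmp, "jasmin")
    config = read_config(tmp)
    # The machine name selects the dataset file, e.g. datasets_jasmin.json.
    datasets_file = f"datasets_{config['machine']}.json"

print(config, datasets_file)
```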

To check your current config:

ecat.get_config()

To add support for a new machine, create a datasets_{machine}.json file in the esmcat/ package directory following the same structure as datasets_jasmin.json.

Catalogue files

ESMcat uses pre-built catalogue files (Parquet format) stored in ~/.esmcat/. These index the files available on your system for each dataset.

Bundled catalogue files are included in the package under esmcat/catalogues/ and are copied to ~/.esmcat/ automatically on first use. Currently bundled:

Dataset	Coverage
`cmip6`	CMIP and ScenarioMIP activities

To rebuild a catalogue from scratch (e.g. after new data has been added to the archive):

ecat.catalogue(dataset='cmip6', refresh=True)

Existing CSV catalogues are automatically migrated to Parquet on first use.

Usage

Filter the CMIP6 catalogue

import esmcat as ecat

df = ecat.catalogue(dataset='cmip6',
                    Experiment='historical',
                    Var=['tas', 'pr'],
                    CMOR='Amon')

print(df.head())

Use CMOR to select frequency and realm (e.g. Amon for monthly atmosphere, day for daily).
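
The filtering behaviour suggested by the example above can be sketched in plain Python. This assumes, without confirmation from the ESMcat source, that a scalar keyword must match exactly and that a list keyword (such as Var=['tas', 'pr']) matches any of its elements. The matches helper and the sample rows are hypothetical.

```python
def matches(row, **filters):
    """Return True if row satisfies every filter.

    A scalar filter must equal the row's value; a list, tuple or set
    filter matches if the row's value is any one of its elements.
    """
    for column, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if row[column] not in allowed:
            return False
    return True


# Hypothetical catalogue rows, reduced to three columns.
rows = [
    {"Experiment": "historical", "Var": "tas", "CMOR": "Amon"},
    {"Experiment": "historical", "Var": "pr", "CMOR": "Amon"},
    {"Experiment": "historical", "Var": "tos", "CMOR": "Omon"},
    {"Experiment": "ssp585", "Var": "tas", "CMOR": "Amon"},
]

selected = [
    r for r in rows
    if matches(r, Experiment="historical", Var=["tas", "pr"], CMOR="Amon")
]
print([r["Var"] for r in selected])
```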

Available columns

Column	Description	Example values
`MIP`	CMIP6 activity	`CMIP`, `ScenarioMIP`
`Centre`	Modelling centre	`MOHC`, `CNRM-CERFACS`
`Model`	Model name	`HadGEM3-GC31-LL`, `CNRM-ESM2-1`
`Experiment`	Experiment ID	`historical`, `ssp245`, `ssp585`
`RunID`	Ensemble member	`r1i1p1f1`, `r2i1p1f2`
`CMOR`	CMOR table (encodes frequency and realm)	`Amon`, `Omon`, `day`, `fx`
`Var`	Variable name	`tas`, `pr`, `tos`
`Grid`	Grid label	`gn` (native), `gr` (regridded)
`Version`	Data version	`v20190621`
`StartDate`	Start date of files (YYYYMMDD)	`19500101`
`EndDate`	End date of files (YYYYMMDD)	`21001231`
`Path`	Relative path to data directory
`DataFiles`	Semicolon-separated list of filenames
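
Two of these columns pack structure into a single string: RunID is a CMIP6 variant label (realization, initialization, physics and forcing indices), and StartDate and EndDate are YYYYMMDD values. The sketch below shows one way to decode them with the standard library. The function names are hypothetical, and the date parser assumes a standard calendar: dates from models on a 360-day calendar (for example 30 February) will raise an error.

```python
import re
from datetime import datetime

# CMIP6 variant labels have the form r<N>i<N>p<N>f<N>.
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_run_id(run_id):
    """Split a variant label into its four integer indices."""
    match = VARIANT_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {run_id!r}")
    realization, initialization, physics, forcing = map(int, match.groups())
    return {
        "realization": realization,
        "initialization": initialization,
        "physics": physics,
        "forcing": forcing,
    }


def parse_date(value):
    """Convert a YYYYMMDD string or integer to a date (standard calendar only)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


print(parse_run_id("r2i1p1f2"))
print(parse_date("19500101"), parse_date(21001231))
```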

Open a dataset

Pass a single-row catalogue entry to ecat.open_dataset() to load it as an Xarray Dataset. Multiple files (e.g. a variable split across decades) are combined automatically via xarray.open_mfdataset.

import esmcat as ecat

catlg = ecat.catalogue(dataset='cmip6',
                       Experiment='historical',
                       Var='tas',
                       CMOR='Amon',
                       Model='HadGEM3-GC31-LL',
                       RunID='r1i1p1f3')

ds = ecat.open_dataset(catlg.iloc[0])
print(ds)

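To open each matching entry in turn (e.g. when the filter returns several variables or ensemble members):

for _, row in catlg.iterrows():
    ds = ecat.open_dataset(row)  # one Xarray Dataset per catalogue row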

ecat.open_dataset requires Xarray and access to the underlying data files.
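
Because the catalogue stores a relative Path and a semicolon-separated DataFiles list, one quick way to check access before opening is to expand them into full paths. This sketch assumes the full path is the dataset root joined with Path and each filename, which the documentation above does not spell out. The function name, root directory and filenames are hypothetical.

```python
from pathlib import Path


def data_file_paths(root, rel_path, data_files):
    """Join a dataset root, a relative directory and a ';'-separated file list."""
    names = [name.strip() for name in data_files.split(";") if name.strip()]
    return [Path(root) / rel_path / name for name in names]


# Hypothetical root, directory and filenames, for illustration only.
paths = data_file_paths(
    "/data/cmip6",
    "some/relative/dir",
    "file_185001-194912.nc;file_195001-201412.nc",
)

# Report which listed files are not reachable from this machine.
missing = [p for p in paths if not p.exists()]
print(len(paths), "files listed,", len(missing), "not found on this machine")
```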

Read everything (bypass default filters)

df = ecat.catalogue(dataset='cmip6', read_everything=True)

Adding or editing datasets

Dataset configurations (root paths, directory structures, filename structures) are defined in datasets_{machine}.json. To add support for a new dataset on an existing machine, add an entry to the relevant JSON file following the same structure as the existing ones.

Contributors ✨

Thanks go to these wonderful people (emoji key):

Scott Hosking 💻
TomBracegirdle 💻
Tony Phillips 💻
Charles H. Simpson 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
