Dataset Information


Dataset: E3SM-MMF High-Resolution Real Geography

Dataset: E3SM-MMF Low-Resolution Real Geography

Dataset: E3SM-MMF Low-Resolution Aquaplanet

Data from multi-scale climate model (E3SM-MMF) simulations were saved at 20-minute intervals for 10 simulated years. Two netCDF files, one input and one output (target), are produced at each timestep, totaling 525,600 files for each configuration. Three configurations of E3SM-MMF were run and can be downloaded from Hugging Face:

  1. High-Resolution Real Geography

    • 1.5° x 1.5° horizontal resolution (21,600 grid columns)

    • 5.7 billion total samples (41.2 TB)

    • 102 MB per input file, 61 MB per output file

  2. Low-Resolution Real Geography

    • 11.5° x 11.5° horizontal resolution (384 grid columns)

    • 100 million total samples (744 GB)

    • 1.9 MB per input file, 1.1 MB per output file

  3. Low-Resolution Aquaplanet

    • 11.5° x 11.5° horizontal resolution (384 grid columns)

    • 100 million total samples (744 GB)

    • 1.9 MB per input file, 1.1 MB per output file
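The file and sample counts quoted above can be cross-checked with a little arithmetic. The sketch below is only a sanity check under two assumptions that the text does not state explicitly: each simulated year has 365 days, and one "sample" is one grid column at one timestep. The variable names are illustrative.

```python
# Rough consistency check of the dataset sizes quoted above.
MINUTES_PER_STEP = 20
STEPS_PER_DAY = 24 * 60 // MINUTES_PER_STEP  # 72 timesteps per day
DAYS_PER_YEAR = 365  # assumption: 365-day years
YEARS = 10

timesteps = STEPS_PER_DAY * DAYS_PER_YEAR * YEARS
files_per_config = 2 * timesteps  # one input and one output file per timestep

# Grid-column counts taken from the configuration list above.
GRID_COLUMNS = {"high-res": 21_600, "low-res": 384}

# Assumption: one sample = one grid column at one timestep.
samples = {name: ncol * timesteps for name, ncol in GRID_COLUMNS.items()}

print(f"timesteps per configuration: {timesteps:,}")
print(f"files per configuration:     {files_per_config:,}")
for name, count in samples.items():
    print(f"{name} samples: {count:,}")
```

Under these assumptions the results agree with the quoted 525,600 files, roughly 5.7 billion high-resolution samples, and roughly 100 million low-resolution samples.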

Input files are labeled E3SM-MMF.mli.YYYY-MM-DD-SSSSS.nc, where YYYY-MM-DD-SSSSS corresponds to the simulation year (YYYY), month (MM), day of the month (DD), and seconds of the day (SSSSS), with timesteps spaced 1,200 seconds (20 minutes) apart. Target files are labeled the same way, except that mli is replaced by mlo. Scalar variables vary in time and across the “horizontal” grid (ncol), while vertically resolved variables additionally vary in the vertical dimension (lev). For vertically resolved variables, lower indices of lev correspond to higher levels in the atmosphere. This is because lev is ordered by increasing pressure, and pressure decreases monotonically with altitude.
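As an illustration of the naming scheme, the following sketch builds and parses file names using only the Python standard library. The helper names (`mli_filename`, `target_filename`, `parse_filename`, `day_filenames`) are hypothetical and not part of any ClimSim package. It also assumes that the first timestep of each day falls at 0 seconds, which the text does not state.

```python
import re

STEP_SECONDS = 1200  # 20-minute spacing between timesteps
SECONDS_PER_DAY = 86_400

# Matches both input (mli) and target (mlo) file names.
_NAME_RE = re.compile(r"E3SM-MMF\.(mli|mlo)\.(\d{4})-(\d{2})-(\d{2})-(\d{5})\.nc")


def mli_filename(year, month, day, seconds):
    """Build an input file name for the given simulation date and second of day."""
    if not 0 <= seconds < SECONDS_PER_DAY:
        raise ValueError("seconds must lie within one day")
    return f"E3SM-MMF.mli.{year:04d}-{month:02d}-{day:02d}-{seconds:05d}.nc"


def target_filename(input_name):
    """Derive the target (mlo) file name from an input (mli) file name."""
    return input_name.replace(".mli.", ".mlo.", 1)


def parse_filename(name):
    """Split a file name into (kind, year, month, day, seconds)."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    kind, year, month, day, seconds = match.groups()
    return kind, int(year), int(month), int(day), int(seconds)


def day_filenames(year, month, day):
    """List the input file names for one day (assumes the first step is at 0 s)."""
    return [
        mli_filename(year, month, day, s)
        for s in range(0, SECONDS_PER_DAY, STEP_SECONDS)
    ]


example = mli_filename(1, 2, 1, 1200)
print(example)
print(target_filename(example))
print(parse_filename(target_filename(example)))
```

Pairing each input with its target through a string substitution keeps the two file lists aligned without having to scan the directory twice.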

The full list of variables can be found in Supplementary Information, Table 1.

There is also a Quickstart dataset that contains subsampled and prenormalized data. These data were used for training, validation, and metrics in the ClimSim paper and can be reproduced using the preprocessing/create_npy_data_splits.ipynb notebook.