ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators

Dataset: E3SM-MMF High-Resolution Real Geography

Dataset: E3SM-MMF Low-Resolution Real Geography

Dataset: E3SM-MMF Low-Resolution Aquaplanet


ClimSim is the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations developed by a consortium of climate scientists and ML researchers, and consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally nested, high-resolution, high-fidelity physics on a host climate simulator’s macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed so that the resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and how they are scored.

[Figure 1]

Getting Started

Models and Evaluation

Demo Notebooks

Project Structure

Code Repository

References