07 Demo: ERA5 Data Download#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
Climate reanalysis#
Nice introduction: https://climate.copernicus.eu/climate-reanalysis
“Climate reanalyses combine past observations with models to generate consistent time series of multiple climate variables. Reanalyses are among the most-used datasets in the geophysical sciences. They provide a comprehensive description of the observed climate as it has evolved during recent decades, on 3D grids at sub-daily intervals.”
ERA5#
ERA5 = “ECMWF ReAnalysis 5”
ECMWF = “European Centre for Medium-Range Weather Forecasts”
https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
“ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the Earth on a 30km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km.”
“ERA5 combines vast amounts of historical observations into global estimates using advanced modelling and data assimilation systems.”
Variables#
Hundreds of output variables for each hourly timestep. See a list of all of the available variables:
https://apps.ecmwf.int/codes/grib/param-db
Resolution#
The ERA5 HRES (High Resolution) data have a native resolution of 0.28125 degrees (31km)
https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference
The ERA5-Land data have a native resolution of 9 km (~0.08°)
https://confluence.ecmwf.int/display/CKB/ERA5-Land%3A+data+documentation
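As a rough check on the resolutions quoted above, the following sketch converts grid spacing in degrees to kilometers. It assumes a spherical Earth with a mean radius of 6371 km. The helper name `deg_to_km` and the example latitude of 47.6°N are illustrative choices, not part of the ERA5 documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius

def deg_to_km(deg, lat=0.0):
    """Approximate east-west length (km) of `deg` degrees of longitude at latitude `lat`."""
    km_per_deg = 2 * math.pi * EARTH_RADIUS_KM / 360
    return deg * km_per_deg * math.cos(math.radians(lat))

# Grid spacings quoted in the text above
for name, deg in [('ERA5 HRES', 0.28125), ('ERA5-Land', 0.08)]:
    print(f'{name}: {deg} deg ~ {deg_to_km(deg):.1f} km at the equator, '
          f'{deg_to_km(deg, lat=47.6):.1f} km east-west at 47.6 N')
```

The east-west spacing of a regular latitude/longitude grid shrinks with the cosine of latitude, which is one motivation for the quasi-equal-area grid the model uses internally.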
How many grid cells are required to store one variable (like temperature) for the full 72-year record at hourly resolution?#
#Space: 360° x 180° at 0.25° spacing (4 cells per degree in each direction), times 137 vertical levels
s = 360*180*4*4*137
s
142041600
#Time: 72 years of hourly timesteps (365.25 days per year, 24 hours per day)
t = 72*365.25*24
t
631152.0
s*t
89649839923200.0
f'{s*t:e}'
'8.964984e+13'
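To put that cell count in terms of storage, the sketch below multiplies by an assumed 4 bytes per value (32-bit float, uncompressed). The byte size is an assumption for illustration; the actual archive uses packed and compressed formats, so real file sizes will differ.

```python
s = 360 * 180 * 4 * 4 * 137   # grid cells per timestep (0.25 deg grid, 137 levels)
t = 72 * 365.25 * 24          # hourly timesteps in 72 years
bytes_per_value = 4           # assumption: uncompressed 32-bit float per cell

# Full 3D variable (all 137 levels)
total_bytes_3d = s * t * bytes_per_value
# Single-level variable (e.g., 2 m temperature): drop the 137 vertical levels
total_bytes_2d = (s / 137) * t * bytes_per_value

print(f'3D variable: {total_bytes_3d / 1e12:.1f} TB')
print(f'2D variable: {total_bytes_2d / 1e12:.1f} TB')
```

Under these assumptions, a single 3D variable is on the order of hundreds of terabytes, which is why requesting spatial and temporal subsets is the usual approach.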
Additional notes on the ERA5 grid#
The model is actually run using a “reduced gaussian grid” with quasi-equal spacing across the planet:
https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference
https://www.ecmwf.int/sites/default/files/elibrary/2016/17262-new-grid-ifs.pdf
These values are then interpolated to the regular grid of 0.25° cells in the netCDF files. We will revisit this issue during the exercises.
Data Availability#
From CDS (Climate Data Store)#
For future reference, you can access the ERA5 data directly! The CDS API allows you to request subsets of ERA5 products for desired spatial extent, time periods, time intervals, etc.:
https://cds.climate.copernicus.eu/api-how-to
https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5
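For orientation, a request through the `cdsapi` Python package might look roughly like the sketch below. It assumes `cdsapi` is installed and that an API key is configured in `~/.cdsapirc`. The helper names (`build_era5_request`, `download_era5`), the bounding box, and the output filename are hypothetical. The request keys follow the legacy CDS examples and may differ on the current CDS (for example, the key used for the output format), so copy the exact request from the CDS web form for real use.

```python
import calendar

def build_era5_request(year, month, bbox):
    """Build a request dictionary for hourly 2 m temperature.

    bbox is (north, west, south, east) in degrees.
    """
    ndays = calendar.monthrange(year, month)[1]  # number of days in this month
    return {
        'product_type': 'reanalysis',
        'variable': '2m_temperature',
        'year': str(year),
        'month': f'{month:02d}',
        'day': [f'{d:02d}' for d in range(1, ndays + 1)],
        'time': [f'{h:02d}:00' for h in range(24)],
        'area': list(bbox),       # order: North, West, South, East
        'format': 'netcdf',       # assumption: legacy key name, may differ on the current CDS
    }

def download_era5(request, out_fn):
    """Submit the request and download the result (requires account and API key)."""
    import cdsapi  # imported here so the request can be built without cdsapi installed
    client = cdsapi.Client()
    client.retrieve('reanalysis-era5-single-levels', request, out_fn)

# Hypothetical example: June 2021 over a box roughly covering Washington State
request = build_era5_request(2021, 6, bbox=(49.5, -125.0, 45.0, -116.5))
print(request)
# download_era5(request, 'era5_2t_example.nc')  # uncomment once your API key is configured
```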
Some commonly used products are also available on Amazon S3#
https://registry.opendata.aws/ecmwf-era5/
Shortcut: download sample datasets#
We could submit requests directly to the CDS API, but you would need to create an account and use a unique API key. The server-side processing and download typically require roughly 5-40 minutes per dataset.
For this lab, I submitted some requests to prepare sample ERA5 datasets. The scripts are available in the cds_scripts subdirectory. As a shortcut, we will download these datasets, which were staged in a public data archive.
Zenodo#
Zenodo is a great, free, permanent data archiving solution: https://about.zenodo.org/
Lab09 Zenodo record#
https://zenodo.org/record/6302343
Three main files are needed for the Lab07 notebooks. The original datasets from CDS are also archived.
Notebook 1: ‘climatology_0.25g_ea_2t.nc’, ‘1month_anomaly_Global_ea_2t.nc’
Notebook 2: ‘WA_ERA5-Land_hourly_1950-2022_6hr.nc’
Check disk space!#
Before running, open a terminal on the hub and run the command df -h ~. It should report something like this:
Filesystem Size Used Avail Use% Mounted on
/dev/sdf 50G 41G 9.7G 81% /home/jovyan
You will need ~4.5 GB available for these data products.
If you don’t have that, you can go back and delete some of the products from previous labs that are no longer needed or that can be easily downloaded again.
!df -h ~
Filesystem Size Used Avail Use% Mounted on
/dev/sdc 1007G 136G 821G 15% /
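The same check can be done from Python with the standard library, which is convenient if you want the notebook to warn before downloading. This is a minimal sketch. The 4.5 GB threshold comes from the note above, and the variable names are illustrative.

```python
import shutil
from pathlib import Path

required_gb = 4.5  # approximate space needed for the data products (from the note above)

# disk_usage returns total, used, and free bytes for the filesystem containing the path
usage = shutil.disk_usage(Path.home())
free_gb = usage.free / 1e9

print(f'Free space in {Path.home()}: {free_gb:.1f} GB')
if free_gb < required_gb:
    print(f'Warning: less than {required_gb} GB available; free up space before downloading.')
```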
import os
from pathlib import Path
era5_data_dir = f'{Path.home()}/gda_demo_data/era5_data'
if not os.path.exists(era5_data_dir):
    os.makedirs(era5_data_dir)
base_url = 'https://zenodo.org/record/6302343/files/'
fn_list = ['climatology_0.25g_ea_2t.nc',
           '1month_anomaly_Global_ea_2t.nc',
           #'WA_ERA5-Land_hourly_1950-2022_6hr.nc'
          ]
url_list = [base_url+fn for fn in fn_list]
#For parallel download from command line:
#url_list_str = ' '.join(url_list)
url_list
['https://zenodo.org/record/6302343/files/climatology_0.25g_ea_2t.nc',
'https://zenodo.org/record/6302343/files/1month_anomaly_Global_ea_2t.nc']
for url in url_list:
    !wget -nc -P {era5_data_dir} {url}
--2025-02-18 18:58:00-- https://zenodo.org/record/6302343/files/climatology_0.25g_ea_2t.nc
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/6302343/files/climatology_0.25g_ea_2t.nc [following]
--2025-02-18 18:58:00-- https://zenodo.org/records/6302343/files/climatology_0.25g_ea_2t.nc
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 49874647 (48M) [application/octet-stream]
Saving to: ‘/home/eric/gda_demo_data/era5_data/climatology_0.25g_ea_2t.nc’
climatology_0.25g_e 100%[===================>] 47.56M 10.2MB/s in 5.6s
2025-02-18 18:58:06 (8.51 MB/s) - ‘/home/eric/gda_demo_data/era5_data/climatology_0.25g_ea_2t.nc’ saved [49874647/49874647]
--2025-02-18 18:58:06-- https://zenodo.org/record/6302343/files/1month_anomaly_Global_ea_2t.nc
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.43.25, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/6302343/files/1month_anomaly_Global_ea_2t.nc [following]
--2025-02-18 18:58:07-- https://zenodo.org/records/6302343/files/1month_anomaly_Global_ea_2t.nc
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 2147121539 (2.0G) [application/octet-stream]
Saving to: ‘/home/eric/gda_demo_data/era5_data/1month_anomaly_Global_ea_2t.nc’
1month_anomaly_Glob 100%[===================>] 2.00G 19.4MB/s in 1m 57s
2025-02-18 19:00:04 (17.6 MB/s) - ‘/home/eric/gda_demo_data/era5_data/1month_anomaly_Global_ea_2t.nc’ saved [2147121539/2147121539]
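If `wget` is not available, a pure-Python equivalent of the loop above could look like the following sketch. It mimics the `-nc` (no-clobber) behavior by skipping files that already exist. The helper name `download_if_missing` is hypothetical. As with `wget -nc`, it does not detect partial downloads.

```python
import os
import urllib.request

def download_if_missing(url, out_dir):
    """Download url into out_dir unless a file with the same name already exists."""
    os.makedirs(out_dir, exist_ok=True)
    out_fn = os.path.join(out_dir, os.path.basename(url))
    if os.path.exists(out_fn):
        print(f'Found existing file, skipping: {out_fn}')
    else:
        print(f'Downloading {url}')
        urllib.request.urlretrieve(url, out_fn)
    return out_fn

# Usage with the variables defined earlier in this notebook:
# for url in url_list:
#     download_if_missing(url, era5_data_dir)
```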
!ls -lh $era5_data_dir
total 2.1G
-rw-r--r-- 1 eric eric 2.0G Feb 18 19:00 1month_anomaly_Global_ea_2t.nc
-rw-r--r-- 1 eric eric 48M Feb 18 18:58 climatology_0.25g_ea_2t.nc
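A quick way to confirm that the downloads completed is to compare file sizes on disk against the lengths reported in the wget log above. This is a minimal sketch. The helper name `check_sizes` is hypothetical, and the expected byte counts are copied from the log and would change if the Zenodo files were updated.

```python
import os

# Expected sizes in bytes, taken from the "Length:" lines in the wget output above
expected_bytes = {
    'climatology_0.25g_ea_2t.nc': 49874647,
    '1month_anomaly_Global_ea_2t.nc': 2147121539,
}

def check_sizes(data_dir, expected):
    """Return a dict mapping each filename to 'ok', 'missing', or 'size mismatch'."""
    status = {}
    for fn, nbytes in expected.items():
        path = os.path.join(data_dir, fn)
        if not os.path.exists(path):
            status[fn] = 'missing'
        elif os.path.getsize(path) != nbytes:
            status[fn] = 'size mismatch'
        else:
            status[fn] = 'ok'
    return status

era5_data_dir = os.path.join(os.path.expanduser('~'), 'gda_demo_data', 'era5_data')
print(check_sizes(era5_data_dir, expected_bytes))
```

A file flagged as 'size mismatch' is most likely a partial download; delete it and rerun the download cell.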