You can now download ERA5 data over the command line. Here is why that is awesome.
So you need ERA5 data? We have some good news for you. The command line tool era5cli lets you access all CDS ERA5 datasets from the comfort of your terminal. In a single command you can download a multitude of data to, say, validate your climate model, or observe the consequences of wildfires on a variety of meteorological variables. Whatever your purpose, if you need to use ERA5 data, this tool makes accessing it a breeze.
If that is all you needed to be convinced of its benefits, then here is how it works:
- Register at the Copernicus Climate Data Service (CDS), and store the key on your machine
-
Install the tool:
pip install era5cli
-
Run the tool:
era5cli monthly --variables temperature --startyear 2008
(Or any other combination of variables, years, frequency, dataset, statistics… the sky is the limit — or, to be fair, pressure level 1000 is. Check the documentation for all available options.)
Want to get started? era5cli.readthedocs.io has all the details.
Not sure yet? Here are five reasons why era5cli is awesome.
#1: You can access all ERA5 datasets currently available
The main power of era5cli lies in, well, ERA5. Let’s not take the main credit here: ERA5 data is why the tool exists and why we think you should use it. ERA5 represents the state of the art in global reanalysis data. In layman’s terms: the best estimate we have of past weather for each location in the world. Copernicus is adding datasets to the collection, so its power keeps increasing, and era5cli has been keeping up: all current available datasets, from the (preliminary) back-extension stretching all the way back to 1950, to the recently added ERA5-Land data, can be downloaded with era5cli.
Diagram of the family of ERA5 datasets as loaded on 2021-03-11 via ECMWF.
While Copernicus tries to make things easier through a web-interface that compiles your API request (more on that later) based on boxes you tick, the multitude of datasets currently available translates to a multitude of such interfaces.
The command line tool tries to keep things simple: adding a --land
flag is
the only thing required to access ERA5-Land data; a --prelimbe
flag gets you
data from the preliminary back extension. Everything else stays the same.
The latest release of era5cli gives you access to the currently available data,
and the team behind era5cli is actively maintaining the tool so you do not have
to wait when more data becomes available. Make sure to use the latest release to
get up-to-date functionality: you can update era5cli with pip install era5cli
--upgrade
.
#2: No more wrestling with the API
ERA5 data is downloadable via the CDS API, an interface that defines the commands and conventions to get access to the information available. This already makes life easier than pressing buttons for a download. Potentially, an API is the difference between clicking, scrolling, and drag-and-dropping to eternity — and pressing ‘enter’, then sitting back with some coffee while your data is collected. The CDS API has a client implemented in Python, which can be used in a Python script to request ERA5 data directly.
Unfortunately, navigating an API is not without its challenges. Using the CDS API Python module requires basic Python knowledge at the very least, and most likely still a fair bit of clicking, scrolling, and copy-pasting from the CDS website to generate a valid request. And, perhaps crucially, human mistakes are bound to happen. Forgetting to change the name of the output file in your script will retrieve a datafile with mystery content for future-you. Incorrect combinations of options will send you into a back-and-forth of cryptic error messages, or worse, it will still download data, but leave you in the dark about the actual request that was fulfilled.
With era5cli, by contrast, you only need to access the command line. You list the data you want, using the bare minimum of info. The tool checks your instructions, if necessary informs you of any impossible combinations, formulates the correct request, calls the CDS API, and saves your data in a standardized format and with a clear file name that corresponds to the request.
Does that sound vague? Here is an example of a Python script that requests monthly temperature data in 2010, across all pressure levels, and saves it as a netcdf file:
import cdsapi
c = cdsapi.Client()
c.retrieve(
'reanalysis-era5-pressure-levels-monthly-means',
{
'format': 'netcdf',
'product_type': 'monthly_averaged_reanalysis',
'variable': 'temperature',
'pressure_level': [
'1', '2', '3',
'5', '7', '10',
'20', '30', '50',
'70', '100', '125',
'150', '175', '200',
'225', '250', '300',
'350', '400', '450',
'500', '550', '600',
'650', '700', '750',
'775', '800', '825',
'850', '875', '900',
'925', '950', '975',
'1000',
],
'year': '2010',
'month': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
],
'time': '00:00',
},
'era5_temperature_2010_monthly.nc')
Here is what that same request looks like when made via era5cli on the command line:
era5cli monthly --variables temperature --startyear 2010
Yup. That’s it.
#3: You can automate with ease
Because of the simplicity of making a call to the CDS via era5cli, it is easily incorporated in a script. Any language that can access the command line can use the tool, which means that you are not dependent on Python to automate your downloads. Use your language of choice and include the era5cli command line call in your script.
Furthermore, choosing different types of data to download is easy in the structured input that era5cli requires. Downloading multiple variables and years is a breeze in a single command (see point #4), but also downloading both hourly and monthly data merely require switching out these terms in the era5cli call, while under the hood they request data from different datasets with their own complicated name and product type.
#4: Customize output in a single command
Do you want to download the data from multiple variables over several decades?
era5cli has you covered, and it’s easy as 1, 2, 3. Multiple parameters can be
placed in the same call, which will then download their data to a series of
output files. Multiple years can be included easily using the --startyear
and
--endyear
flags, which splits the data into multiple output files, one per
year — or combine them, with the use of the --merge
flag. Whether you want to
download precipitation data in january of 2006, or everything snow-related from
1980 to now? The download may take longer, but your effort will be the same.
Also, pst, here’s a little trick: run
era5cli info 2Dvars | grep "snow"
to quickly see what variables contain the word ‘snow’, so you can include them
in your call. Instead of 2Dvars
you can of course also search the other
variable options; check era5cli info -h
to see what they are.
#5: You can contribute!
era5cli is an open source effort, and welcomes your contribution! Is there a new feature that we’ve missed? Tell us about it! Do you know how to add it yourself? Even better! Our team is maintaining the software actively, and is very happy to hear from you.