ewatercycle logo

PCRGlobWB example use case#

This example shows how the PCRGlobWB model can be used within the eWaterCycle system. It is assumed you have already seen this tutorial notebook explaining how to run the simple HBV model for the River Leven at Newby Bridge.

The PCRGlobWB model is an example of a distributed model where fluxes and stores in the balance are calculated for grid cells (often also called pixels). This requires both the forcing data as well as any parameters to also be spatially distributed. Depending on the complexity of the model, these datasets can be quite large in memory size.

Here we will be running PCRGLobWB for Great Brittain and will extract discharge data at the location of the River Leven again, to compare with the HBV model run. We will also demonstrate how to interact with the state of the model, during runtime, showcasing the benefit of using the BMI interface when building experiments using models.

# This cell is only used to suppress some distracting output messages
import warnings

warnings.filterwarnings("ignore", category=UserWarning)
import matplotlib.pyplot as plt
from cartopy import crs
from cartopy import feature as cfeature
import fiona
import shapely.geometry
from pyproj import Geod
#from rich import print
import pandas as pd
import xarray as xr

import ewatercycle.forcing
import ewatercycle.models
import ewatercycle.parameter_sets
station_latitude = 54.26935849558577 #Newby Bridge location from Google Maps
station_longitude = -2.9710855713537745 

Loading a parameter set#

For this example we have prepared and hosted a global parameter set made by Utrecht University. For each model run, what needs to be specified to deliniate the region of interest is a “clone map”. The config file has many options, one of which is the location of this clone map.

Note that this is very specific to PCRGlobWB. For complex (and legacy) models like PCRGlobWB one needs to know quite detailed information about the model before being able to run it. However, using eWaterCycle does reduce the time for seting up the model and getting it to run.

parameter_set = ewatercycle.parameter_sets.ParameterSet(
    name="custom_parameter_set",
    directory="/home/shared/pcrglobwb_global",
    config="./pcrglobwb_uk_05min.ini",
    target_model="pcrglobwb",
    supported_model_versions={"setters"},
)
print(parameter_set)
Parameter set
-------------
name=custom_parameter_set
directory=/home/shared/pcrglobwb_global
config=pcrglobwb_uk_05min.ini
doi=N/A
target_model=pcrglobwb
supported_model_versions={'setters'}
downloader=None

Load forcing data#

For this example case, the forcing is generated in this seperate notebook. This is a common practice when generating forcing takes considerable (CPU, memory, disk) resources.

In the cell below, we load the pre-generated forcing. Note that in contrast with HBV, PCRGlobWB only needs temperature and precipitation as forcing inputs. HBV also needs potential evaporation. PCRGlobWB calculated potential and actual evaporation as part of its update step.

forcing = ewatercycle.forcing.sources["PCRGlobWBForcing"].load(
    directory="/home/rhut/forcing/UK/work/diagnostic/script",
)
print(forcing)
start_time='1997-08-01T00:00:00Z' end_time='2000-08-31T00:00:00Z' directory=PosixPath('/home/rhut/forcing/UK/work/diagnostic/script') shape=PosixPath('/home/rhut/forcing/UK/work/diagnostic/script/camelsgb_73010.shp') filenames={} precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc' temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc'

Setting up the model#

Note that the model version and the parameterset versions should be compatible.

pcrglob = ewatercycle.models.PCRGlobWB(
    parameter_set=parameter_set,
    forcing=forcing
)
print(pcrglob)
parameter_set=ParameterSet(name='custom_parameter_set', directory=PosixPath('/home/shared/pcrglobwb_global'), config=PosixPath('pcrglobwb_uk_05min.ini'), doi='N/A', target_model='pcrglobwb', supported_model_versions={'setters'}, downloader=None) forcing=PCRGlobWBForcing(start_time='1997-08-01T00:00:00Z', end_time='2000-08-31T00:00:00Z', directory=PosixPath('/home/rhut/forcing/UK/work/diagnostic/script'), shape=PosixPath('/home/rhut/forcing/UK/work/diagnostic/script/camelsgb_73010.shp'), filenames={}, precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc', temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc')
pcrglob.version
'setters'

eWaterCycle exposes a selected set of configurable parameters. These can be modified in the setup() method.

print(pcrglob.parameters)
dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '1997-08-01T00:00:00Z'), ('routing_method', 'accuTravelTime'), ('max_spinups_in_years', '0')])

Calling setup() will start up the model container. Be careful with calling it multiple times!

cfg_file, cfg_dir = pcrglob.setup(
    end_time="2000-08-31T00:00:00Z",
    max_spinups_in_years=0
)
cfg_file, cfg_dir
('/home/rhut/repos/eurocsdms_workshop/book/oneModel/additional/pcrglobwb/pcrglobwb_20241029_183731/pcrglobwb_ewatercycle.ini',
 '/home/rhut/repos/eurocsdms_workshop/book/oneModel/additional/pcrglobwb/pcrglobwb_20241029_183731')
print(pcrglob.parameters)
dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '2000-08-31T00:00:00Z'), ('routing_method', 'accuTravelTime'), ('max_spinups_in_years', '0')])

Note that the parameters have been changed. A new config file which incorporates these updated parameters has been generated as well. If you want to see or modify any additional model settings, you can acces this file directly. When you’re ready, pass the path to the config file to initialize().

pcrglob.initialize(cfg_file)

We prepare a small dataframe where we can store the discharge output from the model

time = pd.date_range(pcrglob.start_time_as_isostr, pcrglob.end_time_as_isostr)
timeseries = pd.DataFrame(
    index=pd.Index(time, name="time"), columns=["PCRGlobWB: Leven"]
)
timeseries.head()
PCRGlobWB: Leven
time
1997-08-01 00:00:00+00:00 NaN
1997-08-02 00:00:00+00:00 NaN
1997-08-03 00:00:00+00:00 NaN
1997-08-04 00:00:00+00:00 NaN
1997-08-05 00:00:00+00:00 NaN

Running the model#

Simply running the model from start to end is straightforward. At each time step we can retrieve information from the model.

while pcrglob.time < pcrglob.end_time:
    pcrglob.update()

    # Track discharge at station location
    discharge_at_station = pcrglob.get_value_at_coords(
        "discharge", lat=[station_latitude], lon=[station_longitude]
    )
    time = pcrglob.time_as_isostr
    timeseries["PCRGlobWB: Leven"][time] = discharge_at_station[0]

    # Show progress
    print(time,end='\r')  # "\r" clears the output before printing the next timestamp
2000-08-31T00:00:00Z

Interacting with the model#

PCRGlobWB exposes many variables. Just a few of them are shown here:

list(pcrglob.output_var_names)[-15:-5]
['upper_soil_transpiration',
 'snow_water_equivalent',
 'total_runoff',
 'transpiration_from_irrigation',
 'fraction_of_surface_water',
 'bottom_elevation_of_uppermost_layer',
 'industry_water_withdrawal',
 'relativeGroundwaterHead',
 'total_fraction_water_allocation',
 'groundwater_volume_estimate']

Model fields can be fetched as xarray objects (or as flat numpy arrays using get_value()):

da = pcrglob.get_value_as_xarray("discharge")
da.thin(5)  # only show every 5th value in each dim
<xarray.DataArray (longitude: 29, latitude: 22)> Size: 5kB
array([[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.36099541e+00,
        1.14954865e+00, 1.63362265e+00, 5.81284702e-01, 1.03070605e+00,
        7.93194473e-01, 1.05944443e+00, 1.03646994e+00, 5.14641225e-01,
        5.14155090e-01, 1.09502316e-01],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.22899306e+00, 0.00000000e+00, 9.28414345e-01, 1.26675928e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.72627318e+00,
        5.43769693e+00, 4.03241968e+00, 1.58971059e+00, 1.55043983e+00,
        1.14820147e+00, 6.18921757e-01],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        2.72685178e-02, 9.35214138e+00, 3.50451380e-01, 1.46504626e-01,
        8.38614559e+00, 2.71886587e-01],
       [0.00000000e+00, 0.00000000e+00, 6.52777791e-01, 0.00000000e+00,
        0.00000000e+00, 1.04513891e-01, 0.00000000e+00, 0.00000000e+00,
...
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.59323442e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00]])
Coordinates:
  * latitude   (latitude) float64 176B -5.958 -5.542 -5.125 ... 2.375 2.792
  * longitude  (longitude) float64 232B 49.04 49.46 49.88 ... 59.88 60.29 60.71

Xarray makes it very easy to plot the data. In the figure below, we add a cross at the location where we collected the discharge every timestep: Leven at Newby bridge.

fig = plt.figure(dpi=120)
ax = fig.add_subplot(111, projection=crs.PlateCarree())
da.plot(ax=ax, cmap="GnBu")

# Overlay ocean and coastines
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.RIVERS, color="k")
ax.coastlines()

# Add a red cross marker at the location of the Leven River at Newby Bridge
ax.scatter(station_longitude, station_latitude, s=250, c="r", marker="x", lw=2)
<matplotlib.collections.PathCollection at 0x7f6328792c30>
../../../_images/797b3e07c66f4724789bde0634c92d8a84a24e456fe132de59f7d46349d45a6d.png

We can get (or set) the values at custom points as well:

#extra
timeseries.plot()
<Axes: xlabel='time'>
../../../_images/c16a33076e938e1aa0e343a4b9da6c717be32f6ef84fbf1d8dd4f112552a7e42.png

We of course want to compare this both to observations as well as to the result of the HBV model.

prepared_forcing_path_caravan_central = "/data/eurocsdms-data/forcing/camelsgb_73010/caravan"
camelsgb_forcing = ewatercycle.forcing.sources['CaravanForcing'].load(directory=prepared_forcing_path_caravan_central)
xr_camelsgb_forcing = xr.open_dataset(camelsgb_forcing['Q'])
xr_hbv_model_output = xr.open_dataset('~/river_discharge_data.nc')
timeseries.plot()
xr_camelsgb_forcing["Q"].plot(label="Observed discharge")
xr_hbv_model_output['Modelled_discharge'].plot(label="modelled discharge HBV")
plt.legend()
<matplotlib.legend.Legend at 0x7f620a883740>
../../../_images/ddec3ec8f7c3b46484bb5a449ee3e0cd8faa9b33252031b78722379f4fd4e188.png

HUH???#

Oh, wait…

PCRGlobWB reports in m3/s and HBV and the observation are in mm/day. So we need to multiply the HBV and observational data with the area of the catchment, devide by 1000 (to get to meters) and devide by 86400 to get from days to seconds.

# flux_out_Q unit conversion factor from mm/day to m3/s
conversion_mmday2m3s = 1 / (1000 * 86400)
shape = fiona.open(camelsgb_forcing.shape)
poly = [shapely.geometry.shape(p["geometry"]) for p in shape][0]
geod = Geod(ellps="WGS84")
poly_area, poly_perimeter = geod.geometry_area_perimeter(poly)
catchment_area_m2 = abs(poly_area)
print(catchment_area_m2)
248130958.76862004
xr_camelsgb_forcing["Q"]  = xr_camelsgb_forcing["Q"] * conversion_mmday2m3s * catchment_area_m2
xr_hbv_model_output['Modelled_discharge'] = xr_hbv_model_output['Modelled_discharge']* conversion_mmday2m3s * catchment_area_m2
timeseries.plot()
xr_camelsgb_forcing["Q"].plot(label="Observed discharge")
xr_hbv_model_output['Modelled_discharge'].plot(label="modelled discharge HBV")
plt.legend()
<matplotlib.legend.Legend at 0x7f6208e18770>
../../../_images/c55b37dd4005cb2feb16db37e6945599b144c4201e9348b86d52fc0f2efc0add.png

Still doesn’t look to good for PCRGlobWB. This is because for this small area we are only looking at 10-ish pixels and most likely there is a big mismatch between the pixels that drain through the outlet in PCRGlobWB and the actual catchment. Please ask Rolf for details

Cleaning up#

Models usually perform some “wrap up tasks” at the end of a model run, such as writing the last outputs to disk and releasing memory. In the case of eWaterCycle, another important teardown task is destroying the container in which the model was running. This can free up a lot of resources on your system. Therefore it is good practice to always call finalize() when you’re done with an experiment.

pcrglob.finalize()