Tutorial 1: Model-Observation comparison with MOM6 and CrocoLake#
The goal of this tutorial is to get familiar with the basics of using CrocoCamp to interpolate MOM6 output onto the space of observations stored in CrocoLake. For this tutorial we will use output from a MOM6 run already stored on NCAR's HPC system, along with CrocoLake observation files, already in obs_sequence format, stored on the same system.
Installing CrocoCamp#
If you don’t have CrocoCamp set up yet, here are the instructions to install it on NCAR’s HPC. If you have any issues or need to install it on a different machine, please contact enrico.milanese@whoi.edu.
Running the workflow#
Running the workflow to interpolate the model is quite simple: create a WorkflowModelObs instance using a configuration file, and then call its run() method. For this tutorial we will use the configuration file config_tutorial_1.yaml provided in the tutorial folder. A template file to use as reference is also provided at ../configs/config_template.yaml.
While running, CrocoCamp generates temporary input files that tell DART's perfect_model_obs executable where to find MOM6 and CrocoLake data to perform the interpolation. You need to be running this notebook on Casper, as the configuration file points to DART's installation on that machine.
from crococamp.workflows import WorkflowModelObs
# Create and run workflow to interpolate MOM6 model onto World Ocean Database obs space
workflow_crocolake = WorkflowModelObs.from_config_file('config_tutorial_1.yaml')
workflow_crocolake.run() # pass clear_output=True if you want to re-run it and automatically clear all previous output
Displaying the interactive map#
CrocoCamp generates a parquet dataset that contains the values of the WOD observations, the MOM6 model data interpolated onto the observations space, and some basic statistics.
By default the dataframe is stored to model_obs_df (as a dask dataframe), and CrocoCamp offers tools to access all data, only the successful interpolations, or only the failed interpolations. For now, let's load only the interpolations that succeeded, which are ~98% of the total (see the output message from the previous cell):
good_model_obs_df = workflow_crocolake.get_good_model_obs_df(compute=True) # compute=True triggers the compute of the dask dataframe, returning a pandas dataframe with data loaded in memory
good_model_obs_df.head() # displays first 5 rows in the dataframe
Loading the interactive map to explore the successful interpolations is as simple as importing the widget and passing the dataframe to it:
from crococamp.viz import InteractiveWidgetMap
# Create an interactive map widget to visualize model-observation comparisons
# The widget provides controls for selecting variables, observation types, and time ranges
widget = InteractiveWidgetMap(good_model_obs_df)
widget.setup()
Interpolation errors#
The dataframe built by CrocoCamp during WorkflowModelObs.run() contains the column interpolated_model_QC, which stores information about the quality of the interpolation. The value is set by DART's perfect_model_obs program, and you can find more information about it here and here. In general, for this workflow we want QC ≤ 2, and indeed the method get_good_model_obs_df() that we used earlier applies this criterion.
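Under the hood, this criterion amounts to a simple boolean filter on the QC column. A minimal sketch on a synthetic dataframe (the real dataframe carries many more columns; only the QC column matters here):

```python
import pandas as pd

# Synthetic stand-in for the workflow dataframe; the column name
# interpolated_model_QC is the one described in this tutorial
df = pd.DataFrame({
    "observation": [10.1, 11.3, 9.8, 12.0],
    "interpolated_model_QC": [0, 2, 1018, 1018],
})

# Keep only rows DART flagged as successful (QC <= 2) -- the same
# criterion that get_good_model_obs_df() applies for us
good = df[df["interpolated_model_QC"] <= 2]
print(len(good))  # 2
```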
Let’s now have a look at the failed interpolations by loading the data and plotting it:
failed_model_obs_df = workflow_crocolake.get_failed_model_obs_df(compute=True) # compute=True triggers the compute of the dask dataframe, returning a pandas dataframe with data loaded in memory
failed_model_obs_df.head() # displays first 5 rows in the dataframe
widget_failed = InteractiveWidgetMap(failed_model_obs_df)
widget_failed.setup()
We note two things:

- when an interpolation fails, a value of -888888 is assigned to the interpolated model, and from head() we see values of QC greater than 1000;
- the failed observations are close to the model boundaries (north and east) or to land (west).
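The -888888 sentinel is worth guarding against if you compute statistics on the model column yourself. A small sketch on synthetic rows (the column name interpolated_model is an assumption for illustration; the sentinel and QC values follow the description above):

```python
import pandas as pd

# Synthetic rows mimicking failed and successful interpolations;
# "interpolated_model" is an assumed column name for this sketch
df = pd.DataFrame({
    "interpolated_model": [-888888.0, -888888.0, 4.2],
    "interpolated_model_QC": [1018, 1018, 0],
})

# Replace the sentinel with a proper missing value before averaging,
# otherwise it badly skews any statistic
masked = df["interpolated_model"].replace(-888888.0, float("nan"))
print(masked.isna().sum())  # 2
```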
The following command gives us the unique values of interpolated_model_QC:
failed_model_obs_df['interpolated_model_QC'].unique()
We should see only the value 1018. Values greater than 1000 are encoded as 1000 + failure_code, with failure_code documented here. In this case, the QC flag 1018 indicates that one or more grid points required for the interpolation are not in the basin, so the interpolation cannot be performed. This is consistent with the visual inspection of the map, where we noticed that the failed interpolations are close to the model boundaries and to the sea/land border.
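Since the encoding is 1000 + failure_code, subtracting 1000 recovers the raw failure code; a quick sketch on the QC values from this tutorial:

```python
import pandas as pd

# QC values reported for the failed interpolations in this tutorial
failed_qc = pd.Series([1018, 1018, 1018])

# DART encodes failed interpolations as 1000 + failure_code,
# so subtracting 1000 recovers the underlying failure code
failure_codes = (failed_qc - 1000).unique()
print(failure_codes)  # [18]
```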
Finally, if you want to load all the data at once, you can use the following command:
# Load the parquet dataset generated by the workflow above
model_obs_df = workflow_crocolake.get_all_model_obs_df(compute=True) # compute=True triggers the compute of the dask dataframe, returning a pandas dataframe with data loaded in memory
model_obs_df.head() # displays first 5 rows in the dataframe
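With the full dataframe in hand you can, for instance, verify the success rate yourself by applying the QC ≤ 2 criterion directly; a sketch on synthetic QC values (the real dataset reports ~98%):

```python
import pandas as pd

# Synthetic QC values standing in for the full dataset
model_obs_df = pd.DataFrame(
    {"interpolated_model_QC": [0, 1, 2, 1018, 0, 2, 1018, 1, 0, 1]}
)

# Fraction of rows passing the QC <= 2 success criterion
success_rate = (model_obs_df["interpolated_model_QC"] <= 2).mean()
print(f"{success_rate:.0%}")  # 80%
```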