Tutorial 2: Model-Observation comparison with MOM6 and a single Argo float from CrocoLake

The goal of this tutorial is to learn how to generate your own obs_seq files from CrocoLake and to explore another interactive widget: the interactive profile. We will select a specific Argo float, interpolate our model output to the locations of that float’s temperature and salinity observations, and compare the interpolated model values with the float’s profiles.

Installing CrocoCamp

If you don’t have CrocoCamp set up yet, here are the instructions to install it on NCAR’s HPC. If you have any issues or need to install it on a different machine, please contact enrico.milanese@whoi.edu.

Generating your own obs_seq.in files

CrocoLake is a flexible parquet dataset that is fast and easy to access and filter. For this workshop, a copy of the CrocoLake version with physical variables is stored at /glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/CrocoCamp/CrocoLakePHY:

crocolake_path = '/glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/CrocoCamp/CrocoLakePHY'

import os

# Output directory and file basename for the obs_seq files we are about to generate;
# os.path.expandvars resolves the $WORK environment variable
basename = "myCL_obs_seq_"
outdir = "$WORK/crocodile_2025/CrocoCamp/tutorial_2/in_CL/"
basename = os.path.expandvars(outdir + basename)
outdir = os.path.expandvars(outdir)
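
If you are curious about what the dataset contains, you can peek at it before converting anything. The following is a minimal sketch, assuming the CrocoLakePHY copy above is readable directly with pyarrow; it only prints the available columns and their types.

import pyarrow.dataset as ds

# Open the parquet dataset lazily and print its schema (column names and data types)
crocolake = ds.dataset(crocolake_path, format="parquet")
print(crocolake.schema)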

The following cell contains the code that I used to generate the observation files we used in Tutorial 1. Have a look at the code, in particular at how to select variables and define filters. Once you’re ready, create a new cell below, copy-paste the code, and add a filter so that you only store the data recorded by the Argo float with identifier 1900256. Also change the file name to include the float number. Note: in CrocoLake, the platform identifier is stored as a string in the variable 'PLATFORM_NUMBER'.

import datetime
from convert_crocolake_obs import ObsSequence

# define horizontal region
LAT0 = 5
LAT1 = 60
LON0 = -100
LON1 = -30

# define variables to import from CrocoLake
selected_variables = [
    "DB_NAME",  # ARGO, GLODAP, SprayGliders, OleanderXBT, Saildrones
    "JULD", # this contains timestamp
    "LATITUDE",
    "LONGITUDE",
    "PRES", # This will be automatically converted to depths in meters
    "TEMP",
    "PRES_QC",
    "TEMP_QC",
    "PRES_ERROR",
    "TEMP_ERROR",
    "PSAL",
    "PSAL_QC",
    "PSAL_ERROR"
]

# month and year are constant in our case
year0 = 2010
month0 = 5

# we loop to generate one file per day
for j in range(10):

    # set date range
    day0 = 1+j
    day1 = day0+1
    date0 = datetime.datetime(year0, month0, day0, 0, 0, 0)
    date1 = datetime.datetime(year0, month0, day1, 0, 0, 0)
    print(f"Converting obs between {date0} and {date1}")

    # this defines AND filters, i.e. we want to load each observation that has latitude within the given range AND longitude within the given range, etc.
    # to exclude NaNs, impose a finite range on a variable
    and_filters = (
        ("LATITUDE",'>',LAT0),  ("LATITUDE",'<',LAT1),
        ("LONGITUDE",'>',LON0), ("LONGITUDE",'<',LON1),
        ("PRES",'>',-1e30), ("PRES",'<',1e30),
        ("JULD",">",date0), ("JULD","<",date1),
    )

    # this adds OR conditions to the and_filters, i.e. we want to load all observations that satisfy the AND conditions above, AND that have finite salinity OR temperature values
    db_filters = [
        list(and_filters) + [("PSAL", ">", -1e30), ("PSAL", "<", 1e30)],
        list(and_filters) + [("TEMP", ">", -1e30), ("TEMP", "<", 1e30)],
    ]

    # generate output filename
    obs_seq_out = basename + f".{year0}{month0:02d}{day0:02d}.out"

    # generate obs_seq.in file
    obsSeq = ObsSequence(
        crocolake_path,
        selected_variables,
        db_filters,
        obs_seq_out=obs_seq_out,
        loose=True
    )
    obsSeq.write_obs_seq()
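
Before looking at the solution, you may want to confirm that float 1900256 actually reported in the chosen region and time window. The cell below is an optional sketch that queries CrocoLake directly with dask; the column names follow the variables used above, and the filter syntax assumes a pyarrow-backed parquet dataset.

import datetime
import dask.dataframe as dd

# List the platform identifiers reporting in the region between 2010-05-01 and 2010-05-11
check_df = dd.read_parquet(
    crocolake_path,
    columns=["PLATFORM_NUMBER", "JULD", "LATITUDE", "LONGITUDE"],
    filters=[
        ("LATITUDE", ">", 5), ("LATITUDE", "<", 60),
        ("LONGITUDE", ">", -100), ("LONGITUDE", "<", -30),
        ("JULD", ">", datetime.datetime(2010, 5, 1)),
        ("JULD", "<", datetime.datetime(2010, 5, 11)),
    ],
)
print(check_df["PLATFORM_NUMBER"].unique().compute())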

The following cell contains the solution:

import datetime
from convert_crocolake_obs import ObsSequence

# define horizontal region
LAT0 = 5
LAT1 = 60
LON0 = -100
LON1 = -30

# define variables to import from CrocoLake
selected_variables = [
    "DB_NAME",  # ARGO, GLODAP, SprayGliders, OleanderXBT, Saildrones
    "JULD", # this contains timestamp
    "LATITUDE",
    "LONGITUDE",
    "PRES", # This will be automatically converted to depths in meters
    "TEMP",
    "PRES_QC",
    "TEMP_QC",
    "PRES_ERROR",
    "TEMP_ERROR",
    "PSAL",
    "PSAL_QC",
    "PSAL_ERROR"
]

# month and year are constant in our case
year0 = 2010
month0 = 5
wmo_id = str(1900256)  # platform identifier of the selected float, stored as a string in CrocoLake
basename = basename + f".ARGO_{wmo_id}"

# make sure the output directory exists
if not os.path.exists(outdir):
    os.makedirs(outdir, exist_ok=True)

# we loop to generate one file per day
for j in range(10):

    # set date range
    day0 = 1+j
    day1 = day0+1
    date0 = datetime.datetime(year0, month0, day0, 0, 0, 0)
    date1 = datetime.datetime(year0, month0, day1, 0, 0, 0)
    print(f"Converting obs between {date0} and {date1}")

    # this defines AND filters, i.e. we want to load each observation that has latitude within the given range AND longitude within the given range, etc.
    # to exclude NaNs, impose a finite range on a variable
    and_filters = (
        ("LATITUDE",'>',LAT0),  ("LATITUDE",'<',LAT1),
        ("LONGITUDE",'>',LON0), ("LONGITUDE",'<',LON1),
        ("PRES",'>',-1e30), ("PRES",'<',1e30),
        ("JULD",">",date0), ("JULD","<",date1),
        ("PLATFORM_NUMBER","==",wmo_id)
    )

    # this adds OR conditions to the and_filters, i.e. we want to load all observations that satisfy the AND conditions above, AND that have finite salinity OR temperature values
    db_filters = [
        list(and_filters) + [("PSAL", ">", -1e30), ("PSAL", "<", 1e30)],
        list(and_filters) + [("TEMP", ">", -1e30), ("TEMP", "<", 1e30)],
    ]

    # generate output filename
    obs_seq_out = basename + f".{year0}{month0:02d}{day0:02d}.out"

    # generate obs_seq.in file
    obsSeq = ObsSequence(
        crocolake_path,
        selected_variables,
        db_filters,
        obs_seq_out=obs_seq_out,
        loose=True
    )
    obsSeq.write_obs_seq()

The previous cell should have generated one obs_seq file, as the float we selected recorded data on only one day between 2010-05-01 and 2010-05-10. We will use this generated file in the model-obs comparison below.
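
As a quick sanity check, you can list what was written to the output directory (a minimal sketch; it simply globs on the basename defined earlier):

import glob
import os

# List the obs_seq files produced for this float; only one file is expected
print(sorted(glob.glob(os.path.join(outdir, "myCL_obs_seq_*.out"))))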

The configuration file

The configuration file config_tutorial_2.yaml already has all the paths set. Make sure that the output is being saved where you want it.
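
If you want to double-check the paths before running anything, you can print the configuration (a minimal sketch, assuming config_tutorial_2.yaml is a plain YAML file in your working directory):

import yaml

# Load and print the tutorial configuration to verify input and output paths
with open("config_tutorial_2.yaml") as f:
    print(yaml.safe_load(f))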

Running the workflow

Running the workflow to interpolate the model is identical to Tutorial 1 (remember to adapt the config file name in from_config_file()!): the obs_seq files that we generated contain only the float’s measurements, so only those observations will be used by DART’s perfect_model_obs to perform the interpolation.

from crococamp.workflows import WorkflowModelObs

# interpolate model onto obs space for the single float
workflow_float = WorkflowModelObs.from_config_file('config_tutorial_2.yaml')
workflow_float.run()  # use clear_output=True if you want to re-run it and automatically clean all previous output

Displaying the interactive map

We can load the data with the same tools as in Tutorial 1. Note that in this case all interpolations were successful, so get_good_model_obs_df() and get_all_model_obs() will return the same dataframe (get_failed_model_obs() will return an empty dataframe).

good_model_obs_df = workflow_float.get_good_model_obs_df(compute=True)  # compute=True triggers computation of the dask dataframe, returning a pandas dataframe with the data loaded in memory
good_model_obs_df.head()  # display the first 5 rows of the dataframe
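
If you want to verify this, the sketch below compares the row counts; it assumes get_all_model_obs() and get_failed_model_obs() accept the same compute keyword as get_good_model_obs_df() (adapt the calls if your CrocoCamp version differs).

# Optional check: with every interpolation successful, "good" and "all" should match
# in length and "failed" should be empty (compute keyword assumed, see note above)
all_model_obs_df = workflow_float.get_all_model_obs(compute=True)
failed_model_obs_df = workflow_float.get_failed_model_obs(compute=True)
print(len(good_model_obs_df), len(all_model_obs_df), len(failed_model_obs_df))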

Now load the interactive map as in Tutorial 1. What do you see?

from crococamp.viz import InteractiveWidgetMap

# Create an interactive map widget to visualize model-observation comparisons
# The widget provides controls for selecting variables, observation types, and time ranges
widget = InteractiveWidgetMap(good_model_obs_df)
widget.setup()

The previous cell should show a single dot in the middle of the ocean, which is not very informative. We can use MapConfig to pass extra arguments to InteractiveWidgetMap, among them the extent of the map area to plot. Play with the map_extent values in the cell below and re-execute the cell until you’re happy with the plotted region.

from crococamp.viz import MapConfig

map_config = MapConfig(
    map_extent=(-90, 0, 0, 90)  # (lon_min, lon_max, lat_min, lat_max)
)

# Create an interactive map widget to visualize model-observation comparisons
# The widget provides controls for selecting variables, observation types, and time ranges
widget = InteractiveWidgetMap(good_model_obs_df, config=map_config)
widget.setup()

Interactive profile

Finally, we can load the interactive profile widget to explore how our model performed compared to the specific float measurements we selected:

from crococamp.viz import InteractiveWidgetProfile
# Create an interactive profile widget to analyze vertical profiles
# This is ideal for analyzing single float/CTD profiles vs model data
widget = InteractiveWidgetProfile(good_model_obs_df)
widget.setup()

You can also pass some settings through the config argument of the InteractiveWidgetProfile class:

from crococamp.viz import ProfileConfig

# Customize the profile widget appearance and behavior
profile_config = ProfileConfig(
    figure_size=(7, 7),
    marker_size=5,
    marker_alpha=0.6,
    invert_yaxis=False,  # Don't invert for this example
    grid=True
)

# Create widget with custom configuration
widget = InteractiveWidgetProfile(good_model_obs_df, x='obs', y='interpolated_model', config=profile_config)
widget.setup()
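
Beyond the visual comparison, you may want a single summary number. The cell below is a minimal sketch computing the root-mean-square difference between the observations and the interpolated model values; the 'obs' and 'interpolated_model' column names follow the widget arguments above. If you want separate numbers for temperature and salinity, group by the dataframe’s observation-type column first (its name depends on your CrocoCamp version).

import numpy as np

# Root-mean-square difference between observed and interpolated model values,
# pooled over all observation types in the dataframe
diff = good_model_obs_df["interpolated_model"] - good_model_obs_df["obs"]
print(f"RMSE (all observation types pooled): {np.sqrt((diff**2).mean()):.4f}")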