Tutorial 2: Model-Observation comparison with MOM6 and a single Argo float from CrocoLake
The goal of this tutorial is to learn how to generate your own obs_seq files from CrocoLake and to explore another interactive widget: the interactive profile. We will select a specific Argo float, interpolate our model output onto the locations of that float's temperature and salinity observations, and compare the interpolated model profiles with the float's profiles.
Installing CrocoCamp
If you don’t have CrocoCamp set up yet, here are the instructions to install it on NCAR’s HPC. If you have any issues or need to install it on a different machine, please contact enrico.milanese@whoi.edu.
Generating your own obs_seq.in files
CrocoLake is a flexible parquet dataset that is fast and easy to access and filter. For this workshop, a copy of the CrocoLake version with physical variables is stored at /glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/CrocoCamp/CrocoLakePHY:
```python
import os

crocolake_path = '/glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/CrocoCamp/CrocoLakePHY'

# output directory and file-name prefix for the obs_seq files ($WORK is expanded from the environment)
basename = "myCL_obs_seq_"
outdir = "$WORK/crocodile_2025/CrocoCamp/tutorial_2/in_CL/"
basename = os.path.expandvars(outdir+basename)
outdir = os.path.expandvars(outdir)
```
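To see what the cell above produces, here is a minimal sketch of how the output path and the per-day file names fit together. The helper obs_seq_filename is hypothetical (it is not part of CrocoCamp) and only mirrors the naming pattern used in the conversion cells of this tutorial; the WORK value is a placeholder that is used only when the variable is not already set in your environment.

```python
import datetime
import os

# Placeholder for illustration only; on NCAR's HPC $WORK is already defined.
os.environ.setdefault("WORK", "/glade/work/username")

outdir = os.path.expandvars("$WORK/crocodile_2025/CrocoCamp/tutorial_2/in_CL/")
basename = os.path.join(outdir, "myCL_obs_seq_")


def obs_seq_filename(basename, date, wmo_id=None):
    """Hypothetical helper: compose an obs_seq name like <basename>[.ARGO_<id>].<YYYYMMDD>.out."""
    name = basename
    if wmo_id is not None:
        name += f".ARGO_{wmo_id}"
    # same result as f".{year0}{month0:02d}{day0:02d}.out" in the tutorial cells
    return name + f".{date:%Y%m%d}.out"


print(outdir)
print(obs_seq_filename(basename, datetime.date(2010, 5, 1), wmo_id="1900256"))
```

Note that os.path.expandvars leaves references to undefined variables unchanged, so if $WORK is not set the literal string "$WORK" ends up in the path instead of raising an error.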
In the following cell you have the code that I used to generate the observation files used in Tutorial 1. Have a look at the code, in particular at how to select variables and define filters. Once you’re ready, create a new cell below, copy-paste the code, and add a filter so that you only store the data recorded by the Argo float with identifier 1900256. Also change the file name so that it includes the float number. Note: in CrocoLake, the platform identifier is stored as a string in the variable 'PLATFORM_NUMBER'.
```python
import datetime
from convert_crocolake_obs import ObsSequence

# define horizontal region
LAT0 = 5
LAT1 = 60
LON0 = -100
LON1 = -30

# define variables to import from CrocoLake
selected_variables = [
    "DB_NAME",  # ARGO, GLODAP, SprayGliders, OleanderXBT, Saildrones
    "JULD",  # this contains the timestamp
    "LATITUDE",
    "LONGITUDE",
    "PRES",  # this will be automatically converted to depth in meters
    "TEMP",
    "PRES_QC",
    "TEMP_QC",
    "PRES_ERROR",
    "TEMP_ERROR",
    "PSAL",
    "PSAL_QC",
    "PSAL_ERROR"
]

# month and year are constant in our case
year0 = 2010
month0 = 5

# we loop to generate one file per day
for j in range(10):
    # set date range
    day0 = 1+j
    day1 = day0+1
    date0 = datetime.datetime(year0, month0, day0, 0, 0, 0)
    date1 = datetime.datetime(year0, month0, day1, 0, 0, 0)
    print(f"Converting obs between {date0} and {date1}")

    # this defines AND filters, i.e. we want to load each observation that has latitude within the given range AND longitude within the given range, etc.
    # to exclude NaNs, impose a range on a variable
    and_filters = (
        ("LATITUDE",'>',LAT0), ("LATITUDE",'<',LAT1),
        ("LONGITUDE",'>',LON0), ("LONGITUDE",'<',LON1),
        ("PRES",'>',-1e30), ("PRES",'<',1e30),
        ("JULD",">",date0), ("JULD","<",date1),
    )

    # this adds OR conditions to the and_filters, i.e. we want to load all observations that satisfy the AND conditions above AND that have finite salinity OR temperature values
    db_filters = [
        list(and_filters) + [("PSAL", ">", -1e30), ("PSAL", "<", 1e30)],
        list(and_filters) + [("TEMP", ">", -1e30), ("TEMP", "<", 1e30)],
    ]

    # generate output filename
    obs_seq_out = basename + f".{year0}{month0:02d}{day0:02d}.out"

    # generate obs_seq.in file
    obsSeq = ObsSequence(
        crocolake_path,
        selected_variables,
        db_filters,
        obs_seq_out=obs_seq_out,
        loose=True
    )
    obsSeq.write_obs_seq()
```
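The nested structure of db_filters (a list of lists of (column, operator, value) tuples, where the tuples inside each inner list are combined with AND and the inner lists are combined with OR) is the disjunctive normal form that pyarrow accepts in the filters argument of pyarrow.parquet.read_table. Whether ObsSequence hands these filters to pyarrow internally is an assumption here. The pure-Python evaluator below is only a sketch of the filter semantics on a few made-up records, not how CrocoLake actually evaluates them; the records and their values are invented for illustration.

```python
import datetime
import operator

# comparison operators that may appear in a filter tuple
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def matches(record, conditions):
    """Logical AND: True only if the record satisfies every (column, op, value) tuple."""
    for column, op, value in conditions:
        x = record.get(column)
        # a missing value or NaN (NaN != NaN) never satisfies a condition
        if x is None or x != x:
            return False
        if not OPS[op](x, value):
            return False
    return True


def apply_db_filters(records, db_filters):
    """Logical OR: keep records that satisfy at least one of the AND-lists."""
    return [r for r in records if any(matches(r, conds) for conds in db_filters)]


date0 = datetime.datetime(2010, 5, 1)
date1 = datetime.datetime(2010, 5, 2)
and_filters = [
    ("JULD", ">", date0), ("JULD", "<", date1),
    ("PLATFORM_NUMBER", "==", "1900256"),
]
db_filters = [
    and_filters + [("PSAL", ">", -1e30), ("PSAL", "<", 1e30)],
    and_filters + [("TEMP", ">", -1e30), ("TEMP", "<", 1e30)],
]

nan = float("nan")
t = datetime.datetime(2010, 5, 1, 6)
records = [
    {"PLATFORM_NUMBER": "1900256", "JULD": t, "TEMP": 12.3, "PSAL": nan},  # kept: finite TEMP
    {"PLATFORM_NUMBER": "1900256", "JULD": t, "TEMP": nan, "PSAL": nan},   # dropped: no finite value
    {"PLATFORM_NUMBER": "1900257", "JULD": t, "TEMP": 10.0, "PSAL": 35.0},  # dropped: other float
    {"PLATFORM_NUMBER": "1900256", "JULD": date1, "TEMP": 11.0, "PSAL": 35.1},  # dropped: strict '<' on JULD
]

kept = apply_db_filters(records, db_filters)
print([r["TEMP"] for r in kept])  # [12.3]

# PLATFORM_NUMBER is a string, so an integer WMO ID matches nothing
print(apply_db_filters(records, [[("PLATFORM_NUMBER", "==", 1900256)]]))  # []
```

The last line shows why the float identifier has to be compared as a string: "1900256" == 1900256 is False in Python, so an integer-valued filter silently returns no data.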
The following cell contains the solution:
```python
import datetime
import os
from convert_crocolake_obs import ObsSequence

# define horizontal region
LAT0 = 5
LAT1 = 60
LON0 = -100
LON1 = -30

# define variables to import from CrocoLake
selected_variables = [
    "DB_NAME",  # ARGO, GLODAP, SprayGliders, OleanderXBT, Saildrones
    "JULD",  # this contains the timestamp
    "LATITUDE",
    "LONGITUDE",
    "PRES",  # this will be automatically converted to depth in meters
    "TEMP",
    "PRES_QC",
    "TEMP_QC",
    "PRES_ERROR",
    "TEMP_ERROR",
    "PSAL",
    "PSAL_QC",
    "PSAL_ERROR"
]

# month and year are constant in our case
year0 = 2010
month0 = 5

# PLATFORM_NUMBER is stored as a string in CrocoLake, so compare against a string
wmo_id = str(1900256)
# use a new variable so that re-running this cell does not append the float ID twice
basename_float = basename + f".ARGO_{wmo_id}"

# create the output directory if it does not exist yet
os.makedirs(outdir, exist_ok=True)

# we loop to generate one file per day
for j in range(10):
    # set date range
    day0 = 1+j
    day1 = day0+1
    date0 = datetime.datetime(year0, month0, day0, 0, 0, 0)
    date1 = datetime.datetime(year0, month0, day1, 0, 0, 0)
    print(f"Converting obs between {date0} and {date1}")

    # this defines AND filters, i.e. we want to load each observation that has latitude within the given range AND longitude within the given range, etc.
    # to exclude NaNs, impose a range on a variable
    and_filters = (
        ("LATITUDE",'>',LAT0), ("LATITUDE",'<',LAT1),
        ("LONGITUDE",'>',LON0), ("LONGITUDE",'<',LON1),
        ("PRES",'>',-1e30), ("PRES",'<',1e30),
        ("JULD",">",date0), ("JULD","<",date1),
        ("PLATFORM_NUMBER","==",wmo_id)  # keep only the selected float
    )

    # this adds OR conditions to the and_filters, i.e. we want to load all observations that satisfy the AND conditions above AND that have finite salinity OR temperature values
    db_filters = [
        list(and_filters) + [("PSAL", ">", -1e30), ("PSAL", "<", 1e30)],
        list(and_filters) + [("TEMP", ">", -1e30), ("TEMP", "<", 1e30)],
    ]

    # generate output filename (includes the float ID)
    obs_seq_out = basename_float + f".{year0}{month0:02d}{day0:02d}.out"

    # generate obs_seq.in file
    obsSeq = ObsSequence(
        crocolake_path,
        selected_variables,
        db_filters,
        obs_seq_out=obs_seq_out,
        loose=True
    )
    obsSeq.write_obs_seq()
```
The previous cell should have generated one obs_seq file, because the float that we selected recorded data on only one day between 2010-05-01 and 2010-05-10. We will use this file in the model-obs comparison that follows.
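Both conversion cells build the day boundaries as datetime.datetime(year0, month0, day0 + 1), which works for 1 to 10 May but raises ValueError ("day is out of range for month") if the loop ever reaches the last day of a month. A minimal sketch of a more robust alternative, stepping with datetime.timedelta, is below; daily_windows is a hypothetical helper, not part of CrocoCamp or CrocoLake.

```python
import datetime


def daily_windows(start, n_days):
    """Return (date0, date1) pairs covering n_days consecutive days from start."""
    one_day = datetime.timedelta(days=1)
    return [(start + k * one_day, start + (k + 1) * one_day) for k in range(n_days)]


windows = daily_windows(datetime.datetime(2010, 5, 1), 10)
for date0, date1 in windows:
    print(f"Converting obs between {date0} and {date1}")
    # the file suffix can then be taken from date0, e.g. f".{date0:%Y%m%d}.out"

# crossing a month boundary is handled correctly
print(daily_windows(datetime.datetime(2010, 5, 31), 2))
```

Also note that the JULD filters use strict ">" and "<", so an observation time-stamped exactly at 00:00:00 falls in neither adjacent window; if the filter backend accepts ">=" (pyarrow's DNF filters do), using it for the lower bound would close that gap.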
The configuration file
The configuration file config_tutorial_2.yaml already has all the paths set. Make sure that the output is saved where you want it.
Running the workflow
Running the workflow to interpolate the model is identical to Tutorial 1 (remember to change the config file name passed to from_config_file()!): the obs_seq files that we generated contain only the float’s measurements, so only those observations will be used by DART’s perfect_model_obs to perform the interpolation.
```python
from crococamp.workflows import WorkflowModelObs

# interpolate the model onto observation space for the single float
workflow_float = WorkflowModelObs.from_config_file('config_tutorial_2.yaml')
workflow_float.run()  # use the flag clear_output=True if you want to re-run it and automatically clean all previous output
```
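To make "interpolating the model onto observation space" concrete, here is a deliberately simplified one-dimensional sketch: a single model temperature column is linearly interpolated in depth onto the float's observation depths with numpy.interp. DART's perfect_model_obs applies the forward operator provided by the model interface, which also interpolates horizontally and handles the model's own vertical grid, so this is only an illustration of the idea. All depths and temperatures below are made-up numbers, and interp_column_to_obs is a hypothetical helper.

```python
import numpy as np


def interp_column_to_obs(model_depth, model_values, obs_depth):
    """Linearly interpolate one model column onto observation depths.

    Depths outside the model column are returned as NaN instead of being
    extrapolated, loosely analogous to observations that cannot be interpolated.
    """
    return np.interp(obs_depth, model_depth, model_values, left=np.nan, right=np.nan)


# made-up model column: layer depths (m, increasing) and temperatures (degC)
model_depth = np.array([5.0, 15.0, 30.0, 60.0, 120.0, 250.0])
model_temp = np.array([24.0, 23.5, 21.0, 17.0, 13.0, 10.0])

# made-up float observation depths (m); 400 m lies below the deepest model level
obs_depth = np.array([10.0, 45.0, 200.0, 400.0])

model_at_obs = interp_column_to_obs(model_depth, model_temp, obs_depth)
print(model_at_obs)
```

numpy.interp requires the model depths to be increasing, which is why the column is ordered from the surface downwards.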
Displaying the interactive map
We can load the data with the same tools as in Tutorial 1. Note that in this case all interpolations were successful, so get_good_model_obs_df() and get_all_model_obs() will return the same dataframe (and get_failed_model_obs() will return an empty dataframe).
```python
# compute=True triggers the computation of the dask dataframe, returning a pandas dataframe with the data loaded in memory
good_model_obs_df = workflow_float.get_good_model_obs_df(compute=True)
good_model_obs_df.head()  # displays the first 5 rows of the dataframe
```
Now load the interactive map as in Tutorial 1. What do you see?
```python
from crococamp.viz import InteractiveWidgetMap

# Create an interactive map widget to visualize model-observation comparisons
# The widget provides controls for selecting variables, observation types, and time ranges
widget = InteractiveWidgetMap(good_model_obs_df)
widget.setup()
```
The previous cell should show a dot in the middle of the ocean, which is not very informative. We can use MapConfig to pass extra arguments to InteractiveWidgetMap, including the extent of the map area to plot. Play with the map_extent values in the cell below and re-execute the cell until you’re happy with the plotted region.
```python
from crococamp.viz import MapConfig

map_config = MapConfig(
    map_extent=(-90, 0, 0, 90)  # (lon_min, lon_max, lat_min, lat_max)
)

# Create an interactive map widget to visualize model-observation comparisons
# The widget provides controls for selecting variables, observation types, and time ranges
widget = InteractiveWidgetMap(good_model_obs_df, config=map_config)
widget.setup()
```
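Instead of tuning map_extent by hand, you could compute it from the data. The sketch below returns a (lon_min, lon_max, lat_min, lat_max) tuple, in the same order as the comment in the cell above, that encloses all points plus a margin. The helper extent_around is hypothetical, and it assumes longitudes in the -180 to 180 convention (as in the CrocoLake filters and the example extent above); if your dataframe stores longitudes as 0 to 360, convert them first.

```python
def extent_around(lons, lats, pad_deg=10.0):
    """Return (lon_min, lon_max, lat_min, lat_max) enclosing the points plus pad_deg degrees.

    Assumes longitudes in [-180, 180]; the result is clipped to valid ranges.
    """
    lons = list(lons)
    lats = list(lats)
    lon_min = max(min(lons) - pad_deg, -180.0)
    lon_max = min(max(lons) + pad_deg, 180.0)
    lat_min = max(min(lats) - pad_deg, -90.0)
    lat_max = min(max(lats) + pad_deg, 90.0)
    return (lon_min, lon_max, lat_min, lat_max)


print(extent_around([-50.0], [30.0]))
```

In the notebook this would be used as MapConfig(map_extent=extent_around(lon_values, lat_values)), where lon_values and lat_values stand for whichever columns of good_model_obs_df hold the observation longitudes and latitudes (their names are not shown in this tutorial).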
Interactive profile
Finally, we can load the interactive profile widget to explore how our model performed compared to the specific float measurements we selected:
```python
from crococamp.viz import InteractiveWidgetProfile

# Create an interactive profile widget to analyze vertical profiles
# This is ideal for analyzing single float/CTD profiles vs model data
widget = InteractiveWidgetProfile(good_model_obs_df)
widget.setup()
```
You can also pass some settings through the config argument of the InteractiveWidgetProfile class:
```python
from crococamp.viz import ProfileConfig

# Customize the profile widget appearance and behavior
profile_config = ProfileConfig(
    figure_size=(7, 7),
    marker_size=5,
    marker_alpha=0.6,
    invert_yaxis=False,  # Don't invert for this example
    grid=True
)

# Create widget with custom configuration
widget = InteractiveWidgetProfile(good_model_obs_df, x='obs', y='interpolated_model', config=profile_config)
widget.setup()
```
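The obs versus interpolated_model scatter above can be complemented with a couple of summary numbers. The sketch below computes the bias (mean of model minus observation), the RMSE and the number of points for each observed variable with pandas. The column names obs and interpolated_model are the ones passed to the widget above; the column that identifies the variable (temperature or salinity) is not shown in this tutorial, so group_col must be set to the matching column of good_model_obs_df, and the name "type" used in the example, like the numbers themselves, is hypothetical.

```python
import numpy as np
import pandas as pd


def misfit_summary(df, group_col, obs_col="obs", model_col="interpolated_model"):
    """Bias (model - obs), RMSE and count for each value of group_col."""
    clean = df.dropna(subset=[obs_col, model_col])
    misfit = clean[model_col] - clean[obs_col]
    return (
        clean.assign(misfit=misfit)
        .groupby(group_col)["misfit"]
        .agg(bias="mean", rmse=lambda d: float(np.sqrt(np.mean(d ** 2))), n="count")
    )


# made-up example data; in the notebook use good_model_obs_df and its variable-type column
example_df = pd.DataFrame({
    "type": ["TEMP", "TEMP", "SAL", "SAL"],
    "obs": [10.0, 12.0, 35.0, 35.2],
    "interpolated_model": [11.0, 11.0, 35.1, 35.1],
})
summary = misfit_summary(example_df, group_col="type")
print(summary)
```

Grouping matters here because temperature and salinity misfits have different units and magnitudes, so a single bias or RMSE over the whole dataframe would not be meaningful.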