Regional Ocean: Basic Region and Surface Field Visualization¶

Note: This notebook is meant to be run with the cupid-analysis kernel (see CUPiD Installation). This notebook is often run by default as part of CESM post-processing steps, but you can also run it manually.

Diagnostics and Plotting Resources¶

case_name = ""  # "/glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/Diagnostics/CESM_Output/"
CESM_output_dir = ""  # "CROCODILE_tutorial_nwa12_MARBL"

# As regional domains vary so much in purpose, simulation length, and extent, we don't want to assume a minimum duration
## Thus, we ignore start and end dates and simply reduce/output over the whole time frame for all of the examples given.
start_date = None  # "0001-01-01"
end_date = None  # "0101-01-01

save_figs = False
fig_output_dir = None

lc_kwargs = {}
serial = False

sfc_variables = []  # ['SSH', 'tos', 'sos', 'speed', 'SSV', 'SSU']
monthly_variables = []  # ['thetao', 'so', 'uo', 'vo']

# Parameters
case_name = "CROCODILE_tutorial_nwa12_MARBL"
CESM_output_dir = (
    "/glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/Diagnostics/CESM_Output/"
)
start_date = ""
end_date = ""
save_figs = True
fig_output_dir = None
lc_kwargs = {"threads_per_worker": 1}
serial = True
sfc_variables = ["tos"]
monthly_variables = ["thetao"]
subset_kwargs = {}
product = "/glade/work/ajanney/crocodile_2025/CUPiD/examples/regional_ocean/computed_notebooks//ocn/Regional_Ocean_Report_Card.ipynb"

Output directory is: /glade/campaign/cgd/oce/projects/CROCODILE/workshops/2025/Diagnostics/CESM_Output/CROCODILE_tutorial_nwa12_MARBL/ocn/hist

Load in Model Output and Look at Variables/Meta Data¶

Default File Structure in MOM6¶

This file structure will be different if you modify the diag_table.

static data: contains horizontal grid, vertical grid, land/sea mask, bathymetry, lat/lon information
sfc data: daily output of 2D surface fields (salinity, temp, SSH, velocities)
monthly data: averaged monthly output of the full 3D domain, regridded to a predefined grid (MOM6 default is WOA, see more below)
native data: averaged monthly output of ocean state and atmospheric fluxes on the native MOM6 grid

# Xarray time decoding things
time_coder = xr.coders.CFDatetimeCoder(use_cftime=True)

## Static data includes hgrid, vgrid, bathymetry, land/sea mask
static_data = xr.open_mfdataset(
    os.path.join(OUTDIR, "*h.static.nc"),
    decode_timedelta=True,
    decode_times=time_coder,
    engine="netcdf4",
)

## Surface Data
sfc_data = xr.open_mfdataset(
    os.path.join(OUTDIR, "*h.sfc*.nc"),
    decode_timedelta=True,
    decode_times=time_coder,
    engine="netcdf4",
)

## Monthly Full Domain Data
## Not used in this notebook by default
monthly_data = xr.open_mfdataset(
    os.path.join(OUTDIR, "*h.z*.nc"),
    decode_timedelta=True,
    decode_times=time_coder,
    engine="netcdf4",
)

## Monthly Full Domain Data, on native MOM6 grid
## Not used in this notebook by default
native_data = xr.open_mfdataset(
    os.path.join(OUTDIR, "*h.native*.nc"),
    decode_timedelta=True,
    decode_times=time_coder,
    engine="netcdf4",
)

## Image/Gif Output Directory
if fig_output_dir is None:
    image_output_dir = os.path.join(
        "/glade/derecho/scratch/",
        os.environ["USER"],
        "archive",
        case_name,
        "ocn",
        "cupid_images",
    )
else:
    image_output_dir = os.path.join(fig_output_dir, case_name, "ocn", "cupid_images")
if not os.path.exists(image_output_dir):
    os.makedirs(image_output_dir)
print("Image output directory is:", image_output_dir)

Image output directory is: /glade/derecho/scratch/ajanney/archive/CROCODILE_tutorial_nwa12_MARBL/ocn/cupid_images

## Select for only the variables we want to analyze
if len(sfc_variables) > 0:
    print("Selecting only the following surface variables:", sfc_variables)
    sfc_data = sfc_data[sfc_variables]
if len(monthly_variables) > 0:
    print("Selecting only the following monthly variables:", monthly_variables)
    monthly_data = monthly_data[monthly_variables]

## Apply time boundaries
## if they are the right format
if len(start_date.split("-")) == 3 and len(end_date.split("-")) == 3:
    import cftime

    calendar = sfc_data.time.encoding.get("calendar", "standard")

    calendar_map = {
        "gregorian": cftime.DatetimeProlepticGregorian,
        "noleap": cftime.DatetimeNoLeap,
    }

    CFTime = calendar_map.get(calendar, cftime.DatetimeGregorian)
    y, m, d = [int(i) for i in start_date.split("-")]
    start_date_time = CFTime(y, m, d)
    y, m, d = [int(i) for i in end_date.split("-")]
    end_date_time = CFTime(y, m, d)

    print(
        f"Applying time range from start_date: {start_date_time} and end_date: {end_date_time}."
    )

    sfc_data = sfc_data.sel(time=slice(start_date_time, end_date_time))
    monthly_data = monthly_data.sel(time=slice(start_date_time, end_date_time))
    native_data = native_data.sel(time=slice(start_date_time, end_date_time))

sfc_time_bounds = [
    sfc_data["time"].isel(time=0).compute().item(),
    sfc_data["time"].isel(time=-1).compute().item(),
]
monthly_time_bounds = [
    monthly_data["time"].isel(time=0).compute().item(),
    monthly_data["time"].isel(time=-1).compute().item(),
]

print(f"Surface Data Time Bounds: {sfc_time_bounds[0]} to {sfc_time_bounds[-1]}")
print(
    f"Monthly Data Time Bounds: {monthly_time_bounds[0]} to {monthly_time_bounds[-1]}"
)

Selecting only the following surface variables: ['tos']
Selecting only the following monthly variables: ['thetao']
Surface Data Time Bounds: 1999-12-30 12:00:00 to 2000-11-28 12:00:00
Monthly Data Time Bounds: 2000-01-14 12:00:00 to 2000-10-14 12:00:00

**`mom6.h.static.nc`**: static information about the domain¶

The MOM6 grid uses an Arakawa C grid which staggers velocities and tracers (temp, salinity, SSH, etc.). See this MOM6 documentation for more information.

Some variables of interest:¶

geolon/geolat (and c/u/v variants): specifies the true lat/lon of each cell. We use these variables for plotting and placing the data geographically.
wet (and c/u/v variants): the land-sea mask that specifies if a given point is on land or sea.
areacello (and bu,cu,cv variants): area of grid cell (important for taking area-weighted averages)
deptho: depth of ocean floor - bathymetry

Indexing and the Horizontal C-grid in MOM6

When accessing the output, we need to pay particular attention to which variables we are accessing and which coordinates correspond to their position on the grid. This also affects plotting and spatial averages (as we will see in this notebook and others).

xh/yh: index the center of the cell in x and y respectively
xq/yq: index the corner of the cell in x and y respectively

Coordinates:

(xh,yh): center of cell, where tracers are. Plot with geolon,geolat.
(xh,yq): middle of horizontal interface, where meridional (v) velocity is. Plot with geolon_v,geolat_v.
(xq,yh): middle of vertical interface, where zonal (u) velocity is. Plot with geolon_u,geolat_u.
(xq,yq): corners between cells, where vorticity is.

**`h.sfc.nc`**: daily surface fields¶

The surface fields are especially useful for diagnosing short runs. This is not only the most dynamic field for short runs, but it also is the only file that stores daily results (by default). A lot of the diagnostics in this notebook use surface fields because we are able to take time averages and look at time series for any run longer than a couple of days (unlike the full 3D domain fields which are averaged over each month).

Some variables of interest:¶

SSH: sea surface height
tos: temperature of ocean surface
sos: salinity of ocean surface
speed: magnitude of speed (considered a tracer, at the center of a cell)
SSU and SSV: zonal and meridional velocity at the surface

**`h.z.nc`**: fields for the full 3D domain, averaged monthly, regridded vertically¶

These are vertically remapped diagnostics files that capture the full 3D domain, but they only output monthly averages. For runs less than a month, they will average over the full length of the run.

By default, this diagnostics MOM6 regrids output to the vertical grid from the 2009 World Ocean Atlas (35 layers down to 6750 m).

Some variables of interest:¶

uo and vo: zonal and meridional velocity
thetao: potential temperature
so: salinity
vmo and umo: zonal and meridional mass transport
volcello: volume of each cell (important for averaging over a volume of the domain)

Note: now there are two z coordinates. z_i is the vertical interface between cells and z_l identifies the depth of the cell centers.

Note on Vertically Remapped Diagnostics in MOM6

The output found in *mom6.h.z.*.nc files is vertically remapped by MOM6. By default, this output is on the 2009 World Ocean Atlas grid. You can regrid the output after runtime (see packages like xgcm and and xESMF), but if you know a vertical grid that you need output on, MOM6 can handle the interpolation automatically.

The vertical grid settings for a particular CESM run can be found in MOM_parameter_doc.all in the CESM case run directory. See this MOM6 documentation for more information.

**`h.native.nc`**: fields for the full 3D domain, averaged monthly, on native MOM6 grid.¶

These outputs are on the MOM6 native grid. From the MOM6 Documentation:

Since the model can be run in arbitrary coordinates, say in hybrid-coordinate mode, then native-space diagnostics can be potentially confusing. Native diagnostics are useful when examining exactly what the model is doing

The default native file (as configured by CESM) also outputs useful global averages and atmospheric variables.

Some variables of interest:¶

soga: global mean ocean salinity
thetaoga: global mean ocean potential temperature
tauuo and tauvo: zonal and meridional downward stress from the atmospheric forcing
hfds: net downward surface heat flux
- hf*: various individual heat fluxes into the ocean
friver: freshwater flux from rivers

native_data

Look at Regional Domain¶

Note: you may see some numpy divide by zero errors below. I don’t why, but they aren’t an issue!

Plotting Regional MOM6 Output¶

We will primarily look at the surface fields and some of the monthly full domain fields in this notebook.

Xarray comes with great default plotting wrappers that can do a lot! If you take an Xarray DataArray (one variable data structure) and call

xr.DataArray.plot()

it will usually do a pretty good job! Here we’ll walk through some powerful ways to use this plot wrapper, and we provide some additional functions to help the plotting along!

Simple plotting with Xarray¶

By default, these files are all loaded in as Datasets which store a large set of variables with a variety of shared coordinates/dimensions.

When plotting, we need to reduce this space to a single variable. If we want to plot a spatial field we need one time step, and if we want to plot a timeseries, we need to average over the spatial dimensions (this is a bit more complicated than it might seem because the grid cells are not all the same size).

Let’s use some basic Xarray plotting to look at a surface field:

## Select a single variable at a single time
sst = sfc_data["tos"].isel(time=0)
sst.plot(cmap="viridis", vmin=-1.9, vmax=30)  # approx. color bar bounds for temp
# Note that it automatically labels with coordinate names and attributes.

Looks good! But from above we know we want to plot with geolon and geolat for an accurate map. We can also choose some different color maps and projections. Let’s use some regional_utils to get this done.

## Select a single variable at a single time
sst = sfc_data["tos"].isel(time=0)

coords = utils.chooseGeoCoords(sst.dims)
lat = static_data[coords["latitude"]]
lon = static_data[coords["longitude"]]

sst = sst.assign_coords({"lon": lon, "lat": lat})

cmap = utils.chooseColorMap(sst.name)

cbar_levels = utils.chooseColorLevels(
    sst.min().compute().item(),
    sst.max().compute().item(),
)

# Create the plot with projection
fig = plt.figure(dpi=200)
ax = plt.axes(projection=ccrs.Mercator())

# Plot the data
p = sst.plot(
    x="lon",
    y="lat",
    cmap=cmap,
    levels=cbar_levels,
    transform=ccrs.PlateCarree(),
    ax=ax,
    add_colorbar=True,
)

# Add coastlines - now this should work
ax.coastlines(resolution="50m", color="black", linewidth=0.3)

plt.show()

Looks good! Doing all this setup everytime would be cumbersome, so we wrapped it in a function utils.plotLatLonField.

Please go check it out and see what’s going on behind the scenes. It also can calculate statistics for the field taking into account area/volume weights.

MOM6 Output - Surface Fields¶

The cells below visualize the mean state and the std of the fields over the full time given.

## Ploting basic surface state variables
add_stats = True
time_bounds = (
    sfc_data["time"][0].compute().item(),
    sfc_data["time"][-1].compute().item(),
)

for var in sfc_variables:
    if var not in list(sfc_data.variables):
        print(f"Variable '{var}' not in given sfc_data. It will not be plotted.")
        sfc_variables.remove(var)


print(f"Taking mean and std dev from {time_bounds[0]} to {time_bounds[-1]}")
for var in sfc_variables:
    field = sfc_data[var]

    coords = utils.chooseGeoCoords(field.dims)
    areacello = utils.chooseAreacello(field.dims)

    mean = field.mean(dim="time", skipna=True).compute()
    std = field.std(dim="time", skipna=True).compute()
    mean.attrs = field.attrs
    std.attrs = field.attrs
    mean.attrs["long_name"] = f"Mean {field.long_name}"
    std.attrs["long_name"] = f"Std Dev {field.long_name}"

    utils.plotLatLonField(
        mean,
        latitude=static_data[coords["latitude"]],
        longitude=static_data[coords["longitude"]],
        stats=add_stats,
        area_weights=static_data[areacello],
        save=save_figs,
        save_path=image_output_dir,
    )
    plt.show()

    utils.plotLatLonField(
        std,
        latitude=static_data[coords["latitude"]],
        longitude=static_data[coords["longitude"]],
        stats=add_stats,
        area_weights=static_data[areacello],
        save=save_figs,
        save_path=image_output_dir,
    )
    plt.show()

Taking mean and std dev from 1999-12-30 12:00:00 to 2000-11-28 12:00:00

/glade/work/ajanney/conda-envs/cupid-analysis/lib/python3.11/site-packages/dask/array/numpy_compat.py:57: RuntimeWarning: invalid value encountered in divide
  x = np.divide(x1, x2, out)

Area Weighted Averages and Timeseries¶

Calculating and plotting timeseries can also be complicated. We need to be careful when taking an average of the fields that we take weighted statistics because the grid cells are not guaranteed to be constant area or volume.

The information we need for these weighted averages is contained in the areacello* variables (in static_data) and the volcello variable (in native_data and monthly_data).

We have to pay attention to coordinates and dimensions because of MOM6’s staggered grid (see how we choose the geolon/lat coords above).

Note: These timeseries may be less helpful for shorter runs, but see what they reveal!

for var in sfc_variables:
    field = sfc_data[var]
    areacello_var = utils.chooseAreacello(field.dims)

    utils.plotAvgTimeseries(
        sfc_data[var],
        static_data[areacello_var],
        save=save_figs,
        save_path=image_output_dir,
    )

Viewing Full Domain Monthly Output¶

Let’s look at the monthly output. When using longer runs (years or decades). These hold a lot more information about how your model is evolving and performing over time. For shorter runs (like many regional models often are), we may only have a handful of monthly output timesteps.

Xarray has some very useful methods for visualizing multiple levels and easily plotting them using the col and col_wrap keywords.

**Note: ** none of the following plots are saved. If you want to save them, rerun the first few cells of the notebook and manually save these.

for var in monthly_variables:
    if var not in list(monthly_data.variables):
        print(f"Variable '{var}' not in given monthly_data. It will not be plotted.")
        monthly_variables.remove(var)

## Let's only look at one variable for now (this plotting can be slow)
for var in monthly_variables[0:1]:
    # Look at only the first time step for now
    field = monthly_data[var].isel(time=0)

    cmap = utils.chooseColorMap(var)
    levels = utils.chooseColorLevels(
        field.min().compute().item(),
        field.max().compute().item(),
    )

    subplot_kwargs = {
        "projection": ccrs.Mercator(),
    }

    p = field.plot(
        col="z_l",
        col_wrap=5,
        cmap=cmap,
        levels=levels,
        transform=ccrs.PlateCarree(),
        subplot_kws=subplot_kwargs,
    )
    plt.suptitle(f"{var} at depths", y=1.0, fontsize=20)

    for ax in p.axs.flat:
        ax.coastlines(resolution="50m", color="black", linewidth=0.3)

Vertical Profiles¶

Another useful tool, sometimes we want to look at the vertical profile of something like salinity or temperature. Be careful to use area weighted averages here. (This could be a tool in regional_utils, but it’s not too much code, you decide!)

for var in monthly_variables[0:1]:  # only one variable again, make it more!
    # Look at only the first time step for now
    field = monthly_data[var].isel(time=0)

    area_weights = utils.chooseAreacello(field.dims)
    field_weighted = field.weighted(static_data[area_weights])

    mean_vars = [dim for dim in field.dims if dim != "z_l"]
    field_mean = field_weighted.mean(mean_vars)

    field_mean.plot(y="z_l", yincrease=False)

Bonus: Interpolating Levels and Slices¶

We successfully plotted different levels above, we can also plot vertical slices. In both cases, we might run into an issue where we want to interpolate to a depth or lat/lon that is not explicitly resolved in the model. We can write our own tailored methods for this, but for simple visual inspection/comparison Xarray tools work well!

monthly_data["z_l"]

## Plot a specific depth
field = monthly_data["thetao"].isel(time=0)
field = field.interp(
    z_l=550, method="linear"
)  # you may choose a different depth if you want to test the interpolation

coords = utils.chooseGeoCoords(field.dims)
areacello = utils.chooseAreacello(field.dims)

utils.plotLatLonField(
    field,
    latitude=static_data[coords["latitude"]],
    longitude=static_data[coords["longitude"]],
    stats=True,
    area_weights=static_data[areacello],
    save=save_figs,
    save_path=image_output_dir,
)
plt.show()

Let’s try it with a vertical slice now too!¶

field = monthly_data["thetao"].isel(time=0)
x_var = "xh" if "xh" in field.dims else "xq"
longitude = np.mean(field[x_var].to_numpy())
vertical = field.interp({x_var: longitude})
levels = np.linspace(
    vertical.min().compute().item(), vertical.max().compute().item(), 35
)
vertical.plot(
    yincrease=False,
    levels=levels,
)

Regional Ocean: Basic Region and Surface Field Visualization

Regional Ocean: Basic Region and Surface Field Visualization¶

Diagnostics and Plotting Resources¶

Highly Recommended:¶

Great! Focused on Global Output Diagnostics¶

Misc.¶

Load in Model Output and Look at Variables/Meta Data¶

Default File Structure in MOM6¶

*mom6.h.static*.nc: static information about the domain¶

Some variables of interest:¶

*h.sfc*.nc: daily surface fields¶

Some variables of interest:¶

*h.z*.nc: fields for the full 3D domain, averaged monthly, regridded vertically¶

Some variables of interest:¶

*h.native*.nc: fields for the full 3D domain, averaged monthly, on native MOM6 grid.¶

Some variables of interest:¶

Look at Regional Domain¶

Plotting Regional MOM6 Output¶

Simple plotting with Xarray¶

MOM6 Output - Surface Fields¶

Area Weighted Averages and Timeseries¶

Viewing Full Domain Monthly Output¶

Vertical Profiles¶

Bonus: Interpolating Levels and Slices¶

Let’s try it with a vertical slice now too!¶

**`mom6.h.static.nc`**: static information about the domain¶

**`h.sfc.nc`**: daily surface fields¶

**`h.z.nc`**: fields for the full 3D domain, averaged monthly, regridded vertically¶

**`h.native.nc`**: fields for the full 3D domain, averaged monthly, on native MOM6 grid.¶