SeaSloth Benchmark Report

Performance snapshot: benchmarks run on Derecho/GLADE.

Summary

Bathymetry — Topo.set_from_dataset()

The cost of loading GEBCO bathymetry and regridding it onto a model grid scales with domain extent rather than destination resolution. Before regridding, the pipeline slices GEBCO to the model bounding box, so a larger geographic region means more source data to read and interpolate, however fine the destination grid is. A 5°×5° domain takes 11.1 s; a 40°×40° domain takes 16.0 min. Because every benchmark grid is generated at 0.1° resolution, larger domains also produce larger destination grids, so these timings reflect growth in both the source slice and the destination grid. Memory usage averages ~18.9 GB.
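To make the extent scaling concrete, the sketch below counts source and destination cells for the two domains quoted above. It assumes the 15 arc-second global GEBCO grid (240 cells per degree), which this report does not state, and the function name is purely illustrative.

```python
# Assumption: GEBCO is the 15 arc-second global grid (240 cells per degree).
GEBCO_CELLS_PER_DEGREE = 240
DST_RESOLUTION_DEG = 0.1  # destination resolution quoted in this report


def cell_counts(extent_deg):
    """Return (source_cells, destination_cells) for a square domain."""
    src_side = extent_deg * GEBCO_CELLS_PER_DEGREE
    dst_side = round(extent_deg / DST_RESOLUTION_DEG)
    return src_side * src_side, dst_side * dst_side


small_src, small_dst = cell_counts(5)
large_src, large_dst = cell_counts(40)

print(f"source cells grow {large_src / small_src:.0f}x")
print(f"destination cells grow {large_dst / small_dst:.0f}x")
# Timings quoted above: 11.1 s and 16.0 min.
print(f"runtime grows {16.0 * 60 / 11.1:.0f}x")
```

Under that assumption both cell counts grow 64-fold between the two domains while runtime grows about 86-fold, so cost rises somewhat faster than cell count alone.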

Regridding weights — xe.Regridder() / raw ESMF

Computing interpolation weights (the one-time setup cost before any regridding can happen) scales with grid size. For a 300×300 source → 150×150 destination grid, bilinear weight generation takes 655 ms; scaling up to 1500×700 → 700×350 takes 10.6 s. Conservative interpolation takes roughly 1.6× longer than bilinear at the same grid size. Raw ESMF weight generation is similar in cost (0.96× relative to xESMF for the same grid pair), which is consistent with xESMF being a thin Python wrapper around the same ESMF C library and adding negligible overhead. Process memory (RSS) measured by the weight-generation benchmarks ranges from 225 MB to 1.8 GB.
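As a quick check on how weight generation scales, the sketch below compares the growth in grid points with the growth in bilinear timing for the two grid pairs quoted above. It uses only numbers from this section; the helper name is illustrative.

```python
def grid_points(shape):
    """Number of points in a two-dimensional grid of the given shape."""
    return shape[0] * shape[1]


# Grid pairs and bilinear timings quoted in this section.
small = {"src": (300, 300), "dst": (150, 150), "seconds": 0.655}
large = {"src": (1500, 700), "dst": (700, 350), "seconds": 10.6}

src_ratio = grid_points(large["src"]) / grid_points(small["src"])
dst_ratio = grid_points(large["dst"]) / grid_points(small["dst"])
time_ratio = large["seconds"] / small["seconds"]

print(f"source points:      {src_ratio:.1f}x")
print(f"destination points: {dst_ratio:.1f}x")
print(f"bilinear runtime:   {time_ratio:.1f}x")
```

Between these two cases the source grid grows about 11.7-fold, the destination grid about 10.9-fold, and runtime about 16.2-fold, so weight generation grows somewhat faster than the point count.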

OBC-style (locstream) weight generation

When the destination is a boundary line of points rather than a full grid (the pattern used for open-boundary conditions), weight generation is substantially faster: 46 ms–6.2 s across the tested source grid sizes. This is because the destination has far fewer points than a full 2-D grid of similar extent.
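The difference in destination size can be illustrated by counting the points on the outer edge of a grid against the points in the full grid. The sketch below borrows the destination shapes from the full-grid benchmarks purely for illustration; the report does not describe how the benchmark's boundary points are laid out.

```python
def full_grid_points(nx, ny):
    """Points in a full nx-by-ny grid."""
    return nx * ny


def boundary_points(nx, ny):
    """Points on the outer edge of an nx-by-ny grid, corners counted once."""
    if nx < 2 or ny < 2:
        return nx * ny
    return 2 * (nx + ny) - 4


for nx, ny in [(150, 150), (400, 300), (700, 350)]:
    full = full_grid_points(nx, ny)
    edge = boundary_points(nx, ny)
    print(f"{nx}x{ny}: {full} grid points, {edge} boundary points")
```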

Applying pre-computed weights — regridder(ds)

Once weights are built, applying them to a data array is fast regardless of grid size. Using xESMF, a single timestep on a small grid takes 2 ms; 60 timesteps on the largest tested grid take 117 ms. The cost is dominated by the number of destination points and timesteps, not the source grid size. On average, nearest_s2d is 2.4× faster than bilinear during application. Raw ESMF (no xarray) applies the same pre-computed weights considerably faster: 37 μs and 62 ms for the same two cases, roughly 46× and 1.9× faster respectively. The gap is largest for single timesteps, where xESMF's xarray Dataset → numpy conversion and index alignment dominate; it narrows for many timesteps on large grids, where the actual interpolation work takes over. In practice xESMF is used because it handles multi-variable, dask-backed arrays transparently; the overhead is a deliberate trade-off for API convenience.
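Why application cost follows destination points and timesteps can be seen in a small NumPy sketch of bilinear regridding on regular grids. This is a conceptual illustration only: it is not how ESMF or xESMF store or apply weights, and the grids and function names are hypothetical.

```python
import numpy as np


def linear_weights_1d(src, dst):
    """One-time setup: left neighbour index and fractional weight per dst point."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    left = np.clip(np.searchsorted(src, dst, side="right") - 1, 0, src.size - 2)
    frac = (dst - src[left]) / (src[left + 1] - src[left])
    return left, frac


def apply_bilinear(data, wy, wx):
    """Apply precomputed weights to data shaped (time, y, x).

    Work is four gathers per destination point per time step, so it does not
    grow with the source grid size.
    """
    (iy, fy), (ix, fx) = wy, wx
    iy, ix = iy[:, None], ix[None, :]
    fy, fx = fy[:, None], fx[None, :]
    return (
        data[..., iy, ix] * (1 - fy) * (1 - fx)
        + data[..., iy, ix + 1] * (1 - fy) * fx
        + data[..., iy + 1, ix] * fy * (1 - fx)
        + data[..., iy + 1, ix + 1] * fy * fx
    )


# Hypothetical regular grids, far smaller than the benchmarked ones.
src_y, src_x = np.linspace(0.0, 10.0, 30), np.linspace(0.0, 20.0, 40)
dst_y, dst_x = np.linspace(1.0, 9.0, 15), np.linspace(2.0, 18.0, 20)

wy = linear_weights_1d(src_y, dst_y)  # built once
wx = linear_weights_1d(src_x, dst_x)

# Three time steps of a field that is linear in x and y.
times = np.arange(3.0)[:, None, None]
field = times + 2.0 * src_y[None, :, None] + 3.0 * src_x[None, None, :]

regridded = apply_bilinear(field, wy, wx)  # weights reused for every time step
print(regridded.shape)
```

Because the test field is linear in both coordinates, bilinear interpolation reproduces it exactly on the destination grid, which gives a simple correctness check for the weights.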

Runoff mapping — gen_rof_maps()

gen_rof_maps() builds ESMF regridding weight files that map river runoff from a land-model (ROF) mesh onto the ocean (OCN) model mesh. These weight files are computed once and reused for every model run. The ROF source mesh is held constant (JRA55) across all pairs; only the OCN destination grid varies — from coarse (1/10°) through fine (1/40°) to a larger regional domain — so timing directly reflects how destination grid size drives cost.

Module import times

Importing CrocoDash and mom6_forge is fast enough to be negligible in any workflow: CrocoDash.case loads in 7 ms, mom6_forge.topo in 4 ms, and mom6_forge.grid / mom6_forge.vgrid in 2 ms / 1 ms.
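One way to reproduce an import timing by hand is to import the module in a fresh interpreter, so that nothing is already cached in `sys.modules`. The sketch below is an assumption about method: the report does not say how `CrocoDashImports.time_import` measures, and `json` stands in for `CrocoDash.case` so the example runs anywhere.

```python
import subprocess
import sys


def time_import(module_name):
    """Seconds taken to import module_name in a fresh interpreter."""
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The timing is the last line printed by the child process.
    return float(result.stdout.strip().splitlines()[-1])


print(f"json: {time_import('json') * 1e3:.1f} ms")
```

Modules that the interpreter already loads at startup are not counted, so a cold measurement like this may differ from the figures reported above.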

Data source availability

All checked data sources are accessible. Each method is validated on every benchmark run; the table below shows pass/fail status and how long each validation took.
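A pass/fail-plus-duration record of this kind can be produced by a small timing harness. The sketch below is hypothetical and not CrocoDash's implementation: the check functions are stand-ins, and a real check would contact the data source.

```python
import time


def run_checks(checks):
    """Run each zero-argument check; record pass/fail and elapsed seconds."""
    results = {}
    for name, check in checks.items():
        start = time.perf_counter()
        try:
            check()
            ok = True
        except Exception:
            ok = False
        results[name] = (ok, time.perf_counter() - start)
    return results


def always_ok():
    """Stand-in for a validation that succeeds."""
    return None


def always_fails():
    """Stand-in for a validation that cannot reach its source."""
    raise ConnectionError("source unreachable")


results = run_checks(
    {"demo / reachable": always_ok, "demo / unreachable": always_fails}
)
for name, (ok, seconds) in results.items():
    print(f"{name}: {'Yes' if ok else 'No'} ({seconds * 1e3:.1f} ms)")
```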

A note on xESMF and ESMF benchmarks

xESMF and ESMF are external libraries — they are not part of CrocoDash or mom6_forge and their performance will not change from commit to commit on those repos. These benchmarks serve as a stable reference: they tell you how fast the underlying regridding engine is on this machine, independently of any CROC code changes. If these numbers change significantly between runs, suspect a different library version, different node type, or different CPU load — not a regression in CROC code.
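When comparing runs, it helps to store the library versions and node name next to the results so that an environment change can be ruled in or out. The sketch below uses only the standard library; the distribution names passed in are assumptions and may differ in a given environment.

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "node": platform.node(),
        "python": platform.python_version(),
        "packages": versions,
    }


# Assumed distribution names; adjust to match the installed environment.
snapshot = environment_snapshot(["xesmf", "esmpy", "numpy"])
print(snapshot)
```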

Detailed charts below ↓

CrocoDash

DataAccessHealth — all products

Product / Method                                       Working?  Link?  Validation time
gebco / get gebco data with python                     Yes       Yes
gebco / get gebco data script                          Yes       Yes    6 ms
glofas / get global data with python                   Yes       Yes    443 ms
glofas / get processed global glofas script for cli    Yes       Yes    5 ms
glorys / get glorys data from rda                      Yes       Yes    1.3 s
glorys / get glorys data from cds api                  Yes       Yes    1.1 min
glorys / get glorys data script for cli                Yes       Yes    6 ms
mom6_output / get mom6 data                            Yes       Yes    35.7 s
seawifs / get global seawifs script for cli            Yes       Yes    7 ms
seawifs / get processed global seawifs script for cli  Yes       Yes    7 ms

CrocoDashImports.time_import

param[0]: CrocoDash.case, mom6_forge.grid, mom6_forge.topo, mom6_forge.vgrid

OBCRegridMerge.time_regrid_and_merge

param[0]: 5, 15, 30

DataAccessLinkCheck.time_link_check

param[0]: tpxo, cesminputdata, gebco, glofas, glorys, mom6_output, seawifs

esmf

ESMFRegridApply.time_apply

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: 1, 12, 60
param[3]: bilinear, nearest_s2d

ESMFRegridApply.track_rss_mb

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: 1, 12, 60
param[3]: bilinear, nearest_s2d

ESMFWeightsGenerate.time_generate_weights

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: bilinear, nearest_s2d

ESMFWeightsGenerate.track_rss_mb

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: bilinear, nearest_s2d

mom6_forge

RunoffMappingNearestNeighbour.time_gen_rof_maps_nn

param[0]: coarse_dst, fine_dst, larger_dst

RunoffMappingSmoothed.time_gen_rof_maps_smoothed

param[0]: coarse_dst, fine_dst, larger_dst

TopoSetFromDataset.time_set_from_dataset

param[0]: 5, 10, 20, 40

TopoSetFromDataset.track_rss_mb

param[0]: 5, 10, 20, 40

xESMF / ESMF

XESMFRegridApply.time_apply

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: 1, 12, 60
param[3]: bilinear, nearest_s2d

XESMFRegridApply.track_rss_mb

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: 1, 12, 60
param[3]: bilinear, nearest_s2d

XESMFRegridApplyLocstream.time_apply

param[0]: 300×300, 800×600, 1500×700
param[1]: 1000, 10000, 100000
param[2]: 1, 12, 60

XESMFWeightsGenerate.time_generate_weights

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: bilinear, conservative

XESMFWeightsGenerate.track_rss_mb

param[0]: 300×300, 800×600, 1500×700
param[1]: 150×150, 400×300, 700×350
param[2]: bilinear, conservative

XESMFWeightsGenerateLocstream.time_generate_weights

param[0]: 300×300, 800×600, 1500×700
param[1]: 1000, 10000, 100000
param[2]: bilinear, nearest_s2d

XESMFWeightsGenerateLocstream.track_rss_mb

param[0]: 300×300, 800×600, 1500×700
param[1]: 1000, 10000, 100000
param[2]: bilinear, nearest_s2d