How CrocoDash Is Organized
CrocoDash manages the complete workflow from raw data sources to a fully configured regional MOM6 case within CESM. Understanding its structure helps you know where to find information, what each piece does, and how to extend it for your needs.
The Three Phases of CrocoDash
Everything in CrocoDash revolves around three main phases that happen sequentially:
Phase 1: Grid Definition
Before you can set up a case, you need to define the spatial domain. CrocoDash uses three grid objects to represent your domain:
Horizontal Grid (Grid class) - Your Arakawa C-grid specification (lat/lon coordinates, resolution, etc.)
Vertical Grid (VGrid class) - Depth levels and layer thickness for your domain
Bathymetry/Topography (Topo class) - Seafloor depths and land mask
These classes come from mom6_bathy, which also provides an interactive bathymetry editor if you want to manually adjust seafloor features before setting up your case.
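To make the roles of the three objects concrete, here is a minimal sketch built from hypothetical stand-in dataclasses. The names `HGridSpec`, `VGridSpec`, and `TopoSpec`, along with all of their fields, are assumptions made for illustration; they are not the mom6_bathy `Grid`, `VGrid`, or `Topo` classes, whose constructors may look quite different.

```python
from dataclasses import dataclass


@dataclass
class HGridSpec:
    """Hypothetical stand-in for a horizontal grid: a regular lat/lon box."""
    lon_start: float   # western edge, degrees east
    lat_start: float   # southern edge, degrees north
    len_lon: float     # domain width, degrees
    len_lat: float     # domain height, degrees
    resolution: float  # cell size, degrees

    @property
    def shape(self):
        # Number of cells in (lat, lon) order.
        return (round(self.len_lat / self.resolution),
                round(self.len_lon / self.resolution))


@dataclass
class VGridSpec:
    """Hypothetical stand-in for a vertical grid with uniform layers."""
    n_layers: int
    max_depth: float  # metres

    @property
    def thicknesses(self):
        # Every layer gets the same thickness in this simplified sketch.
        return [self.max_depth / self.n_layers] * self.n_layers


@dataclass
class TopoSpec:
    """Hypothetical stand-in for bathymetry: depth in metres, 0 means land."""
    depth: list  # nested list of rows, matching HGridSpec.shape

    @property
    def ocean_mask(self):
        # 1 where the cell is ocean, 0 where it is land.
        return [[1 if d > 0 else 0 for d in row] for row in self.depth]


hgrid = HGridSpec(lon_start=278.0, lat_start=7.0,
                  len_lon=1.0, len_lat=0.5, resolution=0.25)
vgrid = VGridSpec(n_layers=4, max_depth=100.0)
topo = TopoSpec(depth=[[0.0, 50.0, 100.0, 100.0],
                       [0.0, 0.0, 80.0, 100.0]])
```

The point of the sketch is only the division of labor: the horizontal grid fixes the cell layout, the vertical grid fixes the layers, and the bathymetry supplies a depth and land mask on that layout.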
Phase 2: Case Setup
Once you have grids defined, you create a Case object that represents your simulation. This phase:
Creates the case directory structure in CESM
Configures all model components (atmosphere, ocean, land, ice, runoff)
Registers your custom MOM6 grid with CESM
Sets options like compset, machine, queue, and other parameters
The Case class is your main entry point for all case operations. It handles coordination with CESM behind the scenes using VisualCaseGen and CESM’s native configuration system.
Phase 3: Forcing and Boundary Conditions
The final phase generates all the data files your simulation needs:
Data Access - Retrieves raw data from multiple sources (MOM6 output, TPXO tides, GLOFAS runoff, etc.) through a unified interface
Regridding - Maps data from source grids onto your custom regional grid
Format Conversion - Converts data to MOM6-compatible file formats
Boundary Extraction - Extracts open boundary conditions (OBCs) for your domain edges; this happens as part of regridding
The actual processing is decoupled from the main Case workflow because it’s computationally intensive. You configure what you want in Phase 2, then CrocoDash handles the heavy lifting separately through the extract_forcings module.
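As a rough illustration of the boundary extraction step, the sketch below pulls the four edges out of a small 2D field that is assumed to be on the regional grid already. The function name `extract_boundaries` and the convention that row 0 is the southern edge are assumptions for this example only; the extract_forcings module may organize this work differently.

```python
def extract_boundaries(field):
    """Return the four edges of a 2D field given as a list of rows.

    Assumes row 0 is the southern edge and column 0 is the western edge
    (a convention chosen for this sketch, not taken from CrocoDash).
    """
    if not field or not field[0]:
        raise ValueError("field must be a non-empty 2D list")
    return {
        "south": list(field[0]),
        "north": list(field[-1]),
        "west": [row[0] for row in field],
        "east": [row[-1] for row in field],
    }


# A 3x4 field of made-up values, standing in for regridded data.
regridded = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
boundaries = extract_boundaries(regridded)
```

Each entry in `boundaries` is a 1D strip along one edge of the domain, which is the kind of data an open boundary needs.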
The Complete Workflow
Here’s how you move through these phases in practice:
1. Define Grid (horizontal & vertical)
↓
2. Define Bathymetry
↓
3. Create Case Object with grids
↓
4. Configure Forcing (specify what data you want)
↓
5. Process Forcings (download, regrid, format)
↓
6. Build Case
↓
7. Submit Case
How CrocoDash Organizes Its Code
CrocoDash is structured so that each major task has its own module. Understanding this organization helps you know where to look when you need something:
Grid and Geometry Handling
grid, vgrid, topo - Define and manage your horizontal grid, vertical grid, and bathymetry
topo_editor - Interactive tool for editing bathymetry before case creation
Case Management
case - The main Case class that orchestrates everything
forcing_configurations - Where you specify what forcing data your case needs. This module sits under the hood of the configure_forcings function.
Data Processing
extract_forcings - The computational engine that generates all your forcing and boundary condition files
raw_data_access - Unified interface to get data from many different sources
Supporting Infrastructure
logging - Consistent logging setup across the package
The Forcing Configuration Registry
One key design pattern you’ll encounter is the configuration registry. Instead of editing configuration files by hand, CrocoDash uses Python classes to represent each type of forcing configuration (Tides, Biogeochemistry, Rivers, etc.).
Why this approach? Because configurations aren’t just data—they involve validation logic. For example, “you can’t use BGC without the BGC component in your compset.” By using Python classes, CrocoDash can enforce these rules automatically and give you helpful error messages if something doesn’t make sense.
This also makes CrocoDash extensible: if you want to add a new forcing type, you can create a new configuration class and register it.
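The registry idea can be sketched in a few lines of plain Python. Everything named here (`FORCING_REGISTRY`, `register`, `ForcingConfig`, `TidesConfig`, `BGCConfig`, `configure`) is hypothetical and chosen for illustration; the real classes in forcing_configurations may be structured differently. The compset check is also a simplified assumption that stands in for the BGC rule quoted above.

```python
FORCING_REGISTRY = {}  # maps a short name to its configuration class


def register(name):
    """Class decorator that adds a configuration class to the registry."""
    def decorator(cls):
        FORCING_REGISTRY[name] = cls
        return cls
    return decorator


class ForcingConfig:
    """Hypothetical base class: subclasses override validate()."""

    def __init__(self, **options):
        self.options = options

    def validate(self, compset):
        return None  # no restrictions by default


@register("tides")
class TidesConfig(ForcingConfig):
    pass


@register("bgc")
class BGCConfig(ForcingConfig):
    def validate(self, compset):
        # Simplified stand-in for "BGC needs the BGC component":
        # the substring "BGC" is a placeholder, not a real CESM token.
        if "BGC" not in compset:
            raise ValueError(
                "BGC forcing requires a compset with the BGC component"
            )


def configure(name, compset, **options):
    """Look up a configuration class, instantiate it, and validate it."""
    config = FORCING_REGISTRY[name](**options)
    config.validate(compset)
    return config


tides = configure("tides", compset="EXAMPLE_COMPSET", constituents=["M2"])

try:
    configure("bgc", compset="EXAMPLE_COMPSET")
    bgc_error = None
except ValueError as err:
    bgc_error = str(err)  # the incompatible choice is rejected early
```

Adding a new forcing type in this sketch means writing one decorated class; `configure` and the existing classes stay untouched.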
The Data Access Registry
Similarly, CrocoDash provides a data registry that lets you access different datasets through a common interface. Under the hood, each dataset (MOM6, TPXO, GLOFAS) has its own implementation for downloading, caching, and loading data. But from your perspective, you just ask for the data you need and CrocoDash handles the details.
This design makes it easy to:
Add support for new datasets without changing core code (which you can do!)
Swap out data sources if you want to use a different provider
Cache data locally to avoid repeated downloads
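A minimal sketch of such a registry, assuming a simple in-memory cache, might look like the following. `DataRegistry`, `fake_tides_loader`, and the returned values are all hypothetical; the raw_data_access module has its own interface, and a real implementation would likely cache to disk.

```python
class DataRegistry:
    """Hypothetical registry mapping dataset names to loader functions."""

    def __init__(self):
        self._loaders = {}
        self._cache = {}

    def register(self, name, loader):
        self._loaders[name] = loader

    def get(self, name, **query):
        if name not in self._loaders:
            raise KeyError(f"unknown dataset: {name}")
        # Cache on the dataset name plus the sorted query arguments.
        # Query values must be hashable for this simple scheme to work.
        key = (name, tuple(sorted(query.items())))
        if key not in self._cache:
            self._cache[key] = self._loaders[name](**query)
        return self._cache[key]


download_count = {"tides": 0}


def fake_tides_loader(constituent):
    # Stands in for a real download and returns made-up values.
    download_count["tides"] += 1
    return {"constituent": constituent, "amplitude_m": 0.5}


registry = DataRegistry()
registry.register("tides", fake_tides_loader)

first = registry.get("tides", constituent="M2")
second = registry.get("tides", constituent="M2")  # served from the cache
```

Callers only ever see `registry.get(...)`, so swapping the loader behind a name or adding a new dataset does not change calling code.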
Data Flow: Case Setup to Forcing Generation
Here’s a detailed view of what happens at each step:
1. Initialize Grid, Topo, VGrid objects
↓
2. Create Case object with grid specifications and input/run folders
↓
3. Setup case via CESM integration (creates case directories)
↓
4. Configure forcing via the ForcingConfigRegistry from forcing_configurations.base:
- Specifies what datasets and configurations you want
- Validates that your choices are compatible
- Stores configuration in a JSON file for reproducibility
↓
5. Process forcings:
- raw_data_access gets data from sources (downloads if needed)
- extract_forcings regrids and reformats data
- Output files are placed in your input directories
↓
6. Build and submit case from CESM
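Storing the configuration as JSON is what lets step 5 run separately from step 4. The sketch below shows the round trip with the standard library only; the file name, the function names, and the keys in `config` are made up for illustration and do not reflect the actual schema CrocoDash writes.

```python
import json
import tempfile
from pathlib import Path


def save_forcing_config(config, path):
    """Write the validated configuration to disk as JSON."""
    Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))


def load_forcing_config(path):
    """Read the configuration back, for example in a separate job."""
    return json.loads(Path(path).read_text())


# Made-up configuration; the keys are illustrative only.
config = {
    "date_range": ["2020-01-01", "2020-01-09"],
    "tides": {"constituents": ["M2", "S2"]},
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "forcing_config.json"
    save_forcing_config(config, config_path)
    # A later, possibly long-running, processing step only needs the file.
    reloaded = load_forcing_config(config_path)
```

Because the processing step depends only on the saved file, it can be rerun or moved to a compute node without repeating case setup.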
Key Objects You’ll Interact With
These are the main classes you’ll use when working with CrocoDash:
Case - Represents your simulation. You create one, configure it, then build and submit it.
Grid - Your horizontal grid specification
VGrid - Your vertical grid specification
Topo - Your bathymetry data
Forcing Configurators - Objects representing Tides, Biogeochemistry, Rivers, etc. that you specify in your case
Integration with External Tools
CrocoDash doesn’t do everything itself—it orchestrates several specialized tools:
mom6_bathy - Grid generation and bathymetry tools
regional-mom6 - Regional MOM6 setup and OBC generation
VisualCaseGen - CESM case creation interface
These are included as submodules in the CrocoDash repository, so you have everything you need in one place.
Design Philosophy: Separation of Concerns
CrocoDash separates tasks into focused modules so you can understand and work with each piece independently:
The Case class handles CESM case management, so you don’t need to know the internals of CESM to use it
The forcing_configurations module handles forcing configuration logic separately from case setup
The extract_forcings module handles the computationally heavy work of processing data
The raw_data_access module handles data retrieval, insulating you from the differences between data sources
This organization means:
You can test or debug individual pieces without understanding the whole system
Adding new functionality (like a new forcing type or data source) doesn’t require modifying existing code
The learning curve is gentler because you can focus on the piece relevant to what you’re doing right now
Extending CrocoDash
One benefit of this architecture is that CrocoDash is designed to be extended. You can:
Add new forcing configurations by creating a new configuration class and registering it (see Forcing Configurations)
Add new data sources by implementing a new data product class in the registry (see Adding Data Access)
Customize bathymetry using the interactive TopoEditor
Create custom configurations by modifying JSON files or Python configuration objects
None of these require modifying CrocoDash’s core code—the architecture is designed to support this kind of extension.
What Happens When You Run Your Case
When you execute the workflow, here’s what’s actually happening behind the scenes:
Grid Definition - You create Python objects representing your spatial domain
Case Creation - CrocoDash calls CESM’s case creation utilities to set up the directory structure
Forcing Configuration - Your specifications are validated and stored in JSON
Data Processing - CrocoDash queries remote datasets, downloads what’s needed, regrids to your grid, and formats for MOM6
Building - CESM builds the executable with your specific configuration
Submission - The case runs on the supercomputer
You don’t need to understand all these details to use CrocoDash, but they’re here when you need them.
Workflow Diagram
The following diagram illustrates the workflow of CrocoDash to set up a regional model:
:::{figure} ../_static/workflow_diagram.png
:align: center
:alt: Workflow diagram showing the steps from CrocoDash to a regional model.
:width: 80%

Workflow Diagram: This diagram shows the key steps involved in using CrocoDash to form a fully configured regional model.
:::
Module Diagram
The following diagram describes the connections between various modules and the case.py file:
:::{figure} ../_static/module_diagram.svg
:align: center
:alt: Module diagram showing connections to case.py.
:width: 80%

Module Diagram: This diagram highlights how different modules interact with the main workflow (case.py) file in CrocoDash.
:::