Extract Forcings (case.process_forcings)

The final part of the CrocoDash workflow is extracting and processing all the forcing data your simulation needs: initial conditions, boundary conditions, tidal forcings, biogeochemistry data, and more. All of this data is processed through the case.process_forcings call.

case.process_forcings wraps a CrocoDash submodule called extract_forcings, a set of scripts that each process one kind of forcing (initial/boundary conditions, tides, and so on). extract_forcings also holds a subdirectory called case_setup, which contains a driver and a config file. Together they hold all of the case-specific information needed to run the processing scripts. When you run the workflow, the case_setup folder is copied into your input directory. case.process_forcings then goes into that folder and runs the driver. You can also run the driver yourself from the command line.
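Running the driver from Python, which is what case.process_forcings is described as doing, could look roughly like the minimal sketch below. The helper names build_driver_command and run_driver are hypothetical and are not part of CrocoDash; the sketch only assumes that driver.py is launched from inside its own directory, as in the command-line usage.

```python
import subprocess
import sys
from pathlib import Path


def build_driver_command(flags):
    """Return the argument list for one driver invocation (hypothetical helper)."""
    # Reuse the current interpreter so the active environment is the one that runs.
    return [sys.executable, "driver.py", *flags]


def run_driver(extract_dir, flags=("--all",)):
    """Run driver.py from inside its own directory and raise if it fails."""
    return subprocess.run(
        build_driver_command(flags),
        cwd=Path(extract_dir),  # the driver sits next to its config file
        check=True,             # raise CalledProcessError on a non-zero exit
    )
```

Calling run_driver("/path/to/case/input_directory/extract_forcings", ["--tides"]) would then be the Python counterpart of changing into that directory and running the driver with the same flag.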

Workflow Overview

When you run the CrocoDash workflow, configure_forcings and process_forcings together:

Copy a ready-to-run forcing extraction system into your case directories (done by case.configure_forcings)
Run it to download data from external sources
Regrid the data to your custom domain
Format everything for MOM6

The key insight: you don’t have to run this from a Jupyter notebook. You get a complete, standalone extraction system that you can submit to your supercomputer’s job queue.

Directory Structure

When CrocoDash sets up your case, it creates an extract_forcings directory in your input folder:

input_directory/
├── extract_forcings/
│   ├── driver.py              # Main script that orchestrates everything
│   └── config.json            # Your case-specific configuration
└── ocnice/                    # Output goes here
    ├── initial_conditions.nc
    ├── boundary_conditions/
    ├── tides/
    └── ...
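Before submitting a long job, it can help to confirm that this layout is in place. The following is a minimal sketch, not a CrocoDash utility: missing_extraction_files is a hypothetical helper, and it assumes only the two file names shown in the tree above.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_FILES = ("driver.py", "config.json")


def missing_extraction_files(input_directory):
    """Return the expected extract_forcings files that are not present."""
    extract_dir = Path(input_directory) / "extract_forcings"
    return [name for name in EXPECTED_FILES if not (extract_dir / name).is_file()]
```

An empty list means both files were found; anything else names what still needs to be set up by case.configure_forcings.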

Command-Line Options

The driver script accepts several options for fine-grained control:

# Run all forcing extractions
python driver.py --all

# Run only specific forcings
python driver.py --tides
python driver.py --runoff
python driver.py --bgc

# Run multiple forcings
python driver.py --tides --runoff --bgc

# Run all except certain forcings
python driver.py --all --skip bgcic
python driver.py --all --skip conditions bgcic

# Skip entire processing phases
python driver.py --all --skip conditions

This flexibility is intentional—you might want to:

Test individual components without running everything
Re-run one forcing type if your source data changed
Run on a supercomputer queue while iterating elsewhere
Resume after an interrupted run
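The selection behaviour of these options (--all, individual flags, and --skip) can be illustrated with a small argparse sketch. This is not the actual driver code: the component list below is an assumption assembled from the names that appear in the examples above, and whether every name is also accepted as a standalone flag by the real driver is not confirmed here.

```python
import argparse

# Assumed component names, collected from the command examples above.
KNOWN_FORCINGS = ("conditions", "tides", "runoff", "bgc", "bgcic")


def build_parser():
    """Build a parser with one flag per component plus --all and --skip."""
    parser = argparse.ArgumentParser(description="Sketch of a forcing-selection CLI")
    parser.add_argument("--all", action="store_true", help="select every component")
    for name in KNOWN_FORCINGS:
        parser.add_argument(f"--{name}", action="store_true", help=f"select {name}")
    parser.add_argument(
        "--skip", nargs="+", default=[], choices=KNOWN_FORCINGS,
        metavar="NAME", help="components to leave out",
    )
    return parser


def select_forcings(argv):
    """Return the components to run, in a fixed order, after applying --skip."""
    args = build_parser().parse_args(argv)
    requested = [n for n in KNOWN_FORCINGS if args.all or getattr(args, n)]
    return [n for n in requested if n not in args.skip]
```

Under these assumptions, select_forcings(["--all", "--skip", "conditions", "bgcic"]) leaves tides, runoff, and bgc selected.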

The Processing Pipeline

Here’s what happens internally when the driver runs:

1. Loads config.json with your case specifications
   ↓
2. Calls the relevant extract_forcings script
   ↓
3. Writes all the output data to ocnice
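The three steps above follow a common load, dispatch, write pattern. The sketch below shows that pattern in isolation and is not the driver's real implementation: run_pipeline, the processors mapping, and the shape of the config are all hypothetical.

```python
import json
from pathlib import Path


def run_pipeline(config_path, processors, output_dir):
    """Load the config, call each processor in turn, and return the names that ran."""
    # Step 1: load the case configuration.
    config = json.loads(Path(config_path).read_text())

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    completed = []
    for name, process in processors.items():
        # Step 2: call one processing function per forcing.
        # Step 3: each function is expected to write its files under output_dir.
        process(config, output_dir)
        completed.append(name)
    return completed
```

Here processors would map a forcing name such as "tides" to a callable that accepts the loaded config and the output directory.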

Design Philosophy

CrocoDash deliberately doesn’t do all the processing itself. Instead, it relies on existing packages:

Task	Tool	Used By
Regridding & OBC extraction	regional-mom6	`regrid_dataset_piecewise.py` and various modules
Minor processing (fill, mapping, chlorophyll)	mom6_bathy	Various modules
Data formatting	`netCDF4`, `xarray`	Throughout

If you want to modify how regridding or initial/boundary conditions are processed, the main place to look is CrocoDash.extract_forcings.regrid_dataset_piecewise, which calls regional-mom6 under the hood. The regional-mom6 documentation has more information, although it may be difficult to tease out how CrocoDash uses regional-mom6 without reading the code a bit more.

Example: Running Forcings on Your HPC System

Here’s a typical workflow for an HPC system with job queues:

Set up your case locally (or on a login node):

case = Case(...)
case.configure_forcings(...)  # Sets up all configuration

Submit extraction as a batch job:

cd /path/to/case/input_directory/extract_forcings
# Activate CrocoDash environment and submit to batch system!
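Because scheduler syntax and environment setup differ between systems, one option is to generate the job script from Python and supply the site-specific parts yourself. The sketch below is an assumption-laden illustration rather than a CrocoDash feature: render_job_script is a hypothetical helper, and the scheduler directives and the environment activation command are whatever your own system requires.

```python
import shlex


def render_job_script(extract_dir, directives, env_setup, driver_flags=("--all",)):
    """Return the text of a bash job script that runs the driver."""
    lines = ["#!/bin/bash"]
    lines += list(directives)  # scheduler header lines supplied by the caller
    lines += [
        env_setup,                                 # e.g. your environment activation command
        f"cd {shlex.quote(str(extract_dir))}",     # run from the extract_forcings directory
        "python driver.py " + " ".join(driver_flags),
    ]
    return "\n".join(lines) + "\n"
```

For example, on a PBS system you might pass directives such as "#PBS -N extract_forcings" and "#PBS -l walltime=02:00:00", write the returned text to a file, and submit that file with your scheduler's submit command.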
