CrocoDash.extract_forcings package#

Subpackages#

Submodules#

CrocoDash.extract_forcings.bgc module#

CrocoDash.extract_forcings.bgc.process_bgc_ic(file_path, output_path)#

Copy the BGC initial condition (IC) file.

Parameters:
  • file_path (str) – Path to the original BGC IC file.

  • output_path (str) – Path where the processed BGC IC file is saved.

Returns:

None
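As an illustration of the copy step described above, the following is a minimal sketch and not the CrocoDash implementation. The helper name copy_ic_sketch, the file names, and the choice to create the destination directory are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_ic_sketch(file_path, output_path):
    """Hypothetical stand-in for a plain file copy (not the library code)."""
    source = Path(file_path)
    if not source.is_file():
        raise FileNotFoundError(f"BGC IC file not found: {source}")
    target = Path(output_path)
    # Assumption: create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the file contents and preserves metadata where possible.
    shutil.copy2(source, target)


# Small demonstration with a placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "bgc_ic.nc"
    src.write_bytes(b"placeholder")
    dst = Path(tmp) / "inputdir" / "bgc_ic_processed.nc"
    copy_ic_sketch(src, dst)
    copied = dst.read_bytes()
```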

CrocoDash.extract_forcings.bgc.process_bgc_iron_forcing(nx, ny, MARBL_FESEDFLUX_FILE, MARBL_FEVENTFLUX_FILE, inputdir)#

Create dummy iron forcing files for MARBL.

Parameters:
  • nx (int) – Number of grid points in the x-direction.

  • ny (int) – Number of grid points in the y-direction.

  • MARBL_FESEDFLUX_FILE (str) – Filename for the sediment flux input.

  • MARBL_FEVENTFLUX_FILE (str) – Filename for the event flux input.

  • inputdir (str) – Directory in which the generated files are saved.

Returns:

None

CrocoDash.extract_forcings.bgc.process_river_nutrients(global_river_nutrients_filepath, ocn_grid, mapping_file, river_nutrients_nnsm_filepath)#

CrocoDash.extract_forcings.chlorophyll module#

CrocoDash.extract_forcings.chlorophyll.process_chl(ocn_grid, ocn_topo, inputdir, chl_processed_filepath, output_filepath)#

CrocoDash.extract_forcings.get_dataset_piecewise module#

CrocoDash.extract_forcings.merge_piecewise_dataset module#

CrocoDash.extract_forcings.regrid_dataset_piecewise module#

CrocoDash.extract_forcings.runoff module#

CrocoDash.extract_forcings.runoff.generate_rof_ocn_map(rof_grid_name, rof_esmf_mesh_filepath, ocn_mesh_filepath, inputdir, grid_name, rmax, fold)#

Generate runoff-to-ocean mapping files if runoff is active in the compset.

CrocoDash.extract_forcings.tides module#

CrocoDash.extract_forcings.tides.process_tides(ocn_topo, inputdir, supergrid_path, vgrid_path, tidal_constituents, boundaries, tpxo_elevation_filepath, tpxo_velocity_filepath)#

CrocoDash.extract_forcings.utils module#

class CrocoDash.extract_forcings.utils.Config(config_path: str = 'config.json')#

Bases: object

keys()#

CrocoDash.extract_forcings.utils.check_date_continuity(boundary_file_list: dict)#

Check for overlaps or missing dates between consecutive files.
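The kind of check described above can be sketched as follows. This is an illustration, not the CrocoDash implementation: the helper name find_date_issues, the input layout (boundary name mapped to a list of (start, end, path) tuples, as returned by parse_dataset_folder), and the rule that each file should start exactly one day after the previous file ends are all assumptions.

```python
from datetime import datetime, timedelta


def find_date_issues(boundary_file_list, step=timedelta(days=1)):
    """Hypothetical continuity check (not the library code).

    Assumes end dates are inclusive, so the next file is expected to
    start exactly `step` after the previous file ends.
    """
    issues = []
    for boundary, entries in boundary_file_list.items():
        ordered = sorted(entries, key=lambda entry: entry[0])
        # Compare each file with the one that follows it in time.
        for (_, prev_end, prev_path), (next_start, _, next_path) in zip(ordered, ordered[1:]):
            if next_start <= prev_end:
                issues.append((boundary, "overlap", prev_path, next_path))
            elif next_start - prev_end > step:
                issues.append((boundary, "gap", prev_path, next_path))
    return issues


# Made-up file list: b.nc and c.nc overlap, and dates are missing before d.nc.
sample = {
    "north": [
        (datetime(2000, 1, 1), datetime(2000, 1, 2), "a.nc"),
        (datetime(2000, 1, 3), datetime(2000, 1, 4), "b.nc"),
        (datetime(2000, 1, 4), datetime(2000, 1, 6), "c.nc"),
        (datetime(2000, 1, 10), datetime(2000, 1, 11), "d.nc"),
    ]
}
issues = find_date_issues(sample)
```

If the files in a dataset share their boundary date instead (each file starting on the day the previous one ends), the overlap test would need to use a strict comparison.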

CrocoDash.extract_forcings.utils.make_local_cluster(n_workers=1, threads_per_worker=1)#

Create a Dask Client backed by a LocalCluster.

Workers are used for the GET (download) step only. REGRID and MERGE always run sequentially in the main process, because ESMF's VM fails to initialize in subprocess workers on PBS/HPC systems (ESMCI::VM::getCurrent() rc=545).

For HPC batch jobs, see make_pbs_cluster().

Typical usage:

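from CrocoDash.extract_forcings.utils import make_local_cluster

# Start four local worker processes for the download step.
client = make_local_cluster(n_workers=4)
# Pass the client to the workflow; the other arguments are elided here.
process_obc_conditions(..., client=client)
# Shut the client down once the workflow has finished.
client.close()

Parameters: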
  • n_workers – Number of worker processes (used for the GET step).

  • threads_per_worker – Threads per worker.

Returns:

dask.distributed.Client connected to the LocalCluster.

CrocoDash.extract_forcings.utils.make_pbs_cluster(n_workers, cores=1, processes=1, memory='4GiB', walltime='01:00:00', job_name='crocodash', queue=None, resource_spec=None)#

Create a Dask Client backed by a PBS cluster via dask-jobqueue.

Each Dask worker is submitted as a separate PBS job. The function prints the generated job script so you can verify the PBS directives before jobs are queued.

Requires dask-jobqueue (pip install dask-jobqueue).

Typical usage:

from CrocoDash.extract_forcings.utils import make_pbs_cluster
from CrocoDash.extract_forcings.case_setup.driver import run_workflow

# Submit eight PBS jobs, one Dask worker each; the queue name is site-specific.
client = make_pbs_cluster(n_workers=8, queue="regular", walltime="02:00:00")
run_workflow(bc=True, client=client)
# Shut the client down once the workflow has finished.
client.close()

Parameters:
  • n_workers – Number of PBS jobs (workers) to submit.

  • cores – CPU cores per PBS job.

  • processes – Dask processes per PBS job (usually 1).

  • memory – Memory per PBS job (e.g. '4GiB').

  • walltime – Walltime per PBS job (e.g. '01:00:00').

  • job_name – Job name visible in qstat.

  • queue – PBS queue/partition. Site-specific; omit to use the scheduler default.

  • resource_spec – Raw PBS -l resource string (e.g. 'select=1:ncpus=4:mem=4gb'). Optional; overrides cores/memory when set.

Returns:

dask.distributed.Client connected to the PBSCluster.

CrocoDash.extract_forcings.utils.parse_dataset_folder(folder: str | Path, input_dataset_regex: str, date_format: str)#

Parse a folder to find and extract dataset file information based on a regex pattern.

Parameters:
  • folder (str or Path) – Path to the folder containing the dataset files.

  • input_dataset_regex (str) – Regular expression pattern to match dataset filenames. Example: "(north|east|south|west)_unprocessed\.(\d{8})_(\d{8})\.nc"

  • date_format (str) – Date format string used to parse dates in filenames (e.g., "%Y%m%d").

Returns:

Dictionary mapping each boundary to a list of tuples containing:
  • Start date (datetime)

  • End date (datetime)

  • Full file path (Path)

Example:

{
    "north": [(datetime(2000, 1, 1), datetime(2000, 1, 2), Path("/path/to/north_20000101_20000102.nc"))],
    "east": [(datetime(2000, 1, 3), datetime(2000, 1, 4), Path("/path/to/east_20000103_20000104.nc"))],
}

Return type:

dict
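To make the regex-driven parsing concrete, here is a minimal sketch of how such a folder scan could work. It is not the CrocoDash implementation: the helper name parse_folder_sketch, the use of a full-name match, the sorting by start date, and the file names created in the demonstration are assumptions. The pattern is assumed to have three capture groups: boundary, start date, and end date.

```python
import re
import tempfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def parse_folder_sketch(folder, input_dataset_regex, date_format):
    """Hypothetical stand-in for a regex-based folder scan (not the library code)."""
    pattern = re.compile(input_dataset_regex)
    found = defaultdict(list)
    for path in Path(folder).iterdir():
        # Assumption: the whole filename must match the pattern.
        match = pattern.fullmatch(path.name)
        if match is None:
            continue
        boundary, start, end = match.groups()
        found[boundary].append(
            (
                datetime.strptime(start, date_format),
                datetime.strptime(end, date_format),
                path,
            )
        )
    # Assumption: order each boundary's files by start date.
    for entries in found.values():
        entries.sort(key=lambda entry: entry[0])
    return dict(found)


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "north_unprocessed.20000103_20000104.nc",
        "north_unprocessed.20000101_20000102.nc",
        "east_unprocessed.20000101_20000102.nc",
        "notes.txt",  # does not match the pattern and is skipped
    ):
        (Path(tmp) / name).touch()
    parsed = parse_folder_sketch(
        tmp,
        r"(north|east|south|west)_unprocessed\.(\d{8})_(\d{8})\.nc",
        "%Y%m%d",
    )
```

A dictionary in this shape is what check_date_continuity takes as its boundary_file_list argument, according to its signature.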

Module contents#