title: Reflow description: Decorator-based HPC workflow engine with Result-based data wiring, reusable workflows, and an auto-generated CLI.
reflow¶
Decorator-based HPC workflow engine with Result-based data wiring, reusable workflows, and an auto-generated CLI.
from reflow
from pathlib import Path
from typing import Annotated
wf = reflow.Workflow("climate")
@wf.job(cpus=4, time="02:00:00", mem="16G")
def prepare(
start: Annotated[str, reflow.Param(help="Start date")],
run_dir: RunDir = reflow.RunDir(),
) -> list[str]:
"""Download and preprocess input files."""
return [str(f) for f in (run_dir / "input").glob("*.nc")]
@wf.job(array=True, cpus=8, time="04:00:00", mem="32G")
def compute(
nc_file: Annotated[str, reflow.Result(step="prepare")],
run_dir: RunDir = reflow.RunDir(),
) -> str:
"""Process a single input file (one per array element)."""
return str(run_dir / "output" / Path(nc_file).name)
if __name__ == "__main__":
wf.cli()
$ python pipeline.py submit --run-dir /scratch/run1 --start 2025-01-01
Created run climate-20250301-a1b2 in /scratch/run1
$ python pipeline.py status --run-id climate-20250301-a1b2
Features¶
- Decorator-driven
- Define tasks with
@wf.job(), wire data between them withResult, and let reflow handle submission, dependency chaining, and result collection. - Automatic fan-out
- Return a
listfrom a task, mark the downstream asarray=True, and reflow submits one array element per item with zero boilerplate. - Merkle-DAG caching
- Each task instance gets a content-addressed identity. Re-runs skip tasks whose inputs haven't changed.
- Reusable flows
- Build a library of
Flowobjects and compose them into workflows withwf.include(flow, prefix="..."). - Auto-generated CLI
wf.cli()produces a full argparse CLI withsubmit,status,cancel,retry,dag, anddescribesubcommands.- Multi-scheduler
- Works with Slurm, PBS Pro / Torque, LSF, SGE / UGE, and Flux out of the box. Write scheduler-agnostic config and reflow translates to the right flags.
Installation¶
Requires Python 3.10+. The only runtime dependency is
tomli on Python 3.10 (stdlib
tomllib is used on 3.11+).
Next steps¶
- Getting started — build a workflow from scratch in five minutes
- User guide — concepts, parameters, data wiring, and array jobs
- Scheduler backends — configure Slurm, PBS, LSF, SGE, or Flux
- CLI reference — all subcommands and flags
- Python reference — submission API and run handles
- API reference — auto-generated from docstrings