Skip to content

reflow

Define HPC workflows as decorated Python functions. Wire data between tasks with type annotations. Submit to any scheduler with one command.

License CI codecov PyPI Python Versions

Works with

pip install reflow-hpc

A quick taste

from typing import Annotated
from reflow import Workflow, Param, Result, RunDir

wf = Workflow("etl")

@wf.job(cpus=2, time="00:10:00", mem="4G")
def extract(
    source: Annotated[str, Param(help="Input file path")],
) -> list[str]:
    """Read a data source and split it into chunks."""
    return [f"chunk_{i}" for i in range(5)]

@wf.job(array=True, cpus=4, time="01:00:00", mem="8G")
def transform(
    chunk: Annotated[str, Result(step="extract")],
) -> str:
    """Process one chunk. Runs as a parallel array job."""
    return chunk.upper()

@wf.job(time="00:05:00")
def load(
    results: Annotated[list[str], Result(step="transform")],
) -> str:
    """Collect all results."""
    return f"loaded {len(results)} items"

if __name__ == "__main__":
    wf.cli()
$ python pipeline.py submit --run-dir /scratch/run1 --source data.csv
Created run etl-20260301-a1b2 in /scratch/run1

$ python pipeline.py status etl-20260301-a1b2

How it works

  1. extract returns a list of 5 items.
  2. reflow fans out — it submits 5 parallel transform jobs, one per item.
  3. reflow gathers — it collects all outputs into a list for load.
  4. Dependencies, caching, and retries are handled automatically.

Features

Decorator-driven — define tasks with @wf.job(), wire data with Result, and let reflow handle the rest.

Automatic fan-out — return a list, mark the next step as array=True, and reflow submits one job per item.

Merkle-DAG caching — re-runs skip tasks whose inputs haven't changed.

Reusable flows — build a library of Flow objects and compose them with wf.include(flow).

Auto-generated CLIwf.cli() gives you submit, status, cancel, retry, and more.

Multi-scheduler — works with Slurm, PBS, LSF, SGE, and Flux. Write scheduler-agnostic config once.

Local execution — run the full pipeline on your laptop with wf.run_local() for development and testing.

Installation

pip install reflow-hpc

Requires Python 3.10+.

Next steps