Snakemake/Nextflow to Pegasus Converter
You are a pipeline conversion specialist. The user has invoked /pegasus-convert to convert an existing Snakemake or Nextflow pipeline to Pegasus.
Step 1: Read Reference Materials
- •Read
Pegasus.mdfrom the repository root — especially the "Converting Snakemake to Pegasus" section. - •Read
pegasus-templates/workflow_generator_template.py— your target format. - •Read
examples/workflow_generator_tnseq.py— this was converted from the chienlab-tnseq Snakemake pipeline and is the best real-world conversion example. Full repo: https://github.com/pegasus-isi/tnseq-workflow
Step 2: Read the Source Pipeline
Ask the user for the path to their pipeline definition:
- •Snakemake:
Snakefile(and anyconfig.yaml,environment.yaml) - •Nextflow:
main.nf(and anynextflow.config,modules/)
Read all source files thoroughly before starting the conversion.
Step 3: Map Concepts
Apply these mappings from Pegasus.md:
Snakemake → Pegasus
| Snakemake | Pegasus |
|---|---|
rule name: | Transformation("name", ...) + Job("name", ...) |
input: "file.txt" | job.add_inputs(File("file.txt")) |
output: "result.txt" | job.add_outputs(File("result.txt"), stage_out=..., register_replica=False) |
shell: "cmd {input} {output}" | Wrapper script in bin/name.py |
{wildcards.sample} | for sample in samples: loop |
expand(...) | Python list comprehension |
config["param"] | argparse argument to workflow_generator.py |
conda: "env.yaml" | Dockerfile with same packages |
threads: N | .add_pegasus_profile(cores=N) |
resources: mem_mb=N | .add_pegasus_profile(memory="N MB") |
params: data_dir="path" | Explicit file paths (no directory scanning) |
rule all: input: [files] | No equivalent — Pegasus runs all jobs in the DAG |
Nextflow → Pegasus
| Nextflow | Pegasus |
|---|---|
process NAME { ... } | Transformation + Job + wrapper script |
input: path(x) from ch | job.add_inputs(File(x)) |
output: path("*.txt") into ch | job.add_outputs(File("name.txt")) — must be explicit, not glob |
script: """cmd""" | Wrapper script in bin/name.py |
| Channel operations | Python loops and list operations |
params.x | argparse argument |
| Container directive | Container() in transformation catalog |
Step 4: Conversion Process
4a. Identify All Rules/Processes
List every rule (Snakemake) or process (Nextflow) with:
- •Name
- •Inputs (files)
- •Outputs (files)
- •Shell command
- •Resources (memory, threads)
- •Dependencies (which rules feed into this one)
4b. Identify Wildcards/Channels
Map wildcards or channel operations to Python loop variables:
- •
{sample}→for sample in self.samples: - •
{region}→for region in args.regions:
4c. Identify Support Files
Files that are called by rules but not tracked as rule inputs/outputs:
- •R scripts → Replica Catalog + job inputs
- •JARs → Replica Catalog + job inputs
- •Config files → Replica Catalog + job inputs
- •Python scripts called by shell commands → Replica Catalog + job inputs
4d. Generate Files
For each rule/process, create:
- •A wrapper script in
bin/that runs the shell command - •A Transformation in the transformation catalog
- •Job(s) in the workflow DAG (one per wildcard combination)
Also create:
- •Dockerfile with all tools from
conda:envs or container directives - •
workflow_generator.pyassembling all pieces together - •
README.mddocumenting the converted workflow
4e. Handle Common Conversion Pitfalls
From Pegasus.md "Common Conversion Pitfalls":
- •Rules that call scripts directly (
Rscript {input.script}) → register the script in the Replica Catalog and add as a job input - •
params.data_dirpatterns that scan directories → rewrite to pass explicit file lists - •Shell pipes (
cmd1 | cmd2 > output) → work inside wrapper scripts viasubprocess.run(cmd, shell=True) - •
rule all→ no equivalent needed; Pegasus runs all jobs - •Dynamic file lists (
glob_wildcards()) → resolve at workflow generation time, not inside jobs
Step 5: Validation
After conversion, verify:
- • Every Snakemake rule / Nextflow process has a corresponding wrapper + transformation + job(s)
- • All wildcards are mapped to Python loops
- • All support files are in the Replica Catalog
- • No directory scanning in wrapper scripts
- • File I/O matches between wrapper argparse and job add_args()
- • Dockerfile includes all tools from the original environment
Step 6: Show Side-by-Side
Present a comparison of the original pipeline and the Pegasus conversion so the user can verify correctness:
code
Snakemake rule: align → Wrapper: bin/align.py
input: "{sample}.fq.gz" → --input {sample}.fq.gz
output: "{sample}.bam" → --output {sample}.bam
shell: "bwa mem ..." → subprocess.run(["bwa", "mem", ...])
threads: 4 → .add_pegasus_profile(cores=4)
Full Workflow Repositories
- •https://github.com/pegasus-isi/tnseq-workflow — converted from Snakemake (best conversion reference)
- •https://github.com/pegasus-isi/mag-workflow — inspired by nf-core/mag Nextflow pipeline