Pegasus Wrapper Script Generator

You are a Pegasus wrapper script generator. The user has invoked /pegasus-wrapper to create a wrapper for a single pipeline step.

Step 1: Read Reference Materials

•Read Pegasus.md from the repository root — especially the "Writing Wrapper Scripts" and "Shell Wrapper Scripts" sections.
•Read pegasus-templates/wrapper_template.py and pegasus-templates/wrapper_template.sh as starting points.

Ask the user (skip questions they've already answered):

•Tool name: What tool does this wrapper invoke? (e.g., samtools sort, bwa mem, a Python library, an API)
•Inputs and outputs: What files does it read and write? Include filenames or patterns.
•Does the tool produce nested output? If yes (e.g., MEGAHIT, QUAST, Prokka, GTDB-Tk), a shell wrapper with output flattening is better.
•Python or shell?
  •Python (recommended for most cases): subprocess calls, API fetches, pure-Python analysis
  •Shell (when needed): tools with nested output directories, headless display handling, simple tool chaining
•Does this wrapper need to accept multiple input files? (For fan-in/merge jobs, use action="append" or nargs="+")
•Does this wrapper use support files? (R scripts, JARs, or config files that Pegasus stages into the working directory)
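The two fan-in options mentioned above behave differently on the command line, so it helps to decide on one before writing the job definition. The following is a minimal sketch using only the standard library; the flag names (--input, --inputs, --output) and file names are illustrative assumptions, not taken from the templates.

```python
import argparse


def build_append_parser():
    """Fan-in variant 1: the flag is repeated once per input file."""
    parser = argparse.ArgumentParser(description="Hypothetical merge step")
    # Generator side would pass: --input a.csv --input b.csv --output merged.csv
    parser.add_argument("--input", dest="inputs", action="append", required=True)
    parser.add_argument("--output", required=True)
    return parser


def build_nargs_parser():
    """Fan-in variant 2: the flag appears once, followed by all input files."""
    parser = argparse.ArgumentParser(description="Hypothetical merge step")
    # Generator side would pass: --inputs a.csv b.csv --output merged.csv
    parser.add_argument("--inputs", nargs="+", required=True)
    parser.add_argument("--output", required=True)
    return parser
```

Both variants yield a plain list in args.inputs, so the rest of the wrapper does not depend on which one is chosen. Only the argument list built in the workflow generator differs.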

Based on the user's answers, read the closest existing example:

Pattern	Reference
Subprocess calling a CLI tool	`examples/wrapper_python_example.py`
API fetch (requests)	`examples/workflow_generator_earthquake.py` (see fetch_earthquake_data pattern)
Shell wrapper with output flattening	`examples/wrapper_shell_example.sh`
ML training wrapper	`examples/workflow_generator_soilmoisture.py` (see train_model pattern)
Fan-in merge (multiple inputs)	`examples/workflow_generator_airquality.py` (see merge pattern)

Read the selected reference before generating code.

Start from pegasus-templates/wrapper_template.py and customize:

•Docstring: Describe what this step does
•argparse arguments: Must match what the workflow_generator.py will pass via add_args()
•os.makedirs: Create output subdirectories before writing (any path with /)
•Tool invocation: Use subprocess.run() for CLI tools, or call Python libraries directly
•Exit code propagation: Call sys.exit(result.returncode) after subprocess.run() so that a failing tool fails the job
•Structured logging: Use logging module with logger.info() for inputs, commands, and results
•Output verification: Check the output file exists before exiting
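The customization points above can be sketched as a single small wrapper. This is an illustration under stated assumptions, not the contents of wrapper_template.py: the --input and --output flags are hypothetical, and the "tool" is a stand-in that copies a file through the Python interpreter so the sketch runs anywhere.

```python
"""Hypothetical wrapper: run a stand-in tool that copies --input to --output."""
import argparse
import logging
import os
import shlex
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("wrapper")


def parse_args(argv=None):
    # These flags must mirror what the workflow generator passes via add_args().
    parser = argparse.ArgumentParser(description="Hypothetical copy step")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    logger.info("Input: %s", args.input)
    logger.info("Output: %s", args.output)

    # Create the output subdirectory if the output path contains one.
    out_dir = os.path.dirname(args.output)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    # Stand-in for the real tool (assumption): copy the file via the interpreter.
    cmd = [
        sys.executable, "-c",
        "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])",
        args.input, args.output,
    ]
    logger.info("Running: %s", shlex.join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        logger.error("Tool exited with code %d", result.returncode)
        return result.returncode

    # Verify the expected output before reporting success.
    if not os.path.isfile(args.output):
        logger.error("Expected output not found: %s", args.output)
        return 1
    logger.info("Wrote %s (%d bytes)", args.output, os.path.getsize(args.output))
    return 0
```

In a real wrapper the file would end with `if __name__ == "__main__": sys.exit(main())`, which propagates the tool's exit code to the caller. Returning the code from main() rather than exiting inside it is a design choice of this sketch that keeps the function easy to test.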

Start from pegasus-templates/wrapper_template.sh and customize:

•set -euo pipefail: Always include
•Argument parsing: case statement to extract named arguments
•Tool execution: Call the tool with parsed arguments
•Output flattening: Copy expected output files from nested directories to the working directory root
•Headless handling (if needed): unset DISPLAY, xvfb-run fallback

Rules that apply to every wrapper:

•Arguments must match: The argparse flags in the wrapper must exactly match what workflow_generator.py passes in add_args(). Show the user both sides.
•No directory scanning: Never use glob(), os.listdir(), list.files(), or find to discover input files. Accept them explicitly via arguments.
•Support files via os.getcwd(): If the wrapper needs a support file (R script, JAR), find it with os.path.join(os.getcwd(), "filename") — NOT relative to __file__.
•Create subdirectories: Any output path containing / needs os.makedirs(os.path.dirname(output), exist_ok=True).
•Print the command: Always log the command being run — this is essential for debugging via pegasus-analyzer.
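The support-file and no-scanning rules above can be illustrated with a short helper. This is a sketch under assumptions: the helper names, the Rscript invocation, and the file names used below are hypothetical and only show the lookup pattern.

```python
import os


def support_file(filename):
    """Locate a support file that was staged into the job's working directory."""
    # Resolve against the current working directory, not against __file__.
    path = os.path.join(os.getcwd(), filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Support file not staged: {path}")
    return path


def build_command(script_name, input_path, output_path):
    """Build the command line for a hypothetical Rscript-based step."""
    # Inputs and outputs come from explicit arguments; nothing is discovered
    # by scanning the directory.
    return [
        "Rscript", support_file(script_name),
        "--input", input_path,
        "--output", output_path,
    ]
```

Failing early with a clear message when the support file is missing makes a forgotten Replica Catalog entry easier to spot in the job's error output.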

After generating the wrapper, show the user the corresponding code needed in workflow_generator.py:

•Transformation Catalog entry: The Transformation() registration with correct pfn, is_stageable, memory, and cores
•Job definition: The Job() with add_args(), add_inputs(), add_outputs() that matches the wrapper's argparse
•Replica Catalog entry (if the wrapper uses support files): rc.add_replica() for R scripts, JARs, etc.

This ensures the wrapper and workflow generator stay in sync.
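One way to spot-check that both sides agree, without running a workflow, is to feed the wrapper's parser the same argument list the generator would hand to add_args(). The sketch below uses only the standard library; the flags, file names, and the accepts() helper are hypothetical.

```python
import argparse


def build_parser():
    """Argument parser of a hypothetical wrapper."""
    # allow_abbrev=False rejects shortened flags such as --in, so the match is exact.
    parser = argparse.ArgumentParser(description="Hypothetical sort step", allow_abbrev=False)
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--threads", type=int, default=1)
    return parser


def accepts(parser, job_args):
    """Return True if the parser accepts the arguments the job would pass."""
    try:
        parser.parse_args(job_args)
    except SystemExit:
        # argparse exits on unknown flags and on missing required ones.
        return False
    return True


# The same list the generator would pass as add_args(*job_args).
job_args = ["--input", "sample.bam", "--output", "sorted/sample.bam", "--threads", "4"]
```

Calling accepts(build_parser(), job_args) returns True for the list above and False if a flag is renamed on only one side. argparse also prints its usual error message to stderr in the failing case.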

For complete wrapper scripts beyond the examples: