Immunopipe Configuration Generator (Main Skill)
Purpose: Master skill for generating immunopipe pipeline configurations. Routes to individual process skills and determines pipeline architecture based on analysis requirements.
When to Use This Skill
- User wants to create or modify immunopipe configuration files
- Need to determine which processes to enable based on analysis goals
- Need to configure pipeline-level options (name, outdir, forks, scheduler)
- Need routing to specific process configuration skills
Pipeline Architecture Decision Tree
Step 1: Data Type Assessment
Ask the user about their data:
- Do you have scRNA-seq data?
  - If YES → RNA analysis processes needed
  - If NO → Cannot proceed (RNA data required)
- Do you have scTCR-seq or scBCR-seq data?
  - If YES → Enable TCR/BCR processes (TCR route)
  - If NO → RNA-only analysis (No-TCR route)
- Is your RNA data already processed in a Seurat object?
  - If YES → Use `LoadingRNAFromSeurat` instead of `SampleInfo` + `SeuratPreparing`
  - If NO → Use standard input via `SampleInfo`
Step 2: Analysis Goals
Ask what analyses they want to perform:
| Goal | Required Processes | Routing |
|---|---|---|
| Basic clustering & visualization | SampleInfo, SeuratPreparing, SeuratClustering, SeuratClusterStats | Use sampleinfo, seuratpreparing, seuratclustering, seuratclusterstats skills |
| T/B cell selection | Add TOrBCellSelection | Use torbcellselection skill |
| Cell type annotation | Add CellTypeAnnotation or SeuratMap2Ref | Use celltypeannotation or seuratmap2ref skills |
| Marker finding | Add ClusterMarkers or MarkersFinder | Use clustermarkers or markersfinder skills |
| TCR clonotype analysis | Add CDR3Clustering, TESSA, ClonalStats | Use cdr3clustering, tessa, clonalstats skills |
| Cell-cell communication | Add CellCellCommunication | Use cellcellcommunication skill |
| Pathway enrichment | Add ScFGSEA | Use scfgsea skill |
| Metabolic analysis | Add ScrnaMetabolicLandscape | Use scrnametaboliclandscape skill |
| Differential expression | Add PseudoBulkDEG | Use pseudobulkdeg skill |
Step 3: Essential vs Optional Processes
Essential Processes (always needed for TCR route):
- `SampleInfo` (or `LoadingRNAFromSeurat`)
- `ScRepLoading` (if TCR/BCR data present)
- `SeuratPreparing` (unless loading from prepared Seurat object)
- `SeuratClustering`
- `SeuratClusterStats`
Essential Processes (RNA-only route):
- `SampleInfo` (or `LoadingRNAFromSeurat`)
- `SeuratPreparing`
- `SeuratClustering`
- `SeuratClusterStats`
Optional Processes (enable only if requested):
- `TOrBCellSelection` - T/B cell separation
- `SeuratClusteringOfAllCells` - Clustering before T/B selection
- `ClusterMarkersOfAllCells` - Markers before T/B selection
- `TopExpressingGenesOfAllCells` - Top genes before T/B selection
- `CellTypeAnnotation` - Automated cell type annotation
- `SeuratMap2Ref` - Reference-based annotation
- `SeuratSubClustering` - Sub-clustering analysis
- `ClusterMarkers` - Differential expression between clusters
- `TopExpressingGenes` - Top expressed genes per cluster
- `MarkersFinder` - Flexible marker finding
- `ModuleScoreCalculator` - Module/pathway scoring
- `ScRepCombiningExpression` - TCR + RNA integration
- `CDR3Clustering` - TCR CDR3 clustering
- `TESSA` - TCR-specific analysis
- `CDR3AAPhyschem` - CDR3 physicochemical properties
- `ClonalStats` - Clonality statistics
- `CellCellCommunication` - Ligand-receptor analysis
- `CellCellCommunicationPlots` - Communication plots
- `ScFGSEA` - Fast gene set enrichment
- `PseudoBulkDEG` - Pseudo-bulk differential expression
- `ScrnaMetabolicLandscape` - Comprehensive metabolic analysis
Pipeline-Level Configuration
Basic Pipeline Options
name = "my_pipeline"       # Pipeline name (affects workdir and outdir)
outdir = "./output"        # Output directory (default: ./<name>-output)
loglevel = "info"          # Logging level: debug, info, warning, error
forks = 4                  # Number of parallel jobs (adjust based on CPU cores)
cache = true               # Enable caching (recommended)
error_strategy = "halt"    # halt, ignore, or retry
num_retries = 3            # Number of retries if error_strategy = "retry"
Scheduler Configuration
Local execution (default):
scheduler = "local"
SLURM cluster:
scheduler = "slurm"
[scheduler_opts]
qsub_opts = "-p general -q general -N {job.name} -t {job.index}"
SGE cluster:
scheduler = "sge"
[scheduler_opts]
qsub_opts = "-V -cwd -j yes"
Google Cloud Batch:
# Use: immunopipe gbatch instead of immunopipe
# See gbatch skill for configuration
Plugin Options
[plugin_opts.report]
filters = ["name:Filter"]  # Filter processes in report

[plugin_opts.runinfo]
# Runinfo plugin enabled by default
Routing to Process Skills
When user needs specific process configuration, route to the appropriate skill:
Core Input Processes
- SampleInfo: Use `sampleinfo` skill
- LoadingRNAFromSeurat: Use `loadingrnafromseurat` skill
- ScRepLoading: Use `screploading` skill
Preprocessing Processes
- SeuratPreparing: Use `seuratpreparing` skill
Clustering Processes
- SeuratClustering: Use `seuratclustering` skill
- SeuratClusteringOfAllCells: Use `seuratclusteringofallcells` skill
- SeuratSubClustering: Use `seuratsubclustering` skill
Cell Selection
- TOrBCellSelection: Use `torbcellselection` skill
Annotation Processes
- CellTypeAnnotation: Use `celltypeannotation` skill
- SeuratMap2Ref: Use `seuratmap2ref` skill
Marker Analysis
- ClusterMarkers: Use `clustermarkers` skill
- ClusterMarkersOfAllCells: Use `clustermarkersofallcells` skill
- MarkersFinder: Use `markersfinder` skill
- TopExpressingGenes: Use `topexpressinggenes` skill
- TopExpressingGenesOfAllCells: Use `topexpressinggenesofallcells` skill
TCR/BCR Analysis
- ScRepCombiningExpression: Use `screpcombiningexpression` skill
- CDR3Clustering: Use `cdr3clustering` skill
- TESSA: Use `tessa` skill
- CDR3AAPhyschem: Use `cdr3aaphyschem` skill
- ClonalStats: Use `clonalstats` skill
Downstream Analysis
- ModuleScoreCalculator: Use `modulescorecalculator` skill
- CellCellCommunication: Use `cellcellcommunication` skill
- CellCellCommunicationPlots: Use `cellcellcommunicationplots` skill
- SeuratClusterStats: Use `seuratclusterstats` skill
- ScFGSEA: Use `scfgsea` skill
- PseudoBulkDEG: Use `pseudobulkdeg` skill
Metabolic Analysis
- ScrnaMetabolicLandscape: Use `scrnametaboliclandscape` skill
Configuration File Structure
A complete TOML configuration file has three sections:
# 1. PIPELINE-LEVEL OPTIONS
name = "my_pipeline"
outdir = "./output"
forks = 4

# 2. PROCESS-LEVEL OPTIONS
[ProcessName]
cache = true
forks = 2  # Override pipeline-level forks for this process

[ProcessName.in]
# Input files specification

[ProcessName.envs]
# Environment variables (process parameters)

# 3. GOOGLE BATCH OPTIONS (if using immunopipe gbatch)
[cli-gbatch]
project = "my-gcp-project"
region = "us-central1"
Example Workflows
Example 1: Basic TCR Analysis
User request: "I have scRNA-seq and scTCR-seq data. I want basic analysis with T cell selection."
Response:
- Enable essential TCR processes: `SampleInfo`, `ScRepLoading`, `SeuratPreparing`, `SeuratClustering`, `SeuratClusterStats`
- Enable T cell selection: `SeuratClusteringOfAllCells`, `TOrBCellSelection`
- Route to `sampleinfo` skill to configure input files
- Route to each process skill for configuration
Minimal config:
name = "tcr_analysis"
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]

[SeuratClusteringOfAllCells]
[TOrBCellSelection]
Example 2: Advanced RNA-only Analysis
User request: "RNA-only data. I need clustering, cell type annotation, marker finding, and pathway enrichment."
Response:
- Enable essential RNA processes: `SampleInfo`, `SeuratPreparing`, `SeuratClustering`, `SeuratClusterStats`
- Add requested analyses: `CellTypeAnnotation`, `ClusterMarkers`, `ScFGSEA`
- Route to individual skills for configuration
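A minimal config for this request might look like the sketch below, modeled on the Example 1 config. It uses only process names from this skill; the pipeline name is hypothetical, `sample_info.txt` is a placeholder path, and the empty sections assume that each process's defaults are acceptable, which should be confirmed through the individual process skills.

```toml
name = "rna_analysis"   # hypothetical pipeline name
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]   # placeholder path to the sample sheet

# Requested optional analyses. Empty sections only enable the processes;
# this assumes the default options suit the data.
[CellTypeAnnotation]
[ClusterMarkers]
[ScFGSEA]
```

As in Example 1, the essential processes are not listed explicitly here; add a section for any of them whose options need to change.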
Example 3: Loading from Prepared Seurat Object
User request: "I already have a processed Seurat object. I want to run TCR analysis."
Response:
- Use `LoadingRNAFromSeurat` instead of `SampleInfo` + `SeuratPreparing`
- Enable TCR processes: `ScRepLoading`, `SeuratClustering`, etc.
- Set `prepared = true` in `LoadingRNAFromSeurat` to skip preprocessing
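A rough sketch of the entry-point part of such a config follows. Two details are assumptions rather than confirmed options: the `infile` input key is borrowed by analogy from the `SampleInfo` example, and `prepared` is placed under `.envs` because the Configuration File Structure section puts process parameters there. The file name and pipeline name are hypothetical. TCR input for `ScRepLoading` is not shown; see the `screploading` skill for that.

```toml
name = "tcr_from_seurat"   # hypothetical pipeline name

[LoadingRNAFromSeurat.in]
# Assumed input key and hypothetical path to the existing Seurat object
infile = ["prepared_seurat.rds"]

[LoadingRNAFromSeurat.envs]
prepared = true   # skip preprocessing, as described above
```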
Important Notes
Process Dependencies
Some processes have dependencies:
- `ScRepCombiningExpression` requires both `ScRepLoading` and RNA input
- `ClusterMarkers` requires `SeuratClustering`
- `TOrBCellSelection` usually follows `SeuratClusteringOfAllCells`
- `CellCellCommunication` requires clustering to be complete
Mutually Exclusive Options
- Use EITHER `SampleInfo` OR `LoadingRNAFromSeurat` as entry point (not both)
- If using `TOrBCellSelection`, typically enable `SeuratClusteringOfAllCells` first
- `CellTypeAnnotation` and `SeuratMap2Ref` serve similar purposes (can use both, but one is usually sufficient)
Cache Strategy
- Set `cache = "force"` at pipeline level to reuse all previous results
- Set `cache = false` for a specific process to force a re-run
- Useful when tweaking visualization parameters without re-running analysis
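The two cache settings can be combined in one file, as in the sketch below. `SeuratClusterStats` is used only as an illustrative target for a plot-only re-run; substitute whichever process had its visualization options changed.

```toml
# Pipeline level: reuse all previous results.
# Top-level keys must appear before the first [section] header.
cache = "force"

[SeuratClusterStats]
# Process level: force this one process to re-run,
# e.g. after changing its plotting options
cache = false
```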
Configuration Validation
After generating configuration, validate with:
python -m immunopipe.validate_config config.toml
External References
When process options reference external packages, expand them:
Seurat Functions
- When seeing `Seurat::FunctionName`, check: https://satijalab.org/seurat/reference/
- Common functions: `FindMarkers()`, `FindClusters()`, `SCTransform()`, `RunUMAP()`
Plotthis Functions
- Plot types map to functions: `bar` → `BarPlot`, `box` → `BoxPlot`
- Full reference: https://pwwang.github.io/plotthis/reference/
DESeq2 Design
- For `PseudoBulkDEG`, design formulas use DESeq2 syntax
- Reference: https://bioconductor.org/packages/release/bioc/html/DESeq2.html
GSEA Databases
- For `ScFGSEA`, GMT files from MSigDB
- Reference: https://www.gsea-msigdb.org/gsea/msigdb/
CellChat Database
- For `CellCellCommunication`, CellChat databases
- Reference: http://www.cellchat.org/
Workflow Summary
- Assess data type (RNA-only vs TCR/BCR)
- Determine analysis goals (clustering, annotation, TCR analysis, etc.)
- Select essential processes based on data type
- Add optional processes based on goals
- Configure pipeline-level options (name, forks, scheduler)
- Route to individual process skills for detailed configuration
- Generate complete TOML file
- Validate configuration before running
Quick Start Templates
For quick starts, use these templates:
- Basic TCR: `basic-tcr` template skill
- Basic RNA-only: `basic-rna` template skill
- Advanced TCR: `advanced-tcr` template skill
- Metabolic analysis: `metabolic` template skill
- Cell communication: `communication` template skill
Error Prevention
Common configuration errors to avoid:
- Missing input specification: Always set `[ProcessName.in]` for entry processes
- TCR data without ScRepLoading: If TCRData/BCRData columns exist, enable `ScRepLoading`
- Contradictory process enablement: Don't enable both "OfAllCells" and regular versions without `TOrBCellSelection`
- Invalid gene names: Use human gene symbols (uppercase) or mouse (title case)
- Path issues: Use absolute paths or paths relative to config file location
- Resource limits: Set appropriate `forks` based on available CPU/memory
Next Steps
After generating config:
- Save to `.toml` file (e.g., `config.toml`)
- Run: `immunopipe config.toml`
- Or use web UI: `pipen board @config.toml`
- Or use Google Batch: `immunopipe gbatch config.toml`
For modifications, route to specific process skills based on what needs to change.