ClusterMarkers Process Configuration
Purpose
Finds differentially expressed genes (markers) for clusters of T/B cells using Seurat's FindMarkers function. Performs statistical testing between clusters, identifies cluster-defining genes, and automatically runs pathway enrichment analysis (via Enrichr) on significant markers. Generates publication-ready visualizations including volcano plots, dot plots, heatmaps, and enrichment plots.
When to Use
- After SeuratClustering: Essential for cluster interpretation and annotation
- Cluster annotation: Identify marker genes to assign biological meaning to clusters
- Publication preparation: Generate marker tables, volcano plots, and enrichment figures
- Cell type characterization: Understand functional differences between cell populations
- Comparative analysis: Compare clusters to find unique gene expression signatures
Configuration Structure
Process Enablement
[ClusterMarkers]
cache = true  # Cache results for faster re-runs with different visualizations
Input Specification
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]  # Seurat object with cluster assignments
Environment Variables
Core Parameters
[ClusterMarkers.envs]
# Number of cores for parallel computation
ncores = 1 # int; Parallelize Seurat procedures
# Subset cells before marker finding (R expression)
subset = "seurat_clusters %in% c('c1', 'c2', 'c3')" # Optional
# Cache location for intermediate results
cache = "/tmp" # Path; Set to false to disable caching
# Assay to use for marker finding
assay = "RNA" # Default: uses active assay
# Error on no markers found
error = false # bool; If true, fail if no markers found
Statistical Test Selection
[ClusterMarkers.envs]
# Statistical test for differential expression
test-use = "wilcox"  # Default
Available tests:
- "wilcox": Wilcoxon rank sum test (default, fast)
- "wilcox_limma": Limma implementation (Seurat v4 compatibility)
- "MAST": GLM with cellular detection rate covariate (recommended)
- "DESeq2": Negative binomial model (robust, requires counts)
- "roc": ROC analysis (AUC-based classification)
- "t": Student's t-test
- "tobit": Tobit test for censored data
- "bimod": Likelihood-ratio test for bimodal expression
- "poisson": Poisson distribution (UMI datasets only)
- "negbinom": Negative binomial (UMI datasets only)
- "LR": Logistic regression (latent.vars supported)
Test selection guidelines:
- Default: "wilcox" for speed and reliability
- Publication-quality: "MAST" for single-cell-specific modeling
- Bulk-like DE: "DESeq2" for rigorous statistical testing
- UMI data: "negbinom" or "poisson" for count-based models
- Classification: "roc" for AUC-based marker ranking
Threshold Parameters (Seurat FindMarkers)
[ClusterMarkers.envs]
# Minimum log2 fold change threshold
logfc-threshold = 0.25  # float; Default: 0.25
# Minimum percentage of cells expressing gene
min-pct = 0.1  # float; Range: 0.0-1.0
# Minimum difference in detection percentage
min-diff-pct = -inf  # float; Default: no limit (TOML spells infinity as inf)
# Only positive markers (higher in ident.1 group)
only-pos = false  # bool; Default: false (both directions)
# Maximum cells per identity (downsampling)
max-cells-per-ident = inf  # int; No downsampling by default
# Minimum cells expressing gene (poisson/negbinom tests)
min-cells-feature = 3  # int
# Minimum cells per group
min-cells-group = 3  # int
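Note: Use - to replace . in parameter names (e.g., logfc-threshold, not logfc.threshold). In TOML a dotted key defines a nested table, so logfc.threshold would be read as a threshold key inside a logfc table.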
Significant Markers Filter (for Enrichment)
[ClusterMarkers.envs]
# Filter markers for enrichment analysis (R expression)
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Default
# Variables available: p_val, avg_log2FC, pct.1, pct.2, p_val_adj
# Example: "p_val_adj < 0.05 & abs(avg_log2FC) > 1" (both directions)
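To make the difference between the default filter and the both-directions example concrete, here is a rough Python emulation of how such an expression selects rows. The pipeline evaluates sigmarkers as an R expression; the marker rows, gene names, and helper functions below are made up for illustration only.

```python
# Hypothetical marker rows mimicking FindMarkers output columns.
markers = [
    {"gene": "GeneA", "avg_log2FC": 2.1, "p_val_adj": 0.001},
    {"gene": "GeneB", "avg_log2FC": -1.6, "p_val_adj": 0.010},
    {"gene": "GeneC", "avg_log2FC": 0.4, "p_val_adj": 0.030},
    {"gene": "GeneD", "avg_log2FC": 1.8, "p_val_adj": 0.200},
]


def default_filter(row):
    # Python equivalent of "p_val_adj < 0.05 & avg_log2FC > 0"
    return row["p_val_adj"] < 0.05 and row["avg_log2FC"] > 0


def both_directions_filter(row):
    # Python equivalent of "p_val_adj < 0.05 & abs(avg_log2FC) > 1"
    return row["p_val_adj"] < 0.05 and abs(row["avg_log2FC"]) > 1


up_only = [r["gene"] for r in markers if default_filter(r)]
both = [r["gene"] for r in markers if both_directions_filter(r)]

print(up_only)  # ['GeneA', 'GeneC']
print(both)     # ['GeneA', 'GeneB']
```

The default keeps weakly upregulated genes such as GeneC, while the both-directions filter drops them and admits strongly downregulated genes such as GeneB.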
Enrichment Analysis
[ClusterMarkers.envs]
# Databases for pathway enrichment
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]  # Default
# Enrichment style
enrich_style = "enrichr"  # Options: "enrichr", "clusterprofiler", "clusterProfiler"
Available databases (enrichit):
- "KEGG_2021_Human", "KEGG": KEGG pathways
- "MSigDB_Hallmark_2020", "Hallmark": MSigDB Hallmark gene sets
- "GO_Biological_Process_2025": Gene Ontology Biological Process
- "GO_Cellular_Component_2025": Gene Ontology Cellular Component
- "GO_Molecular_Function_2025": Gene Ontology Molecular Function
- "Reactome_Pathways_2024", "Reactome": Reactome pathways
- "WikiPathways_2024_Human", "WikiPathways": WikiPathways
- "BioCarta_2016": BioCarta pathways
More databases: https://maayanlab.cloud/Enrichr/#libraries
Visualization Parameters
[ClusterMarkers.envs]
# Marker plots configuration
marker_plots_defaults = {order_by = "desc(avg_log2FC)"}
# All markers plots (across clusters)
allmarker_plots = {"Top 10 markers of all clusters": {plot_type = "heatmap"}}
# Enrichment plots (all clusters)
allenrich_plots = {} # Empty by default
# Marker plots (per cluster)
marker_plots = {} # Default: volcano plots and dot plots
# Enrichment plots (per cluster)
enrich_plots = {} # Default: bar plot
# Overlap analysis (venn/upset)
overlaps = {} # Empty by default
External References
Seurat FindMarkers
https://satijalab.org/seurat/reference/findmarkers
- Core differential expression function
- Statistical tests: wilcox, MAST, DESeq2, ROC, t-test, etc.
- Threshold parameters control sensitivity and speed
Enrichr Databases
https://maayanlab.cloud/Enrichr/#libraries
- Comprehensive gene set enrichment collection
- KEGG, GO, Reactome, MSigDB, WikiPathways
biopipen MarkersFinder
https://pwwang.github.io/biopipen/api/biopipen.ns.scrna/#biopipen.ns.scrna.MarkersFinder
- Parent process with extended functionality
- Visualization: biopipen.utils::VizDEGs, scplotter::EnrichmentPlot
Configuration Examples
Minimal Configuration
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
Result: Default wilcox test, standard thresholds, Hallmark + KEGG enrichment
Standard Marker Finding (Wilcoxon)
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
[ClusterMarkers.envs]
test-use = "wilcox"
logfc-threshold = 0.25
min-pct = 0.1
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"
Publication-Ready MAST Analysis
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
[ClusterMarkers.envs]
test-use = "MAST"
logfc-threshold = 0.25
min-pct = 0.1
sigmarkers = "p_val_adj < 0.01 & abs(avg_log2FC) > 1"
ncores = 4
DESeq2 for Robust Analysis
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
[ClusterMarkers.envs]
test-use = "DESeq2"
logfc-threshold = 0.5  # More stringent
min-pct = 0.15
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.5"
Note: DESeq2 requires count data in the Seurat object
Stringent Thresholds for High-Confidence Markers
[ClusterMarkers.envs]
logfc-threshold = 0.58  # 1.5-fold change (2^0.58 ≈ 1.5)
min-pct = 0.25  # Detected in at least 25% of cells in either group
min-diff-pct = 0.1  # 10% difference in detection
only-pos = true  # Positive markers only
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"
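The 1.5-fold figure in the threshold comment follows from the log2 relationship between linear fold change and avg_log2FC. A small sketch of the conversion in both directions (the helper names are hypothetical and only serve the illustration):

```python
import math


def fold_change_to_log2(fold_change: float) -> float:
    """Convert a linear fold change (e.g. 1.5x) to the log2 scale."""
    return math.log2(fold_change)


def log2_to_fold_change(log2fc: float) -> float:
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2fc


for fc in (1.5, 2.0, 4.0):
    print(f"{fc}x -> log2FC {fold_change_to_log2(fc):.2f}")
# 1.5x -> log2FC 0.58
# 2.0x -> log2FC 1.00
# 4.0x -> log2FC 2.00
```

This makes it easy to translate a biologically motivated cutoff, such as "at least a twofold change", into the value to put in the threshold or in a sigmarkers expression.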
Subset Specific Clusters
[ClusterMarkers.envs]
# Only analyze clusters c1, c2, c3 to save computation
subset = "seurat_clusters %in% c('c1', 'c2', 'c3')"
Custom Enrichment Databases
[ClusterMarkers.envs]
# Use different pathway databases
dbs = ["Reactome_Pathways_2024", "GO_Biological_Process_2025"]
enrich_style = "clusterprofiler"
Positive Markers Only (Cluster-Specific)
[ClusterMarkers.envs]
only-pos = true
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"
Downsample Large Clusters
[ClusterMarkers.envs]
max-cells-per-ident = 5000  # Limit to 5000 cells per cluster
random-seed = 42  # Reproducible downsampling
Common Patterns
Pattern 1: Quick Wilcoxon Test (Default)
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
Use case: Initial exploration, speed priority
Pattern 2: Publication-Quality MAST
[ClusterMarkers]
[ClusterMarkers.in]
srtobj = ["SeuratClustering"]
[ClusterMarkers.envs]
test-use = "MAST"
logfc-threshold = 0.25
min-pct = 0.1
ncores = 8
Use case: Single-cell publication, accounts for detection rate
Pattern 3: Both Positive and Negative Markers
[ClusterMarkers.envs]
only-pos = false
sigmarkers = "p_val_adj < 0.05 & abs(avg_log2FC) > 0.5"
Use case: Find genes upregulated and downregulated in each cluster
Pattern 4: Stringent Top Markers
[ClusterMarkers.envs]
logfc-threshold = 1.0  # 2-fold change
min-pct = 0.3
sigmarkers = "p_val_adj < 0.001 & avg_log2FC > 1"
only-pos = true
Use case: High-confidence cluster markers for annotation
Pattern 5: Custom Enrichment with Multiple DBs
[ClusterMarkers.envs]
dbs = [
    "KEGG_2021_Human",
    "MSigDB_Hallmark_2020",
    "GO_Biological_Process_2025",
    "Reactome_Pathways_2024"
]
enrich_style = "enrichr"
Pattern 6: ROC Analysis for Classification
[ClusterMarkers.envs]
test-use = "roc"
logfc-threshold = 0.1
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"
Use case: Find markers with highest AUC for classification
Dependencies
Upstream Processes
- Required: SeuratClustering (provides cluster assignments)
- Alternative: SeuratSubClustering (for sub-clustering analysis)
- Context: Runs after TOrBCellSelection if T/B cell selection is enabled
Downstream Processes
- CellTypeAnnotation: Uses markers for automated cell type assignment
- SeuratMap2Ref: Reference-based annotation may use marker profiles
- ScFGSEA: Gene set enrichment on identified markers
- ModuleScoreCalculator: Score marker genes across cells
Validation Rules
Statistical Test Constraints
- test.use must be one of: wilcox, wilcox_limma, MAST, DESeq2, roc, t, tobit, bimod, poisson, negbinom, LR
- DESeq2 requires count data (automatically uses counts slot)
- MAST, poisson, negbinom, and LR support latent.vars for additional covariates
Threshold Validation
- logfc.threshold: ≥ 0 (typical range: 0.1-1.0)
- min.pct: 0.0-1.0 (typical: 0.1-0.3)
- min.diff.pct: ≥ -Inf (typical: 0.05-0.2)
- min.cells.feature: ≥ 1 (default: 3)
- min.cells.group: ≥ 1 (default: 3)
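The test and threshold rules above can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not something the pipeline provides; it assumes the envs have already been loaded into a flat Python dict, and it accepts either the hyphenated config spelling or the dotted Seurat spelling of each parameter.

```python
ALLOWED_TESTS = {
    "wilcox", "wilcox_limma", "MAST", "DESeq2", "roc", "t",
    "tobit", "bimod", "poisson", "negbinom", "LR",
}


def validate_envs(envs: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    # Normalize hyphenated config keys to the dotted Seurat parameter names.
    envs = {key.replace("-", "."): value for key, value in envs.items()}
    problems = []

    test = envs.get("test.use", "wilcox")
    if test not in ALLOWED_TESTS:
        problems.append(f"test.use: unknown test {test!r}")
    if envs.get("logfc.threshold", 0.25) < 0:
        problems.append("logfc.threshold must be >= 0")
    if not 0.0 <= envs.get("min.pct", 0.1) <= 1.0:
        problems.append("min.pct must be within 0.0-1.0")
    for key in ("min.cells.feature", "min.cells.group"):
        if envs.get(key, 3) < 1:
            problems.append(f"{key} must be >= 1")
    return problems


print(validate_envs({"test-use": "MAST", "min-pct": 0.1}))  # []
print(validate_envs({"min-pct": 1.5, "min-cells-group": 0}))
```

Defaults in the helper mirror the values listed in this document; min.diff.pct is left unchecked because any value down to -Inf is allowed.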
sigmarkers Expression
- Must be a valid R/dplyr expression
- Available variables: p_val, avg_log2FC, pct.1, pct.2, p_val_adj
- Use & for AND, | for OR, ! for NOT
Database Constraints
- dbs must be valid enrichit database names or GMT file paths
- Custom GMT files: use absolute paths or paths relative to the config file
Troubleshooting
Issue: Too Many Markers Found
Symptoms: Thousands of markers per cluster, with little specificity to any single cluster
Solutions:
[ClusterMarkers.envs]
logfc-threshold = 0.5  # Increase fold change threshold
min-pct = 0.25  # Increase expression percentage
min-diff-pct = 0.15  # Increase detection difference
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"  # Stricter filter
Issue: No Markers Found
Symptoms: Empty marker tables, no enrichment results
Solutions:
[ClusterMarkers.envs]
logfc-threshold = 0.1  # Lower threshold
min-pct = 0.05  # Lower expression requirement
min-diff-pct = -inf  # Remove detection difference (TOML spells infinity as inf)
sigmarkers = "p_val_adj < 0.1 & avg_log2FC > 0.1"  # Looser filter
Issue: Slow Performance
Symptoms: Marker finding takes hours
Solutions:
[ClusterMarkers.envs]
ncores = 8  # Use more cores
logfc-threshold = 0.5  # Higher threshold reduces genes tested
max-cells-per-ident = 5000  # Downsample large clusters
Issue: DESeq2 Fails with Integrated Data
Symptoms: DESeq2 error on integrated Seurat object
Cause: DESeq2 requires count data, and integrated objects have an empty counts slot
Solution:
# Use SCTransform counts instead of integrated data
[SeuratPreparing.envs]
method = "SCTransform"
integration_method = null  # Skip integration for DESeq2 (TOML has no null literal; adapt to the option your SeuratPreparing version accepts)
[ClusterMarkers.envs]
test-use = "DESeq2"
Alternative: Use MAST or wilcox on integrated data
Issue: Enrichment Analysis Returns No Results
Symptoms: Empty enrichment tables/plots
Solutions:
[ClusterMarkers.envs]
# Loosen the sigmarkers filter if it is too strict
sigmarkers = "p_val_adj < 0.1 & avg_log2FC > 0"
# Add more databases
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020", "Reactome_Pathways_2024"]
Issue: NA p-values in Results
Symptoms: Some markers have NA p-values
Cause: Insufficient cells per group or low expression variance
Solutions:
[ClusterMarkers.envs]
min-cells-group = 10  # Increase minimum cells
min-cells-feature = 5  # Increase minimum expressing cells
Issue: Different Test Methods Return Similar Results
Symptoms: wilcox and MAST return nearly identical gene lists
Cause: Strong markers are robust across methods
Solution: Use ROC analysis for alternative ranking:
[ClusterMarkers.envs]
test-use = "roc"
Issue: Computationally Expensive Enrichment
Symptoms: Enrichment step takes very long
Solutions:
[ClusterMarkers.envs]
# Limit markers for enrichment
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"
# Use fewer databases
dbs = ["MSigDB_Hallmark_2020"]
# Subset clusters for analysis
subset = "seurat_clusters %in% c('c1', 'c2')"
Best Practices
- Start with the default wilcox test for initial exploration
- Use MAST for publications (single-cell-specific modeling)
- Set appropriate thresholds: logfc.threshold = 0.25-0.5, min.pct = 0.1-0.2
- Filter for enrichment: Use sigmarkers to limit to high-confidence markers
- Customize enrichment databases: Choose databases relevant to your study
- Use only.pos = false to see both upregulated and downregulated genes
- Parallelize with ncores for large datasets
- Subset clusters when analyzing many clusters to save computation
- Validate markers: Check expression patterns in the visualizations
- Reproducibility: Set random.seed for downsampling
Related Processes
- ClusterMarkersOfAllCells: Marker finding before T/B cell selection
- MarkersFinder: Extended parent process with more flexibility
- TopExpressingGenes: Top expressed genes per cluster (non-DE)
- SeuratClustering: Required upstream process for cluster assignments
- CellTypeAnnotation: Uses markers for automated annotation