ClusterMarkersOfAllCells Process Configuration
Purpose
Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.
When to Use
- After `SeuratClusteringOfAllCells`: Runs on all cells before T/B selection
- Before `TOrBCellSelection`: Provides markers to identify which clusters are T/B cells
- Broad cell type identification: Distinguish major immune cell types from mixed populations
- Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
- Initial cell typing: First-pass identification before detailed annotation
- Data quality check: Verify expected cell types are present in your data
Configuration Structure
Process Enablement
[ClusterMarkersOfAllCells]
cache = true
Input Specification
[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"] # Accepts output from SeuratClusteringOfAllCells process
Environment Variables
All parameters are inherited from ClusterMarkers and MarkersFinder:
[ClusterMarkersOfAllCells.envs]
# Parallel computing
ncores = 1
# Grouping (uses seurat_clusters by default)
# group_by = "seurat_clusters" # Leave unset to use Seurat::Idents() (usually "seurat_clusters"); TOML has no null value
# Statistical test parameters (passed to Seurat::FindMarkers(); use - instead of . in key names)
test-use = "wilcox" # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson
min-pct = 0.1 # Only test genes detected in >=10% of cells in either group
logfc-threshold = 0.25 # Minimum log2 fold change
# Marker filtering
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0" # Filter for significant markers
# Enrichment analysis
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
enrich_style = "enrichr" # enrichr or clusterprofiler
# Error handling
error = false # Don't error out if no markers found
# Visualization
marker_plots_defaults = { order_by = "desc(avg_log2FC)" }
allmarker_plots = { "Top 10 markers of all clusters" = { plot_type = "heatmap" } }
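To make the effect of `sigmarkers` concrete, the sketch below mimics the default filter `p_val_adj < 0.05 & avg_log2FC > 0` in plain Python. The marker rows and the `is_significant` helper are invented for illustration only; the pipeline itself evaluates the expression in R against the real marker table.

```python
# Hypothetical marker rows; the column names follow the sigmarkers expression.
markers = [
    {"gene": "CD3E", "cluster": "0", "p_val_adj": 1e-30, "avg_log2FC": 2.1},
    {"gene": "MS4A1", "cluster": "0", "p_val_adj": 1e-12, "avg_log2FC": -1.8},
    {"gene": "ACTB", "cluster": "0", "p_val_adj": 0.40, "avg_log2FC": 0.3},
]


def is_significant(row, max_padj=0.05, min_log2fc=0.0):
    """Python analogue of: p_val_adj < max_padj & avg_log2FC > min_log2fc."""
    return row["p_val_adj"] < max_padj and row["avg_log2FC"] > min_log2fc


# Only genes that are both significant and up-regulated in the cluster survive.
kept = [row["gene"] for row in markers if is_significant(row)]
print(kept)
```

In this toy table, MS4A1 is dropped because it is down-regulated in the cluster, and ACTB is dropped because its adjusted p-value is too high.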
External References
Seurat FindMarkers Parameters
- Full reference: https://satijalab.org/seurat/reference/findmarkers
- Statistical tests (`test.use` parameter):
  - `"wilcox"`: Wilcoxon Rank Sum test (default, recommended)
  - `"bimod"`: Likelihood-ratio test for single-cell gene expression
  - `"roc"`: Receiver Operating Characteristic analysis
  - `"t"`: Student's t-test
  - `"negbinom"`: Negative binomial generalized linear model (UMI-based data only)
  - `"poisson"`: Poisson generalized linear model (UMI-based data only)
- Common arguments (use `-` instead of `.` in TOML):
  - `min-pct`: Minimum fraction of cells expressing the gene in either group
  - `logfc-threshold`: Minimum log2 fold change threshold
  - `only-pos`: Only return positive markers
  - `min-diff-pct`: Minimum difference in detection fraction between the two groups
Enrichment Databases
- MSigDB: https://www.gsea-msigdb.org/gsea/msigdb/
- KEGG: https://www.genome.jp/kegg/
- Reactome: https://reactome.org/
- GO: http://geneontology.org/
Configuration Examples
Minimal Configuration
[SeuratClusteringOfAllCells]

[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Standard Marker Finding
[SeuratClusteringOfAllCells]

[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

[ClusterMarkersOfAllCells.envs]
# Find markers for broad cell type identification
dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

# Generate key visualizations
[ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
plot_type = "volcano_log2fc"

[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
plot_type = "heatmap"

[ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
plot_type = "bar"
top_term = 10
Common Patterns
Pattern 1: Broad Cell Type Markers
[ClusterMarkersOfAllCells.envs]
# Optimized for distinguishing T/B/Myeloid/NK cells
min-pct = 0.1 # Require detection in >=10% of cells
logfc-threshold = 0.25 # Minimum log2 fold change
test-use = "wilcox" # Fast and robust
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

# Visualize markers to identify cell types
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
plot_type = "heatmap"

# Check for expected markers in outputs
# T cells: CD3D, CD3E, CD3G, CD4, CD8A
# B cells: CD19, MS4A1 (CD20), CD79A, CD79B
# Myeloid: CD14, LYZ, FCGR3A, CD68
# NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7
Pattern 2: Quick Wilcoxon for Large Datasets
[ClusterMarkersOfAllCells.envs]
# Fast analysis for large datasets (>50k cells)
ncores = 8 # Use multiple cores
test-use = "wilcox"
min-pct = 0.15 # More stringent to reduce noise
logfc-threshold = 0.3
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"
# Skip enrichment to save time
dbs = []

# Generate only essential plots
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
plot_type = "heatmap"
Pattern 3: Identify T/B Cell Clusters
[ClusterMarkersOfAllCells.envs]
# Focus on finding T and B cell markers for selection
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"
# Will help identify which clusters express:
# T cell markers: CD3D, CD3E, CD3G
# B cell markers: CD19, MS4A1, CD79A

[ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
plot_type = "heatmap"
Difference from ClusterMarkers
| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
|---|---|---|
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data Scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | SeuratClustering (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |
Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.
Dependencies
Upstream Processes
- `SeuratClusteringOfAllCells`: Required - provides clustered object with `seurat_clusters` metadata
- `SeuratPreparing`: Indirect - provides normalized Seurat object
- `SampleInfo` or `LoadingRNAFromSeurat`: Entry point for data
Downstream Processes
- `TOrBCellSelection`: Primary consumer - uses marker results to select T/B cells
- `TopExpressingGenesOfAllCells`: Optional complementary analysis
Validation Rules
Required Inputs
[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"] # Must be specified
Process Enablement
- Process automatically enabled when `SeuratClusteringOfAllCells` is in config
- No need to explicitly set `[ClusterMarkersOfAllCells]` if `SeuratClusteringOfAllCells` is enabled
Parameter Constraints
- `test-use`: Must be a test supported by `Seurat::FindMarkers()`, e.g. `"wilcox"`, `"bimod"`, `"roc"`, `"t"`, `"negbinom"`, `"poisson"`
- `min-pct`: Should be between 0 and 1 (e.g., 0.1 = 10%)
- `logfc-threshold`: Numeric value (log2 scale)
- `sigmarkers`: Valid dplyr filter expression
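The constraints above can be checked before launching the pipeline. The following is a minimal sketch, assuming the `envs` table has already been parsed into a Python dict with hyphenated keys. `validate_envs` and `ALLOWED_TESTS` are hypothetical names, the allowed set contains only the tests listed in this document, and the `sigmarkers` check only confirms a non-empty string since the dplyr expression itself can only be validated in R.

```python
# Tests named in this document; Seurat itself may accept more.
ALLOWED_TESTS = {"wilcox", "bimod", "roc", "t", "negbinom", "poisson"}


def _is_number(value) -> bool:
    """True for int/float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_envs(envs: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    test = envs.get("test-use", "wilcox")
    if test not in ALLOWED_TESTS:
        problems.append(f"test-use: unsupported test {test!r}")

    min_pct = envs.get("min-pct", 0.1)
    if not _is_number(min_pct) or not 0 <= min_pct <= 1:
        problems.append(f"min-pct: expected a fraction between 0 and 1, got {min_pct!r}")

    logfc = envs.get("logfc-threshold", 0.25)
    if not _is_number(logfc):
        problems.append(f"logfc-threshold: expected a number, got {logfc!r}")

    if "sigmarkers" in envs:
        sig = envs["sigmarkers"]
        if not isinstance(sig, str) or not sig.strip():
            problems.append("sigmarkers: expected a non-empty filter expression")

    return problems


print(validate_envs({"test-use": "anova", "min-pct": 10}))
```

A common mistake this catches is writing `min-pct = 10` when 10% (that is, `0.1`) was intended.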
Common Errors
- Missing clustering: Ensure `SeuratClusteringOfAllCells` runs first
- No markers found: Adjust `sigmarkers` or `logfc-threshold` if too stringent
- Memory issues: Reduce `ncores` or subset the data when working with large datasets
Troubleshooting
Issue: No significant markers found
Symptoms: Empty output directory or warning about no markers
Solutions:
[ClusterMarkersOfAllCells.envs]
# Less stringent thresholds
logfc-threshold = 0.1 # Lower fold change requirement
min-pct = 0.05 # Lower detection percentage
sigmarkers = "p_val_adj < 0.1" # More relaxed p-value

# Or check data quality
# - Are cells properly clustered?
# - Is expression matrix normalized?
# - Are there enough cells per cluster (>30 recommended)?
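For the "enough cells per cluster" check, a quick count of cells per cluster label is usually sufficient. The sketch below assumes the cluster assignments have been exported as a plain list of labels, one per cell; `small_clusters` is a hypothetical helper, and the threshold of 30 follows the recommendation above.

```python
from collections import Counter


def small_clusters(cluster_labels, min_cells=30):
    """Return {cluster: n_cells} for clusters that do not exceed min_cells."""
    counts = Counter(cluster_labels)
    return {cluster: n for cluster, n in counts.items() if n <= min_cells}


# Toy labels: cluster "2" has only 12 cells and is flagged.
labels = ["0"] * 120 + ["1"] * 45 + ["2"] * 12
print(small_clusters(labels))
```

Clusters flagged here are candidates for merging (for example by lowering the clustering resolution) before rerunning marker detection.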
Issue: Too many markers (slow enrichment)
Symptoms: Process takes very long, memory issues
Solutions:
[ClusterMarkersOfAllCells.envs]
# More stringent filtering
logfc-threshold = 0.5
min-pct = 0.2
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"
# Reduce enrichment databases
dbs = ["MSigDB_Hallmark_2020"]
# Or skip enrichment entirely (use instead of the line above; a key may only be set once)
# dbs = []
Issue: Can't identify T/B cell clusters
Symptoms: Markers don't show clear T/B cell signatures
Solutions:
- Check marker gene presence:

      # Verify expected markers are in your data
      # Use SeuratClusterStats to visualize:
      [SeuratClusterStats.envs.features_defaults]
      features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]

- Adjust clustering parameters:

      [SeuratClusteringOfAllCells.envs]
      res = 0.5 # Try different resolutions (0.2-1.5)

- Check data quality:
  - Are genes properly normalized?
  - Are there enough cells per cluster?
  - Is species correct (human vs mouse gene symbols)?
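For the species check, a rough heuristic is to look at the capitalization of gene symbols: human symbols are conventionally all uppercase (CD3E), while mouse symbols are usually capitalized only on the first letter (Cd3e). The sketch below is an illustration under that assumption; `guess_symbol_style` and its 80%/20% cutoffs are arbitrary, hypothetical choices and not part of the pipeline.

```python
def guess_symbol_style(genes):
    """Classify a list of gene symbols as 'human-like', 'mouse-like', 'mixed', or 'unknown'."""
    # Ignore entries without any letters, which carry no case information.
    symbols = [g for g in genes if any(ch.isalpha() for ch in g)]
    if not symbols:
        return "unknown"
    upper_fraction = sum(g.isupper() for g in symbols) / len(symbols)
    if upper_fraction >= 0.8:
        return "human-like"
    if upper_fraction <= 0.2:
        return "mouse-like"
    return "mixed"


print(guess_symbol_style(["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]))
print(guess_symbol_style(["Cd3d", "Cd3e", "Cd19", "Ms4a1", "Cd14", "Lyz2"]))
```

If the result is mouse-like while the marker lists in this document (human symbols) are being used, the expected markers will not match any gene in the data.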
Issue: Process not running
Symptoms: Process skipped in workflow
Solutions:
- Verify `SeuratClusteringOfAllCells` is in config
- Check that upstream dependencies are running correctly
- Confirm that the dataset actually requires T/B cell selection (i.e., it does not already consist solely of T cells)
Typical Marker Genes for Identification
| Cell Type | Positive Markers | Negative Markers |
|---|---|---|
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |
Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.
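As a first-pass aid when reading the heatmaps, the positive markers in the table can be matched against each cluster's top marker genes. The sketch below is a hypothetical helper, not part of the pipeline: `REFERENCE` copies the positive markers from the table, the cluster gene lists are made-up examples, and negative markers are ignored for simplicity.

```python
# Positive markers copied from the table above.
REFERENCE = {
    "T cells": {"CD3D", "CD3E", "CD3G", "CD4", "CD8A"},
    "B cells": {"CD19", "MS4A1", "CD79A", "CD79B"},
    "Monocytes": {"CD14", "LYZ", "FCGR3A", "S100A8"},
    "NK cells": {"NCAM1", "KLRD1", "NKG7"},
    "Dendritic cells": {"FCER1A", "CST3"},
    "Megakaryocytes": {"PPBP", "PF4"},
}


def guess_cell_type(cluster_markers):
    """Return (cell_type, fraction of its reference markers found), or (None, 0.0)."""
    found = set(cluster_markers)
    best_type, best_score = None, 0.0
    for cell_type, reference in REFERENCE.items():
        score = len(found & reference) / len(reference)
        if score > best_score:
            best_type, best_score = cell_type, score
    return best_type, best_score


# Made-up top markers per cluster, for illustration only.
clusters = {
    "0": ["CD3D", "CD3E", "IL7R", "CD8A"],
    "1": ["MS4A1", "CD79A", "CD79B", "CD19"],
    "2": ["ACTB"],
}
for cluster_id, genes in clusters.items():
    print(cluster_id, guess_cell_type(genes))
```

A low or zero score means the cluster needs manual inspection; the score is only an overlap fraction and says nothing about expression level or statistical significance.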