SeuratSubClustering Process Configuration
Purpose
Performs fine-grained re-clustering on specific subsets of cells (e.g., individual clusters, cell types, or custom subsets). Unlike Seurat::FindSubCluster which only finds subclusters within a single cluster, this process performs the complete clustering workflow (PCA, UMAP, FindNeighbors, FindClusters) on any subset of cells defined by metadata filters or cell barcode lists.
When to Use
- •Cluster heterogeneity analysis: When initial clustering identifies mixed cell populations within a cluster
- •Cell type sub-clustering: To resolve heterogeneity within annotated cell types (e.g., T cell subsets: CD4+, CD8+, naive, memory, effector)
- •Lineage-specific analysis: To examine substructure within major cell lineages
- •Differential sub-populations: When a cluster contains multiple biologically distinct populations (e.g., NK cells + CD4 T cells)
- •Multi-resolution exploration: To test different clustering granularities on specific cell subsets
- •Downstream marker discovery: When you need markers for sub-populations within larger clusters
Configuration Structure
Process Enablement
[SeuratSubClustering] cache = true # Cache intermediate results for faster re-runs
Input Specification
[SeuratSubClustering.in] srtobj = ["SeuratClustering"] # Path or reference to Seurat object
Environment Variables
Core Parameters
[SeuratSubClustering.envs]
# Number of cores for parallelization
ncores = 1 # int; Higher values speed up computation
# Metadata mutaters to define subset cells
# Applied BEFORE subsetting to create temporary columns
mutaters = {} # json; Dictionary of dplyr-like mutations
# Expression to subset cells (dplyr::filter syntax)
# Applied to metadata using tidyseurat::filter()
subset = "seurat_clusters == 'c3'" # str; Filter expression
# Cache location for intermediate results
cache = "/tmp" # Path; Set to false to disable caching
Sub-clustering Cases (Multiple Subsets)
[SeuratSubClustering.envs.cases] # Keys are case names (prefixes for outputs) # Values inherit envs parameters (except mutaters, cache) # If empty, default case "subcluster" is created
Case Naming Rules:
- •Case name becomes prefix for reductions:
<CASENAME>PC_,<CASENAME>UMAP_ - •Case name becomes prefix for cluster columns:
<CASENAME>.<resolution> - •Case name becomes final cluster column:
<CASENAME> - •Non-alphanumeric characters in case names are removed
Metadata Output:
- •Each case adds new metadata columns to original Seurat object
- •Reductions saved:
<CASENAME>.pc,<CASENAME>.umap - •Clusters saved:
<CASENAME>.<resolution>for each resolution - •Final clusters:
<CASENAME>column
RunPCA Parameters
[SeuratSubClustering.envs.RunPCA] # See https://satijalab.org/seurat/reference/runpca # object specified internally as subset object npcs = 30 # int; Number of PCs to compute
RunUMAP Parameters
[SeuratSubClustering.envs.RunUMAP]
# See https://satijalab.org/seurat/reference/runumap
# object specified internally as subset object
# dims=N expanded to dims=1:N (min(N, ncol-1))
dims = 30 # int; Number of PCs to use
# Use specific features instead of dimensions
# Can be list: {order = "desc(abs(avg_log2FC))", n = 30}
# Or numeric (treated as n with default order)
features = 30 # int or list; Top markers for UMAP
# Reduction to use for UMAP
reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted
n.neighbors = 30 # int; Neighborhood size
min.dist = 0.3 # float; Cluster tightness (0.001-0.5)
spread = 1 # float; Embedding scale
seed.use = 42 # int; Random seed
FindNeighbors Parameters
[SeuratSubClustering.envs.FindNeighbors] # See https://satijalab.org/seurat/reference/findneighbors # object specified internally reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted dims = 30 # int; Dimensions to use k.param = 20 # int; K-nearest neighbors prune.SNN = 0.067 # float; SNN pruning threshold (default: 1/15) nn.method = "annoy" # str; "annoy" or "rann"
FindClusters Parameters
[SeuratSubClustering.envs.FindClusters] # See https://satijalab.org/seurat/reference/findclusters # object specified internally # Resolution: Higher = more clusters, Lower = fewer clusters # Multiple resolutions supported: [0.4, 0.6, 0.8, 1.0] # Range syntax: "0.1:0.5:0.1" -> [0.1, 0.2, 0.3, 0.4, 0.5] resolution = 0.8 # float or list; Default: 0.8 # Cluster labels prefixed with "s" (s1, s2, ...) instead of (s0, s1, ...) algorithm = 1 # int; 1=Louvain, 4=Leiden (recommended) graph.name = "pca_snn" # str; Must match FindNeighbors SNN graph random.seed = 0 # int; Reproducibility
Multi-resolution Output:
- •Multiple resolutions create columns:
<CASENAME>_0.4,<CASENAME>_0.6,<CASENAME>_0.8,<CASENAME> - •Final resolution uses last value in list
External References
Seurat Functions
- •RunPCA(): https://satijalab.org/seurat/reference/runpca
- •Principal component analysis on subset of cells
- •RunUMAP(): https://satijalab.org/seurat/reference/runumap
- •Non-linear dimensionality reduction for visualization
- •FindNeighbors(): https://satijalab.org/seurat/reference/findneighbors
- •K-nearest neighbor graph construction
- •FindClusters(): https://satijalab.org/seurat/reference/findclusters
- •Community detection (Louvain/Leiden algorithms)
tidyseurat::filter()
https://stemangiola.github.io/tidyseurat/reference/filter.html
- •Subset Seurat objects using dplyr-like filter syntax
- •Supports logical expressions:
seurat_clusters == 'c3',celltype %in% c('CD4', 'CD8') - •Can use any metadata column created by
mutaters
Configuration Examples
Minimal Configuration (Default Case)
[SeuratSubClustering] [SeuratSubClustering.in] srtobj = ["SeuratClustering"]
Result: Creates default case "subcluster" with all cells
Single Cluster Sub-clustering
[SeuratSubClustering] [SeuratSubClustering.envs] subset = "seurat_clusters == 'c3'"
Result: Re-clusters only cells in cluster c3
Metadata-Based Sub-clustering (Cell Type)
[SeuratSubClustering]
[SeuratSubClustering.envs]
# First add cell type annotation via mutaters
mutaters = {is_cd4 = "if_else(celltype == 'CD4 T cell', TRUE, FALSE)"}
[SeuratSubClustering.envs.RunPCA]
npcs = 50
[SeuratSubClustering.envs.FindClusters]
resolution = 1.2
algorithm = 4 # Leiden
Result: Creates subcluster case for CD4+ cells only
Multiple Sub-clustering Cases
[SeuratSubClustering] [SeuratSubClustering.envs] # Define multiple sub-clustering cases [SeuratSubClustering.envs.cases.TEffector] subset = "celltype == 'CD8 T cell' & state == 'Effector'" resolution = 1.0 [SeuratSubClustering.envs.cases.TNaive] subset = "celltype == 'CD8 T cell' & state == 'Naive'" resolution = 0.8 [SeuratSubClustering.envs.cases.CD4Memory] subset = "celltype == 'CD4 T cell' & state == 'Memory'" resolution = 1.5
Result: Three sub-clustering analyses with different resolutions
- •Metadata columns:
TEffector,TNaive,CD4Memory - •Reductions:
TEFFECTORPC_,TNAIVEPC_,CD4MEMORYPC_, etc. - •Clusters:
TEffector,TNaive,CD4Memory
Multi-resolution Sub-clustering
[SeuratSubClustering] [SeuratSubClustering.envs.cases.Cluster3] subset = "seurat_clusters == 'c3'" [SeuratSubClustering.envs.cases.Cluster3.FindClusters] # Test multiple resolutions resolution = "0.4:1.2:0.2" # [0.4, 0.6, 0.8, 1.0, 1.2] algorithm = 4 # Leiden
Result: Cluster3 has columns Cluster3_0.4, Cluster3_0.6, Cluster3_0.8, Cluster3_1.0, Cluster3
Using Top Markers for UMAP
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.MixedCluster]
subset = "seurat_clusters == 'c5'"
[SeuratSubClustering.envs.cases.MixedCluster.RunUMAP]
# Use top 30 DEGs for UMAP instead of PCs
features = {order = "desc(abs(avg_log2FC))", n = 30}
Result: Sub-cluster based on top DEGs for better separation
Leiden Algorithm with Custom Parameters
[SeuratSubClustering] [SeuratSubClustering.envs] ncores = 4 [SeuratSubClustering.envs.FindNeighbors] k.param = 30 prune.SNN = 0.05 [SeuratSubClustering.envs.FindClusters] algorithm = 4 # Leiden resolution = 1.0 random.seed = 42
Complex Subset with Multiple Conditions
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.ActivatedT]
subset = "celltype %in% c('CD4 T cell', 'CD8 T cell') & activation == 'Activated'"
[SeuratSubClustering.envs.cases.ActivatedT.RunPCA]
npcs = 40
[SeuratSubClustering.envs.cases.ActivatedT.RunUMAP]
dims = 40
n.neighbors = 20
min.dist = 0.2
Common Patterns
Pattern 1: Single Cluster Deep Dive
[SeuratSubClustering] [SeuratSubClustering.envs] # Re-cluster cluster 3 to resolve heterogeneity subset = "seurat_clusters == 'c3'"
Pattern 2: Multiple Lineage Sub-clustering
[SeuratSubClustering] [SeuratSubClustering.envs.cases.TCD4] subset = "celltype == 'CD4 T cell'" [SeuratSubClustering.envs.cases.TCD8] subset = "celltype == 'CD8 T cell'" [SeuratSubClustering.envs.cases.TGD] subset = "celltype == 'Gamma delta T cell'" [SeuratSubClustering.envs.cases.NK] subset = "celltype == 'NK cell'"
Pattern 3: Functional State Sub-clustering
[SeuratSubClustering] [SeuratSubClustering.envs.cases.Effector] subset = "state == 'Effector'" [SeuratSubClustering.envs.cases.Effector.FindClusters] resolution = 1.5 # Higher resolution for more sub-states [SeuratSubClustering.envs.cases.Memory] subset = "state == 'Memory'" [SeuratSubClustering.envs.cases.Naive] subset = "state == 'Naive'"
Pattern 4: Re-clustering Based on Clonality (TCR+)
[SeuratSubClustering] [SeuratSubClustering.envs] # After ScRepCombiningExpression adds clonality metadata [SeuratSubClustering.envs.cases.ExpandedClones] subset = "clone_size >= 5" # Large clones [SeuratSubClustering.envs.cases.ExpandedClones.FindClusters] resolution = 0.6 # Lower resolution for broader groups [SeuratSubClustering.envs.cases.RareClones] subset = "clone_size == 1" # Unique clones [SeuratSubClustering.envs.cases.RareClones.FindClusters] resolution = 1.2 # Higher resolution to capture diversity
Pattern 5: Multi-resolution Exploration
[SeuratSubClustering] [SeuratSubClustering.envs.cases.TumorCluster] subset = "seurat_clusters == 'c8'" [SeuratSubClustering.envs.cases.TumorCluster.FindClusters] resolution = "0.2:2.0:0.2" # Sweep: [0.2, 0.4, ..., 2.0] algorithm = 4 # Leiden
Dependencies
Upstream Processes
- •Required:
SeuratClustering(orSeuratClusteringOfAllCellsif TOrBCellSelection used) - •Optional:
ScRepCombiningExpression(if TCR/BCR data present, adds clonality metadata for subsetting) - •Optional:
CellTypeAnnotation(if using annotated cell types for subsetting)
Downstream Processes
- •SeuratClusterStats: Statistics for sub-clusters
- •ClusterMarkers: Differential expression between sub-clusters
- •MarkersFinder: Flexible marker finding with enrichment analysis
- •ScFGSEA: Pathway analysis on sub-cluster markers
- •ModuleScoreCalculator: Module scoring within sub-clusters
Validation Rules
Subset Expression Validation
- •Must be valid dplyr::filter() expression
- •Can reference any metadata column in Seurat object
- •Complex expressions supported:
&(AND),|(OR),%in%(in operator) - •Example:
seurat_clusters == 'c3' & percent.mt < 5 - •Example:
celltype %in% c('CD4 T cell', 'CD8 T cell')
Case Name Validation
- •Must contain only alphanumeric characters
- •Non-alphanumeric characters automatically removed
- •Used as prefix: reductions and cluster names
- •Avoid spaces, special characters in case names
Resolution Constraints
- •Must be positive (resolution > 0)
- •Single value, list, or range syntax allowed
- •Range:
"start:end:step"(step defaults to 0.1 if omitted) - •Multi-resolution creates multiple metadata columns
Dimension Requirements
- •
RunPCA.npcsmust not exceed cells in subset - •
RunUMAP.dimsautomatically truncated tomin(dims, ncol(reduction) - 1) - •Use fewer dimensions for small subsets (< 100 cells)
Graph Name Consistency
- •
FindClusters.graph.namemust matchFindNeighborsoutput - •Default:
pca_snnwhen not specified - •When using integrated reductions, ensure consistency
Troubleshooting
Issue: Subset Returns Zero Cells
Symptoms: Sub-clustering produces empty subset
Solutions:
- •Verify subset expression syntax
# Check if column exists and values are correct [SeuratSubClustering.envs] # Use single quotes for string comparison subset = "seurat_clusters == 'c3'" # Correct subset = "seurat_clusters == c3" # Wrong (treated as variable)
- •Verify column names exist in metadata
# Use existing columns only subset = "seurat_clusters == 'c3'" # seurat_clusters exists subset = "cluster_id == 'c3'" # cluster_id may not exist
- •Check for exact string matching
# Case-sensitive subset = "celltype == 'CD4 T cell'" # Exact match subset = "celltype == 'CD4 T Cell'" # Wrong case
Issue: Too Many Small Sub-clusters
Symptoms: Hundreds of tiny sub-clusters, many singletons
Solutions:
[SeuratSubClustering.envs.FindClusters] resolution = 0.4 # Lower resolution algorithm = 4 # Leiden handles singletons better
Issue: Sub-clusters Overlapping in UMAP
Symptoms: Poor separation in sub-cluster visualization
Solutions:
[SeuratSubClustering.envs.RunUMAP] min.dist = 0.1 # Tighter clusters n.neighbors = 15 # More local detail spread = 1.2 # More separation
Issue: Sub-clustering Uses Wrong Reduction
Symptoms: Clustering on raw RNA instead of integrated data
Solutions:
[SeuratSubClustering.envs.FindNeighbors] reduction = "integrated.cca" # Use integrated reduction [SeuratSubClustering.envs.RunUMAP] reduction = "integrated.cca"
Issue: Multi-resolution Columns Not Created
Symptoms: Only final resolution column appears
Solutions:
[SeuratSubClustering.envs.FindClusters] # Use list syntax (not single value with range) resolution = [0.4, 0.6, 0.8, 1.0] # Correct resolution = "0.4:1.0:0.2" # Also correct
Issue: Case Names Too Similar
Symptoms: Confusion between multiple cases
Solutions:
# Use descriptive, unique case names
[SeuratSubClustering.envs.cases]
T_CD4_Effector = {subset = "..."}
T_CD4_Naive = {subset = "..."}
B_Memory = {subset = "..."}
Issue: Sub-clustering on All Cells (Not Subset)
Symptoms: Default case runs on entire object
Solutions:
# Always specify subset or use cases [SeuratSubClustering.envs] subset = "seurat_clusters == 'c3'" # Explicit subset # Or define specific cases [SeuratSubClustering.envs.cases.MyCase] subset = "seurat_clusters == 'c3'"
Issue: Reductions Not Saved
Symptoms: Cannot find <CASENAME>PC_ or <CASENAME>UMAP_
Solutions:
# Ensure case name is alphanumeric only
[SeuratSubClustering.envs.cases]
MySubCluster1 = {subset = "..."} # Correct
Sub-Cluster = {subset = "..."} # Hyphen removed -> SubCluster
# Check metadata for actual reduction names
# Reductions are: <CASENAME>pc, <CASENAME>umap (lowercase)
Best Practices
- •Define explicit subsets: Always specify
subsetor definecasesto avoid default case on all cells - •Use descriptive case names: Make case names clear and unique (e.g.,
T_Effector, notcase1) - •Test multiple resolutions: Sweep resolution range to find optimal granularity for each subset
- •Use Leiden algorithm: Prefer
algorithm = 4for better community detection - •Leverage metadata columns: Use CellTypeAnnotation results, TCR clonality, or custom mutaters for subsetting
- •Set random seeds: Ensure reproducible sub-clustering results with
random.seed - •Parallelize large subsets: Use
ncores > 1for subsets > 10k cells - •Adjust UMAP parameters: Smaller subsets may need different
n.neighborsandmin.dist - •Document sub-clustering strategy: Comment on biological rationale for each case in config
- •Use multi-resolution: Test
[0.4, 0.6, 0.8, 1.0]to capture different granularities
Related Processes
- •SeuratClustering: Initial clustering before sub-clustering
- •SeuratClusteringOfAllCells: Clustering before T/B cell selection
- •CellTypeAnnotation: Annotate clusters before sub-clustering by cell type
- •ClusterMarkers: Find markers for sub-clusters
- •MarkersFinder: Flexible marker finding with multiple comparison groups