TOrBCellSelection Process Configuration
Purpose
Separates T and non-T cells, or B and non-B cells, in a mixed cell population. Selection is based on the clonotype percentage from VDJ data and/or the expression of indicator genes (CD3 markers for T cells, CD19/CD20 for B cells). These are combined either through a custom selector expression or through k-means clustering for automatic selection.
When to Use
- When the dataset contains mixed cell types (T cells + other cell types, or B cells + other cell types)
- Before TCR-specific or BCR-specific analysis, to isolate the relevant cells
- After SeuratClusteringOfAllCells, to identify which clusters are T/B cells
- When scRNA-seq data includes scTCR-seq or scBCR-seq data
- DO NOT use if all cells in your dataset are already T/B cells
Configuration Structure
Process Enablement
[TOrBCellSelection]
cache = true  # Enable caching for this process
Input Specification
[TOrBCellSelection.in]
# Seurat object file (RDS/qs2 format) from SeuratClusteringOfAllCells
srtobj = ["SeuratClusteringOfAllCells"]
# Immune repertoire data file (RDS/qs2 format) from ScRepLoading
# Required unless ignore_vdj is set to true
immdata = ["ScRepLoading"]
Environment Variables
[TOrBCellSelection.envs]
# Whether to ignore VDJ information and use only marker gene expression
ignore_vdj = false
# Custom R expression to identify T/B cells
# Example: "Clonotype_Pct > 0.25" selects clusters with >25% clonotype percentage
# Indicator genes can also be used: "Clonotype_Pct > 0.25 & CD3E > 0"
# If not provided, k-means clustering will be used
selector = null
# List of indicator genes for T/B cell identification
# For T cells: ["CD3E", "CD3D", "CD3G"] (positive markers)
# or include negative markers: ["CD3E", "CD19", "CD14"]
# For B cells: ["CD19", "MS4A1", "CD79A", "CD79B"]
indicator_genes = ["CD3E"]
# Parameters for k-means clustering (if selector not provided)
# Reference: https://rdrr.io/r/stats/kmeans.html
# Note: dots in argument names should be replaced with hyphens
kmeans = {"nstart": 25}
Configuration Examples
Minimal Configuration (Default T Cell Markers)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
What this does: Uses the default CD3E marker and k-means clustering with VDJ data (immdata is required unless ignore_vdj = true) to select T cell clusters automatically.
T Cell Selection with Multiple CD3 Markers
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Use all three CD3 markers for robust T cell identification
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Selection (Recommended Markers)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells using CD19 and CD20 (MS4A1) markers
indicator_genes = ["CD19", "MS4A1"]
Selection by Clonotype Percentage Threshold
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select clusters with >25% clonotype percentage as T/B cells
selector = "Clonotype_Pct > 0.25"
Selection Combined with Marker Expression
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select cells with high clonotype percentage AND CD3E expression
indicator_genes = ["CD3E"]
selector = "Clonotype_Pct > 0.25 & CD3E > 0"
Selection Without VDJ Data (Markers Only)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Ignore VDJ data, use only marker gene expression
ignore_vdj = true
# Need at least 2 markers for k-means when VDJ is ignored
# First gene must be a positive marker for selection
# (CD3E is positive for T cells)
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Selection Without VDJ Data
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Select B cells using markers only (no VDJ data)
ignore_vdj = true
indicator_genes = ["CD19", "MS4A1", "CD79A"]
Custom K-means Parameters
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
# Custom k-means parameters
# nstart: number of random starts for stability (process default: 25; R default: 1)
# iter.max: maximum iterations (default: 10 in R)
# Note: hyphens instead of dots in key names
kmeans = {"nstart": 50, "iter-max": 20}
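The hyphen convention can be illustrated with a small Python sketch. It assumes that each hyphen in a key is turned back into a dot before the values reach stats::kmeans. That is an assumption about the pipeline's internals, and the helper name to_r_kmeans_args is hypothetical.

```python
def to_r_kmeans_args(kmeans_cfg: dict) -> dict:
    """Map config keys such as "iter-max" back to R argument names such as "iter.max".

    Hypothetical helper mirroring the documented convention (dots in R argument
    names are written as hyphens in the config); not the pipeline's own code.
    """
    return {key.replace("-", "."): value for key, value in kmeans_cfg.items()}


# The custom k-means parameters from the example above
print(to_r_kmeans_args({"nstart": 50, "iter-max": 20}))  # {'nstart': 50, 'iter.max': 20}
```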
Common Patterns
Pattern 1: Standard T Cell Selection (with VDJ)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Robust T cell selection using all three CD3 markers
indicator_genes = ["CD3E", "CD3D", "CD3G"]
When to use: Typical TCR-seq analysis where T cells need to be separated from other cell types.
Pattern 2: Standard B Cell Selection (with VDJ)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# B cell selection using CD19 and CD20 markers
indicator_genes = ["CD19", "MS4A1"]
When to use: BCR-seq analysis where B cells need to be separated from other cell types.
Pattern 3: High-Sensitivity T Cell Selection
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Lower threshold to capture more T cells
selector = "Clonotype_Pct > 0.10 & CD3E > 0"
When to use: When you suspect low-quality VDJ data or want to capture borderline T cells.
Pattern 4: High-Specificity T Cell Selection
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Higher threshold for clean T cell population
selector = "Clonotype_Pct > 0.50 & CD3E > 1"
When to use: When you want only the highest-confidence T cells (e.g., for clonal expansion analysis).
Pattern 5: Auto-Selection (K-means) with Multiple Markers
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Let k-means determine T cell clusters automatically
# No selector = automatic selection
indicator_genes = ["CD3E", "CD3D", "CD3G"]
kmeans = {"nstart": 50}
When to use: When you don't have a specific threshold in mind and want automatic unsupervised selection.
Dependencies
Upstream Processes
- SeuratClusteringOfAllCells: Provides the clustered Seurat object with seurat_clusters metadata
- ScRepLoading: Provides VDJ data with clonotype information (unless ignore_vdj = true)
Downstream Processes
- SeuratClustering: Clusters the selected T/B cells for downstream analysis
- ScRepCombiningExpression: Combines selected cells with VDJ data
- ModuleScoreCalculator: Calculates module scores on selected cells
- Other TCR/BCR-specific processes (CDR3Clustering, TESSA, ClonalStats, etc.)
Workflow Integration
SeuratPreparing → SeuratClusteringOfAllCells → TOrBCellSelection → SeuratClustering → (downstream TCR/BCR analysis)
                                                       ↑
                                                 ScRepLoading
Selection Methods Explained
Method 1: K-means Clustering (Default)
When selector is not provided, TOrBCellSelection:
- Calculates the average expression of the indicator genes per cluster
- If VDJ data are available, calculates the clonotype percentage per cluster
- Performs k-means clustering (K=2) on the per-cluster values [gene expressions + Clonotype_Pct]
- Selects the clusters in the k-means group with the higher clonotype percentage (or with the higher expression of the first indicator gene if there is no VDJ data)
Pros: Automatic, unsupervised, adapts to data
Cons: May select unexpected clusters if data is noisy
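The k-means steps above can be sketched in Python. This is not the pipeline's R implementation. The sketch assumes the per-cluster table has already been computed, and it standardizes only the gene columns, following the Normalization note under Key Notes. It uses plain Lloyd iterations rather than R's default Hartigan-Wong algorithm. The function names and the toy cluster values are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize one column to mean 0 and SD 1 (a constant column becomes all zeros)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points, nstart=25, iter_max=10, seed=1):
    """Lloyd's k-means with K=2; keeps the start with the lowest within-group sum of squares."""
    rng = random.Random(seed)
    best_labels, best_wss = None, float("inf")
    for _ in range(nstart):
        centers = [list(p) for p in rng.sample(points, 2)]
        labels = [0] * len(points)
        for _ in range(iter_max):
            # Assignment step: each point joins its nearest center
            labels = [min((0, 1), key=lambda k: sqdist(p, centers[k])) for p in points]
            # Update step: move each center to the mean of its members (keep it if empty)
            new_centers = []
            for k in (0, 1):
                members = [p for p, lab in zip(points, labels) if lab == k]
                new_centers.append([mean(col) for col in zip(*members)] if members else centers[k])
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels


def select_clusters(table, genes, nstart=25):
    """table maps cluster -> {gene: mean expression, ..., "Clonotype_Pct": fraction}."""
    clusters = list(table)
    # Gene columns are z-scored; Clonotype_Pct is used as-is
    columns = [zscore([table[c][g] for c in clusters]) for g in genes]
    columns.append([table[c]["Clonotype_Pct"] for c in clusters])
    points = [list(row) for row in zip(*columns)]
    labels = kmeans2(points, nstart=nstart)

    def group_pct(k):
        pcts = [table[c]["Clonotype_Pct"] for c, lab in zip(clusters, labels) if lab == k]
        return mean(pcts) if pcts else float("-inf")

    chosen = max((0, 1), key=group_pct)  # group with the higher clonotype percentage
    return sorted(c for c, lab in zip(clusters, labels) if lab == chosen)


# Toy per-cluster values, for illustration only
toy = {
    "c0": {"CD3E": 2.1, "Clonotype_Pct": 0.82},
    "c1": {"CD3E": 1.9, "Clonotype_Pct": 0.75},
    "c2": {"CD3E": 0.1, "Clonotype_Pct": 0.02},
    "c3": {"CD3E": 0.05, "Clonotype_Pct": 0.01},
}
print(select_clusters(toy, ["CD3E"]))  # ['c0', 'c1']
```

The nstart argument plays the same role as in stats::kmeans. Several random starts are tried, and the partition with the lowest within-group sum of squares is kept. This is why raising nstart makes the selection more stable.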
Method 2: Custom Selector Expression
Provide a custom R expression via selector:
- Can use any metadata column: Clonotype_Pct > 0.25
- Can combine with gene expression: Clonotype_Pct > 0.25 & CD3E > 0
- Can use complex logic: (Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1
Pros: Full control, transparent selection criteria
Cons: Requires domain knowledge; thresholds need testing
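To see how the complex selector above behaves, here is the same logic written as a Python predicate over a per-cluster table. This is only an illustration: the pipeline evaluates the R expression itself. R's vectorised & and | give the same result as Python's and/or when applied one row at a time, ignoring R's NA handling. The cluster values are toy numbers.

```python
def complex_selector(row):
    """Python equivalent of the R selector:
    (Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1
    """
    return (row["Clonotype_Pct"] > 0.25 or row["CD3E"] > 1) and row["CD19"] < 0.1


toy_clusters = {
    "c0": {"Clonotype_Pct": 0.60, "CD3E": 2.0, "CD19": 0.00},  # both T criteria met
    "c1": {"Clonotype_Pct": 0.05, "CD3E": 1.5, "CD19": 0.02},  # high CD3E rescues low Clonotype_Pct
    "c2": {"Clonotype_Pct": 0.40, "CD3E": 0.1, "CD19": 1.30},  # CD19 too high: rejected
    "c3": {"Clonotype_Pct": 0.01, "CD3E": 0.0, "CD19": 0.00},  # neither criterion met
}
selected = [name for name, row in toy_clusters.items() if complex_selector(row)]
print(selected)  # ['c0', 'c1']
```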
Method 3: Marker-Only Selection (ignore_vdj)
Set ignore_vdj = true to use only marker genes:
- Useful when VDJ data is poor or missing
- Requires at least 2 indicator genes for k-means
- First gene in the list must be a positive marker for the target cell type
Pros: Works without VDJ data, robust marker-based selection
Cons: Requires good marker genes, may include non-clonal cells
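Without VDJ data, the final step picks one of the two k-means groups using the first indicator gene. The Python sketch below shows why that gene must be a positive marker. It assumes the K=2 group labels have already been computed, and the function name and toy values are hypothetical. If a negative marker such as CD19 comes first, the same rule picks the non-T group.

```python
from statistics import mean


def pick_group_by_first_gene(table, labels, indicator_genes):
    """Return the clusters in the k-means group with the higher mean expression of
    indicator_genes[0]. `labels` maps cluster -> group (0 or 1) from a K=2 k-means run."""
    first = indicator_genes[0]
    groups = {}
    for cluster, group in labels.items():
        groups.setdefault(group, []).append(cluster)
    best = max(groups, key=lambda g: mean([table[c][first] for c in groups[g]]))
    return sorted(groups[best])


toy_table = {
    "c0": {"CD3E": 2.0, "CD19": 0.0},
    "c1": {"CD3E": 1.8, "CD19": 0.1},
    "c2": {"CD3E": 0.1, "CD19": 1.5},
}
toy_labels = {"c0": 0, "c1": 0, "c2": 1}
print(pick_group_by_first_gene(toy_table, toy_labels, ["CD3E", "CD19"]))  # ['c0', 'c1']
print(pick_group_by_first_gene(toy_table, toy_labels, ["CD19", "CD3E"]))  # ['c2'] (wrong group)
```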
Marker Gene Recommendations
T Cell Markers
Positive markers (expressed in T cells):
- CD3E: Core CD3 epsilon chain (most reliable)
- CD3D: Core CD3 delta chain
- CD3G: Core CD3 gamma chain
Negative markers (not expressed in T cells):
- CD19: B cell marker
- MS4A1 (CD20): B cell marker
- CD14: Monocyte marker
- CD68: Macrophage marker
Recommended for T cells:
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Markers
Positive markers (expressed in B cells):
- CD19: Pan-B cell marker (most reliable)
- MS4A1 (CD20): Mature B cell marker
- CD79A: B cell receptor component
- CD79B: B cell receptor component
Recommended for B cells:
indicator_genes = ["CD19", "MS4A1"]
Subtype-Specific Markers
For selecting specific T/B cell subtypes:
- T helper cells: CD4
- Cytotoxic T cells: CD8A, CD8B
- Regulatory T cells: FOXP3, IL2RA
- Memory B cells: CD27
- Plasma cells: CD38, SDC1 (CD138)
Validation Rules
Required Inputs
- srtobj must be specified (from SeuratClusteringOfAllCells)
- immdata is required unless ignore_vdj = true
Marker Gene Validation
- Must provide at least 1 indicator gene
- If ignore_vdj = true and k-means is used (no selector), must provide at least 2 indicator genes
- The first gene in indicator_genes must be a positive marker when using k-means without VDJ data
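The structural rules above can be expressed as a pre-flight check. This is a hypothetical helper, not part of the pipeline, and its function and argument names are assumptions. The error text mirrors the message quoted under Troubleshooting. The check assumes, based on the Troubleshooting advice, that a custom selector lifts the two-marker requirement. It cannot verify that the first gene is a positive marker, because that is biological knowledge rather than a structural rule.

```python
def validate_inputs(indicator_genes, ignore_vdj=False, immdata=None, selector=None):
    """Hypothetical pre-flight check mirroring the documented validation rules."""
    if not indicator_genes:
        raise ValueError("At least 1 indicator gene is required")
    if not ignore_vdj and immdata is None:
        raise ValueError("immdata is required unless ignore_vdj = true")
    # Assumption: the 2-marker rule only applies when k-means is used (no selector)
    if ignore_vdj and selector is None and len(indicator_genes) < 2:
        raise ValueError(
            "You need at least 2 markers to perform k-means clustering "
            "with VDJ data being ignored"
        )


validate_inputs(["CD3E", "CD3D", "CD3G"], ignore_vdj=True)  # passes silently
```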
Selector Expression Validation
- selector must be a valid R expression
- Can reference metadata columns (e.g., Clonotype_Pct) and indicator genes (e.g., CD3E)
- Use R logical operators: & (and), | (or), ! (not)
K-means Parameter Validation
- kmeans must be a valid JSON object
- Valid keys: nstart, iter-max, algorithm, etc. (see the stats::kmeans documentation)
- Dots in R argument names are replaced with hyphens (e.g., iter.max → iter-max)
Troubleshooting
Issue: "No clonotype information found"
Cause: Barcode mismatch between scRNA-seq and VDJ data
Solution:
- Check that barcode formats match in both datasets
- Verify that ScRepLoading processed the VDJ data correctly
- Try ignore_vdj = true to use marker genes only
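A quick way to spot a barcode mismatch is to compare the two barcode sets directly. The Python sketch below assumes you have already exported the cell barcodes from the Seurat object and from the VDJ data as plain string lists; the export step is not shown. The "S1_" sample prefix and the barcodes are toy values, and whether your data carry such a prefix is an assumption to check.

```python
def barcode_overlap(rna_barcodes, vdj_barcodes, strip_prefix=None):
    """Return the fraction of VDJ barcodes found among the RNA barcodes.

    strip_prefix: optional callable applied to every barcode first, e.g. to drop a
    sample prefix that only one of the two datasets carries (hypothetical convention).
    """
    norm = strip_prefix or (lambda b: b)
    rna = {norm(b) for b in rna_barcodes}
    vdj = {norm(b) for b in vdj_barcodes}
    if not vdj:
        return 0.0
    return len(rna & vdj) / len(vdj)


rna = ["S1_AAACCTGAGAAGGCCT-1", "S1_AAACCTGAGACAGACC-1", "S1_AAACCTGCACCTTGTC-1"]
vdj = ["AAACCTGAGAAGGCCT-1", "AAACCTGAGACAGACC-1"]
print(barcode_overlap(rna, vdj))  # 0.0: the formats differ
print(barcode_overlap(rna, vdj, strip_prefix=lambda b: b.split("_", 1)[-1]))  # 1.0
```

An overlap near zero points to a formatting difference, not to missing clonotypes.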
Issue: "You need at least 2 markers to perform k-means clustering with VDJ data being ignored"
Cause: Using ignore_vdj = true with only 1 indicator gene
Solution: Add more indicator genes or use a custom selector
Issue: Selected cells are not what I expected
Cause: K-means selected the wrong cluster
Solution:
- Check the k-means plot in details/kmeans.png
- Adjust indicator_genes to include more robust markers
- Use a custom selector instead of automatic selection
- Increase kmeans.nstart for more stable clustering (e.g., {"nstart": 50})
Issue: Too few or too many cells selected
Cause: Threshold too high or too low
Solution:
- Adjust the selector threshold (e.g., Clonotype_Pct > 0.20 vs 0.30)
- Review the selection table in details/data.txt
- Check the scatter plots in the details/ directory for gene vs. clonotype relationships
Issue: All cells selected as T cells (or none selected)
Cause: Poor VDJ data or incorrect marker genes
Solution:
- Verify VDJ data quality in the ScRepLoading output
- Check whether CD3E is actually expressed in your data
- Use ignore_vdj = true with robust marker genes
- Manually inspect expression plots before running selection
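To check whether the indicator genes are actually expressed, compute their detection rate across cells. The Python sketch below assumes per-cell expression values have already been exported from the Seurat object into a plain dictionary; the export step is not shown. The function name, the 5% cutoff and the toy values are illustrative assumptions.

```python
def marker_detection(expr, genes, min_frac=0.05):
    """Fraction of cells with non-zero expression for each indicator gene.

    expr: {gene: [per-cell expression values]}; genes missing from `expr` get 0.0.
    Returns (rates, flagged), where flagged lists genes detected in < min_frac of cells.
    """
    rates = {}
    for g in genes:
        values = expr.get(g, [])
        rates[g] = sum(v > 0 for v in values) / len(values) if values else 0.0
    flagged = [g for g in genes if rates[g] < min_frac]
    return rates, flagged


toy_expr = {"CD3E": [0, 1.2, 0.8, 0, 2.0], "CD3D": [0, 0, 0, 0, 0]}
rates, flagged = marker_detection(toy_expr, ["CD3E", "CD3D", "CD3G"])
print(rates)    # {'CD3E': 0.6, 'CD3D': 0.0, 'CD3G': 0.0}
print(flagged)  # ['CD3D', 'CD3G']
```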
Output Files
Primary Output
- outfile: Seurat object (qs2 format) containing only the selected T/B cells
  - Located at: {{in.srtobj | stem}}.qs
  - Keeps all original metadata, for the subset of selected cells
Detailed Output Directory (details/)
- data.txt: Table of indicator gene expression and clonotype percentage per cluster
  - Shows: Cluster, indicator gene expression, Clonotype_Pct, Cluster_Size, is_selected
- kmeans.png: K-means clustering visualization (if k-means is used)
- selected_cells_per_sample.png: Bar plot of selected cells per sample
- selected_cells_pie.png: Pie chart of selected vs. other cells
- selected-cells.png: Dimension plots showing VDJ data and selected cells
- feature-plots.png: Feature plots of indicator genes
Report
Interactive HTML report with visualization of selection results and cell composition.
Common Use Cases
Use Case 1: TCR-seq Analysis of PBMC Data
# Standard TCR-seq workflow
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]  # Clustering of selected T cells
[CDR3Clustering]
[TESSA]
[ClonalStats]
Use Case 2: BCR-seq Analysis of Tumor-Infiltrating Lymphocytes
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells from TILs
indicator_genes = ["CD19", "MS4A1", "CD79A"]
selector = "CD19 > 0.5"
[SeuratClustering]
[CDR3Clustering]
[CellCellCommunication]
Use Case 3: RNA-only Data with T/B Cell Separation
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# No VDJ data, use markers only
ignore_vdj = true
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]
[ScFGSEA]
[CellCellCommunication]
Key Notes
- Not for Pure T/B Cell Populations: If all cells are already T or B cells, skip this process and use SeuratClustering directly.
- Cluster-Level Selection: Selection happens at the cluster level, not the single-cell level. All cells in selected clusters are kept.
- Normalization: Gene expression values are normalized (mean=0, SD=1) before k-means clustering.
- Marker First: When using k-means without VDJ data, the first indicator gene must be a positive marker for your target cell type.
- Report Review: Always review the HTML report and plots in details/ to verify selection quality.
- Threshold Tuning: Start with default k-means, then switch to a custom selector if automatic selection is not satisfactory.
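Cluster-level selection means the per-cell result follows directly from the selected cluster IDs. The Python sketch below illustrates this with a hypothetical cell-to-cluster mapping such as the seurat_clusters metadata column. The function name and the toy cells are assumptions, and the actual subsetting is done on the Seurat object by the pipeline.

```python
def cells_to_keep(cell_clusters, selected_clusters):
    """Keep every cell whose Seurat cluster was selected (cluster-level selection)."""
    selected = set(selected_clusters)
    return [cell for cell, cluster in cell_clusters.items() if cluster in selected]


toy_cells = {"cellA": "0", "cellB": "1", "cellC": "0", "cellD": "2"}
kept = cells_to_keep(toy_cells, ["0"])
print(kept)                         # ['cellA', 'cellC']
print(len(kept) / len(toy_cells))   # 0.5, the selected share of all cells
```

No per-cell filtering happens here. A non-T cell that falls into a selected cluster is kept, which is one reason to review the plots in details/.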