ScRepLoading Process Configuration
Purpose
Load scTCR-seq or scBCR-seq data from various formats into a scRepertoire-compatible object. This process reads VDJ (variable, diversity, joining) receptor contig data from multiple single-cell sequencing platforms and prepares it for integration with scRNA-seq data.
When to Use
- •When analyzing scTCR-seq or scBCR-seq data alongside scRNA-seq
- •Required for TCR/BCR clonotype analysis (CDR3 clustering, clone expansion, TESSA analysis)
- •Enables integration of immune receptor information with single-cell expression data
- •Supports multiple sequencing platforms: 10x Genomics, AIRR, BD, Dandelion, Immcantation, MiXCR, ParseBio, TRUST4, WAT3R, Omniscope
Important: This process is automatically enabled when your sample info file contains TCRData or BCRData columns.
Configuration Structure
Process Enablement
[ScRepLoading]
cache = true  # Enable caching (default: true)
Input Specification
[ScRepLoading.in]
# Type: file
# Required: yes
# Description: Sample metadata file (tab-delimited) with TCR/BCR data paths
metafile = "path/to/sample_info.txt"
Required input file columns:
- •Sample: Unique identifier for each sample (required)
- •TCRData (for TCR analysis): Path to scTCR-seq data (a directory, or a single file)
- •BCRData (for BCR analysis): Path to scBCR-seq data (a directory, or a single file)
- •Additional columns: Treated as sample metadata (optional)
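The column requirements above can be checked before launching the pipeline. The following is a minimal sketch, not part of the pipeline itself: the helper name check_sample_info is hypothetical, and it assumes a tab-delimited file with a header row, as described above.

```python
import csv
import io


def check_sample_info(text):
    """Return a list of problems found in tab-delimited sample info text.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    problems = []

    if "Sample" not in columns:
        problems.append("missing required column: Sample")
    # Column names are matched exactly (case-sensitive).
    if "TCRData" not in columns and "BCRData" not in columns:
        problems.append("need at least one of: TCRData, BCRData")

    if "Sample" in columns:
        seen = set()
        for row in reader:
            name = row["Sample"]
            if name in seen:
                problems.append(f"duplicate Sample value: {name}")
            seen.add(name)
    return problems


example = (
    "Sample\tRNAData\tTCRData\n"
    "C1\t/data/C1/rna\t/data/C1/tcr\n"
    "C2\t/data/C2/rna\t/data/C2/tcr\n"
)
print(check_sample_info(example))
```

An empty list means the header and Sample values satisfy the rules listed above; the check does not verify that the paths exist.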
Data format requirements:
- •10x Genomics: Directory containing filtered_contig_annotations.csv or all_contig_annotations.csv
- •AIRR format: Directory containing airr_rearrangement.tsv
- •BD platform: Directory containing Contigs_AIRR.tsv
- •Dandelion: Directory containing all_contig_dandelion.tsv
- •Immcantation: Directory containing _data.tsv or similar
- •JSON: File with .json extension
- •MiXCR: Directory containing clones.tsv
- •ParseBio: Directory containing barcode_report.tsv
- •TRUST4: Directory containing barcode_report.tsv
- •WAT3R: Directory containing barcode_results.csv
- •Omniscope: Directory containing .csv files
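To see which formats a given set of filenames could correspond to, the list above can be turned into a small lookup. This is a sketch under stated assumptions: the name candidate_formats and the suffix-matching rule are hypothetical, and scRepertoire's own detection logic may differ. Omniscope is left out because a bare .csv pattern would match several other formats.

```python
# Expected filenames per format, copied from the list above.
# Suffix matching is an assumption of this sketch, not scRepertoire's logic.
EXPECTED_SUFFIXES = {
    "10X": ("filtered_contig_annotations.csv", "all_contig_annotations.csv"),
    "AIRR": ("airr_rearrangement.tsv",),
    "BD": ("Contigs_AIRR.tsv",),
    "Dandelion": ("all_contig_dandelion.tsv",),
    "Immcantation": ("_data.tsv",),
    "JSON": (".json",),
    "MiXCR": ("clones.tsv",),
    "ParseBio": ("barcode_report.tsv",),
    "TRUST4": ("barcode_report.tsv",),
    "WAT3R": ("barcode_results.csv",),
}


def candidate_formats(filenames):
    """Return every format whose expected filename matches any given file."""
    matches = []
    for fmt, suffixes in EXPECTED_SUFFIXES.items():
        # str.endswith accepts a tuple of alternatives.
        if any(name.endswith(suffixes) for name in filenames):
            matches.append(fmt)
    return matches


print(candidate_formats(["filtered_contig_annotations.csv"]))
print(candidate_formats(["barcode_report.tsv"]))
```

In this sketch, barcode_report.tsv yields two candidates (ParseBio and TRUST4), which is one situation where setting envs.format explicitly removes the ambiguity.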
Path handling:
- •If TCRData/BCRData specifies a directory: Process uses scRepertoire::loadContigs() directly
- •If TCRData/BCRData specifies a file: Creates a symbolic link to it in a temporary directory (see envs.tmpdir) for processing
- •When the filename is not recognized by scRepertoire: Set envs.format explicitly
Environment Variables
[ScRepLoading.envs]
# type: choice - Data type to load (default: "auto")
# Options:
# "TCR" - T cell receptor data
# "BCR" - B cell receptor data
# "auto" - Auto-detect from column names in sample info
# Note: If both TCRData and BCRData present, TCR selected by default
type = "auto"
# format: choice - Format of TCR/BCR data files (optional)
# Options: auto, 10X, AIRR, BD, Dandelion, Immcantation,
# JSON, MiXCR, Omniscope, ParseBio, TRUST4, WAT3R
# If not provided, scRepertoire guesses from filename
format = "auto"
# combineTCR: json - Extra arguments for scRepertoire::combineTCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinetcr
# (written as a TOML inline table: key = value, not JSON "key": value)
combineTCR = { samples = true }
# combineBCR: json - Extra arguments for scRepertoire::combineBCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinebcr
combineBCR = { samples = true }
# exclude: auto or list - Columns to exclude from metadata (default: auto)
# auto = ["BCRData", "TCRData", "RNAData"]
# Can also be comma-separated string: "BCRData,TCRData,RNAData"
exclude = "auto"
# tmpdir: str - Temporary directory for symbolic links (default: "/tmp")
tmpdir = "/tmp"
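The resolution rules stated in the comments above (type = "auto" and exclude = "auto") can be sketched in Python. This is an illustration of the documented behavior only; resolve_type and resolve_exclude are hypothetical names, not functions exposed by the pipeline.

```python
def resolve_type(type_setting, columns):
    """Pick "TCR" or "BCR" following the rules described above."""
    if type_setting in ("TCR", "BCR"):
        return type_setting
    if type_setting != "auto":
        raise ValueError(f"unknown type: {type_setting!r}")
    # With "auto", TCR wins when both columns are present.
    if "TCRData" in columns:
        return "TCR"
    if "BCRData" in columns:
        return "BCR"
    raise ValueError("neither TCRData nor BCRData column found")


def resolve_exclude(exclude):
    """Normalize the exclude setting to a list of column names."""
    if exclude == "auto":
        return ["BCRData", "TCRData", "RNAData"]
    if isinstance(exclude, str):
        # Comma-separated string form, e.g. "BCRData,TCRData,RNAData".
        return [name.strip() for name in exclude.split(",") if name.strip()]
    return list(exclude)


print(resolve_type("auto", ["Sample", "RNAData", "TCRData", "BCRData"]))
print(resolve_exclude("BCRData,TCRData,RNAData"))
```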
Detailed combineTCR Parameters
[ScRepLoading.envs.combineTCR]
# samples: bool or list - Sample labels (default: true)
#   true = use Sample column from metadata
#   false = no sample grouping
#   list = explicit sample labels
samples = true
# ID: str - Additional sample labeling (optional; unset by default)
#   Adds prefix to barcodes to prevent duplicate issues
#   TOML has no null literal, so omit the key to leave it unset
# removeNA: bool - Remove cells with missing chain values (default: false)
#   true = filter out cells with NA in any chain
#   false = include cells with 1 NA value (default)
removeNA = false
# removeMulti: bool - Remove cells with >2 chains (default: false)
#   true = filter out multi-chain cells (>2 chains)
#   false = include multi-chain cells (default)
removeMulti = false
# filterMulti: bool - Select highest-expression chain for multi-chain (TCR default: false)
#   true = keep highest UMI count chain if multiple chains present
#   false = keep all chains (default)
filterMulti = false
# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
#   true = filter out non-functional receptors
#   false = include all rearrangements
filterNonproductive = true
Detailed combineBCR Parameters
[ScRepLoading.envs.combineBCR]
# samples: bool or list - Sample labels (default: true)
samples = true
# ID: str - Additional sample labeling (optional; unset by default)
#   TOML has no null literal, so omit the key to leave it unset
# call.related.clones: bool - Cluster related BCR clones (default: true)
#   Uses nucleotide sequence + V gene with Levenshtein distance
#   false = uses V gene + amino acid sequence for CTstrict
call.related.clones = true
# threshold: num - Normalized edit distance for clustering (default: 0.85)
#   Higher = more stringent clustering (more similarity required, fewer sequences grouped)
#   Lower = more permissive clustering (more sequences grouped)
#   Range: 0.0 - 1.0
threshold = 0.85
# removeNA: bool - Remove cells with missing chain values (default: false)
removeNA = false
# removeMulti: bool - Remove cells with >2 chains (default: false)
removeMulti = false
# filterMulti: bool - Select highest-expression chain (default: true)
#   true = keep highest UMI count chain
#   false = keep all chains
filterMulti = true
# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
filterNonproductive = true
Configuration Examples
Minimal Configuration (10x TCR Data)
[SampleInfo.in]
infile = "sample_info.txt"

# Sample info file contents:
# Sample  Age  Sex  Diagnosis  RNAData       TCRData
# C1      62   F    Colitis    /data/C1/rna  /data/C1/tcr
# C2      71   F    Colitis    /data/C2/rna  /data/C2/tcr

# ScRepLoading auto-enables when TCRData column present
# No explicit ScRepLoading section needed
Single Sample with Format Specification
[ScRepLoading]
cache = true

[ScRepLoading.in]
metafile = "metadata/single_sample.txt"

[ScRepLoading.envs]
type = "TCR"
format = "10X"

[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true
Multi-Sample BCR Analysis with Clustering
[ScRepLoading]
cache = true

[ScRepLoading.in]
metafile = "metadata/bcr_samples.txt"

[ScRepLoading.envs]
type = "BCR"

[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.85  # Default; raise for more stringent clustering, lower for more permissive
filterMulti = true
removeMulti = false
Non-10x Format (AIRR)
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/airr_samples.txt"

[ScRepLoading.envs]
format = "AIRR"
type = "auto"

[ScRepLoading.envs.combineTCR]
removeNA = false
removeMulti = false
TRUST4 Format
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/trust4_samples.txt"

[ScRepLoading.envs]
format = "TRUST4"

[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true
Common Patterns
Pattern 1: 10x Genomics TCR Data (Most Common)
# sample_info.txt
# Sample   RNAData            TCRData
# Sample1  /data/Sample1/rna  /data/Sample1/vdj
# Sample2  /data/Sample2/rna  /data/Sample2/vdj

[SampleInfo.in]
infile = "sample_info.txt"

# TCR directories must contain filtered_contig_annotations.csv
# No ScRepLoading configuration needed - auto-detected
Pattern 2: Both TCR and BCR Data (Auto-Detect TCR)
# sample_info.txt
# Sample   RNAData            TCRData            BCRData
# Sample1  /data/Sample1/rna  /data/Sample1/tcr  /data/Sample1/bcr

[SampleInfo.in]
infile = "sample_info.txt"

# TCR selected by default when both columns present
# To explicitly analyze BCR instead:
[ScRepLoading.envs]
type = "BCR"
Pattern 3: Filtered TCR Data (Remove NA and Multi-Chain)
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_filtered.txt"

[ScRepLoading.envs.combineTCR]
removeNA = true             # Remove cells with missing chains
removeMulti = true          # Remove cells with >2 chains
filterNonproductive = true  # Remove non-functional receptors
Pattern 4: Relaxed Filtering for Exploratory Analysis
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_exploratory.txt"

[ScRepLoading.envs.combineTCR]
removeNA = false             # Keep cells with single chain
removeMulti = false          # Include multi-chain cells for inspection
filterNonproductive = false  # Include non-productive rearrangements
Pattern 5: BCR Clone Clustering with Custom Threshold
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/bcr_clustering.txt"

[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.90  # More stringent clustering (lower = more permissive)
Pattern 6: Sample-Specific Labeling
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/longitudinal.txt"

[ScRepLoading.envs.combineTCR]
samples = true    # Use Sample column from metadata
ID = "Timepoint"  # Add Timepoint as additional label prefix
# Creates barcodes like: "Sample1_Timepoint1_AAACCC..."
# Prevents duplicate barcode issues across timepoints
Pattern 7: Custom Metadata Exclusion
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/custom_columns.txt"

[ScRepLoading.envs]
exclude = ["RNAData", "TCRData", "BCRData", "ExperimentID", "Batch"]
# These columns excluded from scRepertoire object metadata
# Helps reduce metadata clutter in downstream analysis
Pattern 8: Paired Chain Analysis (TRA+TRB for TCR)
# Default behavior - ScRepLoading automatically pairs chains
# at cell barcode level when both TRA and TRB present
[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_paired.txt"

[ScRepLoading.envs.combineTCR]
removeNA = false     # Keep single-chain cells for inspection
filterMulti = false  # Don't filter multi-chain cells
# Later analysis can filter for true paired chains
# using downstream processes like CDR3Clustering
Dependencies
Upstream Processes
- •SampleInfo (required): Provides sample metadata with TCRData/BCRData columns
- •LoadingRNAFromSeurat (alternative): When loading RNA from Seurat instead of SampleInfo
Downstream Processes
- •ScRepCombiningExpression: Integrates TCR/BCR data with scRNA-seq expression
- •CDR3Clustering: Clusters cells by CDR3 sequence similarity
- •TESSA: TCR-specific analysis (epitope specificity prediction)
- •CDR3AAPhyschem: Physicochemical properties of CDR3 sequences
- •ClonalStats: Clonality statistics and diversity metrics
Validation Rules
Common Configuration Errors
- •Missing TCRData/BCRData column:
  - •Error: Process not enabled, no TCR/BCR analysis
  - •Fix: Add TCRData or BCRData column to sample info file
- •Invalid format specified:
  - •Error: scRepertoire fails to recognize file format
  - •Fix: Set envs.format to one of: 10X, AIRR, BD, Dandelion, Immcantation, JSON, MiXCR, ParseBio, TRUST4, WAT3R, Omniscope
- •Directory path not found:
  - •Error: Cannot access TCR/BCR data directory
  - •Fix: Verify paths in TCRData/BCRData columns exist and are readable
- •Missing required files in directory:
  - •Error: Expected contig file not found (e.g., filtered_contig_annotations.csv)
  - •Fix: Ensure directory contains appropriate file for specified format
- •Both TCR and BCR specified without type selection:
  - •Warning: TCR selected by default
  - •Fix: Set envs.type = "BCR" if BCR analysis intended
File Format Requirements
- •10x Genomics: Must have filtered_contig_annotations.csv in directory
- •AIRR: Must have airr_rearrangement.tsv in directory
- •BD: Must have Contigs_AIRR.tsv in directory
- •Dandelion: Must have all_contig_dandelion.tsv in directory
- •MiXCR: Must have clones.tsv in directory
- •TRUST4: Must have barcode_report.tsv in directory
- •ParseBio: Must have barcode_report.tsv in directory
- •WAT3R: Must have barcode_results.csv in directory
Chain Compatibility
- •TCR chains: Supports TRA, TRB, TRG, TRD (auto-detected from data)
- •BCR chains: Supports IGH, IGL, IGK (auto-detected from data)
- •Paired analysis: Automatically pairs TRA+TRB or IGH+IGL/IGK when both present
- •Single-chain: Keeps single-chain cells when removeNA = false
Troubleshooting
Issue: ScRepLoading not running
Cause: No TCRData or BCRData column in sample info file
Solution:
- •Add TCRData or BCRData column to sample info
- •Verify column name exactly matches (case-sensitive)
- •Check that SampleInfo.in.infile is correctly specified
Issue: "File format not recognized"
Cause: Filename doesn't match expected pattern for auto-detection
Solution:
- •Set envs.format explicitly to your format type
- •Example: format = "TRUST4" for TRUST4 output
- •Verify directory contains expected file for that format
Issue: "No cells loaded" or empty output
Cause: Too aggressive filtering or mismatched barcodes
Solution:
- •Set removeNA = false and removeMulti = false temporarily
- •Check that TCR/BCR barcodes match RNA barcodes
- •Verify filterMulti is appropriate for your data type
Issue: Duplicate barcode errors
Cause: Multiple samples have identical cell barcodes
Solution:
- •Set ID = "Sample" or use explicit sample labels
- •This adds sample prefix to barcodes: Sample1_AAACCC...
- •Required when merging samples from same run
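The effect of a sample prefix on barcode collisions can be illustrated with a short Python sketch. The barcodes below are made up for illustration, and prefix_barcodes is a hypothetical helper that only mimics the Sample1_AAACCC... naming shown above; it is not how the pipeline or scRepertoire implements it.

```python
from collections import Counter


def prefix_barcodes(barcodes_by_sample):
    """Prefix each barcode with its sample name, e.g. Sample1_AAACCC..."""
    return [
        f"{sample}_{barcode}"
        for sample, barcodes in barcodes_by_sample.items()
        for barcode in barcodes
    ]


def duplicates(barcodes):
    """Return the barcodes that occur more than once, sorted."""
    counts = Counter(barcodes)
    return sorted(b for b, n in counts.items() if n > 1)


# Made-up barcodes: the same raw barcode appears in both samples.
raw = {
    "Sample1": ["AAACCTGAGG-1", "AAACGGGTCA-1"],
    "Sample2": ["AAACCTGAGG-1", "AAAGATGCAT-1"],
}
unprefixed = [b for barcodes in raw.values() for b in barcodes]

print(duplicates(unprefixed))            # collision without prefixes
print(duplicates(prefix_barcodes(raw)))  # no collision once prefixed
```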
Issue: BCR clustering too strict/too permissive
Cause: Default threshold (0.85) not optimal for data
Solution:
- •Adjust envs.combineBCR.threshold
- •Higher (0.90+): More stringent, fewer sequences grouped together
- •Lower (0.80-): More permissive, more sequences clustered together
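The direction of the threshold can be illustrated with a normalized edit-distance similarity. This is only a sketch of the idea: it assumes that two sequences are grouped when their similarity (1 minus Levenshtein distance divided by the longer length) is at least the threshold. The real clustering in scRepertoire::combineBCR() also takes the V gene into account and may compute the score differently; the sequences are made up.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def would_group(a, b, threshold):
    """Assumption of this sketch: group when similarity >= threshold."""
    return similarity(a, b) >= threshold


# Made-up 18-nt sequences: one and two substitutions away from the reference.
reference = "TGTGCGAGAGATTACTGG"
one_off = "TGTGCGAGAGATTATTGG"
two_off = "TGTGCAAGAGATTATTGG"

for threshold in (0.85, 0.90):
    print(threshold,
          would_group(reference, one_off, threshold),
          would_group(reference, two_off, threshold))
```

Under these assumptions, raising the threshold from 0.85 to 0.90 stops the two-substitution sequence from being grouped with the reference, while the one-substitution sequence is still grouped.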
Issue: Single-chain cells lost
Cause: filterNonproductive = true or removeNA = true
Solution:
- •For exploratory analysis, set removeNA = false
- •For developmental studies, consider filterNonproductive = false
- •Use filterMulti = true only when confident in data quality
Issue: Metadata columns missing from output
Cause: Excluded by default (exclude = "auto")
Solution:
- •Set exclude = [] to keep all metadata columns
- •Or specify custom list: exclude = ["RNAData"]
- •Default excludes: RNAData, TCRData, BCRData
Issue: Cannot load from specific directory path
Cause: Path not accessible or permission issues
Solution:
- •Verify directory exists and is readable
- •Check file permissions: ls -la path/to/tcr/
- •Use absolute paths if relative paths fail
Issue: Combining TCR and BCR data separately
Cause: Need to analyze both receptor types
Solution:
- •Run pipeline twice with different type settings
- •First run: [ScRepLoading.envs] type = "TCR"
- •Second run: [ScRepLoading.envs] type = "BCR"
- •Use different output directories to avoid conflicts
Issue: Integration with ScRepCombiningExpression fails
Cause: Barcodes don't match between RNA and VDJ data
Solution:
- •Ensure same samples used in both RNA and VDJ data
- •Check that SampleInfo has correct paths for both data types
- •Verify barcode prefixes match (if using ID parameter)
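A quick way to diagnose a barcode mismatch is to measure how many VDJ barcodes also occur among the RNA barcodes. The sketch below uses made-up barcodes and a hypothetical helper, barcode_overlap; how the barcodes are extracted from your RNA and VDJ objects is left out, since that depends on your data.

```python
def barcode_overlap(rna_barcodes, vdj_barcodes):
    """Fraction of VDJ barcodes that also appear among the RNA barcodes."""
    vdj = set(vdj_barcodes)
    if not vdj:
        return 0.0
    return len(vdj & set(rna_barcodes)) / len(vdj)


# Made-up example: RNA barcodes carry a sample prefix, VDJ barcodes do not.
rna = ["Sample1_AAACCTGAGG-1", "Sample1_AAACGGGTCA-1"]
vdj = ["AAACCTGAGG-1", "AAACGGGTCA-1"]

print(barcode_overlap(rna, vdj))
print(barcode_overlap(rna, [f"Sample1_{b}" for b in vdj]))
```

An overlap near zero usually points to a prefix or suffix difference rather than to genuinely different cells, so it is worth comparing a few barcodes from each side by eye.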