ClonalStats Process Configuration
Purpose
Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.
When to Use
- To quantify clonal expansion patterns in TCR/BCR data
- For diversity analysis comparing multiple samples or conditions
- To identify hyperexpanded clones and their distribution
- For rarefaction analysis to assess sampling depth
- After ScRepCombiningExpression, to analyze integrated TCR+RNA data
Configuration Structure
Process Enablement
[ClonalStats]
cache = true
Input Specification
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
Core Environment Variables
[ClonalStats.envs]
# Clone definition: "gene" (VDJC), "aa" (CDR3 amino acid), "nt" (CDR3 nucleotide)
clone_call = "aa"
# Chain analysis: "both", "TRA", "TRB", "TRG", "IGH", "IGL"
chain = "both"
# Data transformations (dplyr::mutate syntax)
mutaters = {}
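# Data filtering (dplyr::filter syntax); unset by default.
# TOML has no null literal, so omit the key to apply no filter.
# subset = "<filter expression>"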
# Output device parameters
devpars = {width = 800, height = 600, res = 100}
# Save code and data (large files - use with caution)
save_code = false
save_data = false
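As a hedged illustration of the two expression-based settings above, the fragment below derives a metadata column and filters the data. The column name IsColitis is hypothetical, and a Diagnosis column with a Colitis level is assumed from the examples elsewhere in this document; adapt both to your own metadata.

```toml
[ClonalStats.envs]
# Hypothetical derived column, written as a dplyr::mutate expression
mutaters = {IsColitis = "Diagnosis == 'Colitis'"}
# Keep only cells with a known diagnosis (dplyr::filter expression)
subset = "!is.na(Diagnosis)"
```

Both values are plain strings in TOML; the expressions inside them are evaluated later on the R side, so single quotes are used for the inner string literals.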
Case-Based Plot Generation
[ClonalStats.envs.cases."Case Name"]
viz_type = "volume" # volume, abundance, length, residency, stat,
# composition, overlap, diversity, geneusage,
# positional, kmer, rarefaction
Diversity Metrics
| Metric | Range | Interpretation | Best For |
|---|---|---|---|
| shannon | 0 to log(richness) | Higher = more diversity | General comparison |
| inv.simpson | 1 to richness | Higher = more diversity | Emphasis on common clones |
| gini.coeff | 0 - 1 | 0 = equality, 1 = inequality | Clonal dominance |
| norm.entropy | 0 - 1 | Higher = more diversity | Evenness-focused |
| chao1 | ≥ observed richness | Estimates total richness | Small samples |
| d50 | Count | Fewest top clones that together make up 50% of the repertoire | Practical dominance |
Interpretation:
- High diversity = Many unique clones, even distribution (healthy repertoire)
- Low diversity = Few dominant clones (antigen-specific response, infection, cancer)
- Gini ≈ 1 = Very skewed, few clones dominate
- Gini ≈ 0 = Even distribution
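For reference, the textbook definitions behind the metrics in the table can be sketched as follows. These are standard formulas, not taken from the process itself: the implementation may use a different log base, bias corrections, or resampling, so reported values can differ. Here n_i is the number of cells in clone i, S the number of distinct clones, N the total cell count, f_1 and f_2 the numbers of clones seen exactly once and twice, and p_(i) the clone proportions sorted in decreasing order.

```latex
\begin{align*}
p_i &= \frac{n_i}{N}, \qquad N = \sum_{i=1}^{S} n_i \\
H_{\text{shannon}} &= -\sum_{i=1}^{S} p_i \ln p_i \\
D_{\text{inv.simpson}} &= \frac{1}{\sum_{i=1}^{S} p_i^{2}} \\
G_{\text{gini}} &= \frac{\sum_{i=1}^{S}\sum_{j=1}^{S} \lvert n_i - n_j \rvert}{2\,S\,N} \\
E_{\text{norm.entropy}} &= \frac{H_{\text{shannon}}}{\ln S} \quad (S > 1) \\
\hat{S}_{\text{chao1}} &= S + \frac{f_1^{2}}{2 f_2} \quad (f_2 > 0) \\
D_{50} &= \min\Bigl\{ k : \sum_{i=1}^{k} p_{(i)} \ge 0.5 \Bigr\}
\end{align*}
```

As a worked example, a toy repertoire with clone sizes (5, 2, 2, 1) has N = 10 and S = 4, giving shannon ≈ 1.221, inv.simpson ≈ 2.941, gini.coeff = 0.30, norm.entropy ≈ 0.880, chao1 = 4.25 (f_1 = 1, f_2 = 2), and d50 = 1, because the largest clone alone covers 50% of the cells. Under this form of the Gini coefficient the maximum is (S - 1)/S, which approaches 1 only for large S.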
Visualization Types
viz_type options:
- volume - Number of clones per sample/group
- abundance - Clone abundance distribution (trend/histogram/density)
- length - CDR3 sequence length distribution
- residency - Clones present across groups (venn/upset)
- stat - Expanded clone analysis (pies/sankey)
- diversity - Diversity metrics (bar/box/violin)
- geneusage - V/D/J gene usage frequency
- rarefaction - Sampling depth assessment
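To show how a viz_type from this list maps onto a case, here is a minimal sketch of two cases. The case names are arbitrary labels, the plot_type value is assumed from the "(venn/upset)" hint above, and Diagnosis is assumed to be a metadata column as in the other examples; none of these have been verified against the process itself.

```toml
# Clones shared across diagnosis groups, drawn as an UpSet plot
[ClonalStats.envs.cases."Shared Clones"]
viz_type = "residency"
plot_type = "upset"      # assumed value, taken from "(venn/upset)" above
group_by = "Diagnosis"   # assumed metadata column

# CDR3 length distribution with default settings
[ClonalStats.envs.cases."CDR3 Length"]
viz_type = "length"
```

Each quoted table name under cases defines one independent plot, so cases can be added or removed without affecting the others.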
Configuration Examples
Minimal Configuration
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
Standard Diversity Analysis
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
comparisons = true
[ClonalStats.envs.cases."Gini Coeff"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "violin"
group_by = "Diagnosis"
add_box = true
Expanded Clone Analysis
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
[ClonalStats.envs.cases."Expanded Clones"]
viz_type = "stat"
plot_type = "pies"
group_by = "Diagnosis"
subgroup_by = "seurat_clusters"
clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
Rarefaction Analysis
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
q = 1 # 0=richness, 1=shannon, 2=simpson
n_boots = 20
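The q values in the comment above match the orders of Hill numbers (the effective number of clones). Assuming that is what q selects here, which this document does not state explicitly, the three orders relate to the familiar indices as sketched below, with p_i the proportion of clone i and S the number of distinct clones.

```latex
\begin{align*}
{}^{q}D &= \Bigl( \sum_{i=1}^{S} p_i^{q} \Bigr)^{1/(1-q)} \quad (q \ne 1) \\
{}^{0}D &= S \\
{}^{1}D &= \lim_{q \to 1} {}^{q}D = \exp\Bigl( -\sum_{i=1}^{S} p_i \ln p_i \Bigr) \\
{}^{2}D &= \frac{1}{\sum_{i=1}^{S} p_i^{2}}
\end{align*}
```

For a toy repertoire with clone sizes (5, 2, 2, 1), this gives 4 effective clones at q = 0, about 3.39 at q = 1, and about 2.94 at q = 2. Higher orders weight dominant clones more heavily, so the estimate drops as q increases whenever the distribution is uneven.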
Complete Analysis Suite
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
[ClonalStats.envs.cases."Volume"]
viz_type = "volume"
[ClonalStats.envs.cases."Abundance"]
viz_type = "abundance"
plot_type = "density"
[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
Common Patterns
Disease vs Healthy
[ClonalStats.envs.cases."Comparison"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "box"
group_by = "Condition"
comparisons = true
Time Course
[ClonalStats.envs.cases."Timepoint"]
viz_type = "volume"
x = "Timepoint"
[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
group_by = "Timepoint"
Treatment Response
[ClonalStats.envs.cases."Response"]
viz_type = "diversity"
method = "gini.coeff"
group_by = "Response"
plot_type = "box"
comparisons = true
Dependencies
- Upstream: ScRepCombiningExpression (required)
- Related: ScRepLoading, CDR3Clustering, TESSA (optional)
Validation Rules
- Input must be a valid scRepertoire object
- For viz_type = "diversity", method must be one of the supported metrics (see Diversity Metrics)
- For rarefaction, n_boots should be ≥ 10
- Use sel() syntax in the clones parameter for filtering
Troubleshooting
Sample column not found: The input must have a Sample column; otherwise, specify the x parameter.
Strange diversity values: Small repertoire sizes cause bias. Use plot_type = "box".
Rarefaction curves noisy: Increase n_boots (try 50-100).
Too many clones in stat plots: Use subset or stricter clones thresholds.
Plot generation slow: Use clone_call = "gene" for speed, apply subset.
Missing comparisons: Set comparisons = true to add significance tests.
Best Practices
- Start with default cases to see standard visualizations
- Use multiple diversity metrics: Shannon + Gini
- Check rarefaction curves to ensure sufficient sampling
- Document clone thresholds when defining expanded clones
- Use clone_call = "gene" for speed, "aa" for granularity
- Set save_data = true for debugging (watch disk space)
- Validate findings with complementary diversity indices
- Consider sample size: small samples underestimate richness