CDR3AAPhyschem Process Configuration
Purpose
Analyzes the physicochemical properties of CDR3 amino acid sequences to characterize the biochemistry of T-cell receptor (TCR) repertoires. For each physicochemical feature (e.g., hydrophobicity, volume, isoelectric point), it performs a regression between two cell groups at each CDR3 length.
When to Use
- To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
- For feature engineering in TCR machine learning models
- To identify sequence features that distinguish cell subsets
- After ScRepCombiningExpression (requires combined TCR + RNA data)
- When investigating T cell fate determination (regulatory vs conventional T cells)
Configuration Structure
Process Enablement
[CDR3AAPhyschem]
cache = true
Input Specification
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
- scrfile: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
  - Must contain both TRA and TRB chains
  - Generated by scRepertoire::combineExpression()
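Because the chain setting decides which CDR3 the properties are computed from, it can help to see how a per-cell sequence for one chain might be pulled out of the combined object. The sketch below assumes scRepertoire's CTaa column layout (TRA CDR3 first, TRB CDR3 second, joined by "_", with "NA" for a missing chain and ";" between several chains of one locus). extract_chain is a hypothetical helper, not part of the process, and keeping only the first of several chains is an assumed policy that the underlying R script may not share.

```python
from typing import Optional

# Assumed scRepertoire CTaa layout: "<TRA CDR3>_<TRB CDR3>", "NA" for a missing
# chain, and ";" between multiple chains of the same locus.
CHAIN_INDEX = {"TRA": 0, "TRB": 1}


def extract_chain(ctaa: Optional[str], chain: str = "TRB") -> Optional[str]:
    """Return the CDR3 amino acid sequence of one chain, or None if it is absent."""
    if chain not in CHAIN_INDEX:
        raise ValueError(f"chain must be 'TRA' or 'TRB', got {chain!r}")
    if not ctaa:
        return None
    parts = ctaa.split("_")
    idx = CHAIN_INDEX[chain]
    if idx >= len(parts) or parts[idx] in ("", "NA"):
        return None
    # Hypothetical policy: keep only the first sequence when a cell has several.
    return parts[idx].split(";")[0]


print(extract_chain("CAVRDGGSQGNLIF_CASSLGQAYEQYF", "TRB"))  # CASSLGQAYEQYF
```

Cells for which the selected chain is missing return None and would simply drop out of the analysis in this sketch.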
Environment Variables
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"
# Chain selection
chain = "TRB"
Key Parameters:
- group: Column name in the metadata that defines the groups to compare (e.g., CellType, seurat_clusters)
- comparison: Two-group specification for the regression analysis
  - Format 1 (dict): Group1 = ["cell1", "cell2"], Group2 = "cell3"
  - Format 2 (list): ["Group1", "Group2"] (when the group names already exist as values in the group column)
- target: Which group to label as 1 in the regression (default: the first group in comparison)
- each: Column(s) used to split the data into separate analyses
  - Single column: "Sample"
  - Multiple columns: ["Sample", "Patient"]
  - Comma-separated: "Sample,Patient"
  - If not provided, all cells are analyzed together
- chain: Which chain's CDR3 to analyze (TRA or TRB)
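The rules for comparison, target, and each can be summarized as a small normalization step. This is a minimal Python sketch of those documented rules only, not the process's R code; all function names are hypothetical, and the check that exactly two groups are defined follows the two-group description of comparison.

```python
from typing import Dict, List, Optional, Union

Comparison = Union[Dict[str, Union[str, List[str]]], List[str]]


def normalize_comparison(comparison: Comparison) -> Dict[str, List[str]]:
    """Map either documented format to {group name: [values in the group column]}."""
    if isinstance(comparison, dict):
        # Format 1: each group name maps to one value or a list of values
        groups = {
            name: [values] if isinstance(values, str) else list(values)
            for name, values in comparison.items()
        }
    else:
        # Format 2: the group names are themselves values of the group column
        groups = {name: [name] for name in comparison}
    if len(groups) != 2:
        raise ValueError(f"comparison must define exactly two groups, got {len(groups)}")
    return groups


def resolve_target(groups: Dict[str, List[str]], target: Optional[str] = None) -> str:
    """Default to the first group in comparison when target is not given."""
    if target is None:
        return next(iter(groups))
    if target not in groups:
        raise ValueError(f"target {target!r} is not one of {list(groups)}")
    return target


def normalize_each(each: Union[None, str, List[str]]) -> List[str]:
    """Accept one column, a list of columns, or a comma-separated string."""
    if not each:
        return []  # no split: all cells are analyzed together
    if isinstance(each, str):
        return [col.strip() for col in each.split(",") if col.strip()]
    return list(each)


def label_cell(value: str, groups: Dict[str, List[str]], target: str) -> Optional[int]:
    """1 for the target group, 0 for the other group, None if the cell is excluded."""
    for name, members in groups.items():
        if value in members:
            return 1 if name == target else 0
    return None
```

Cells whose group value is in neither comparison group get None here, i.e. they are left out of the regression in this sketch.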
Configuration Examples
Minimal Configuration
[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
Standard Treg vs Tconv Analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"
Multi-Sample Analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"
Custom Group Definition
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare
comparison = {
HighQuality = ["c1", "c2", "c5"],
LowQuality = ["c3", "c4"]
}
target = "HighQuality"
chain = "TRB"
Physicochemical Properties
Available Properties
The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:
| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand Average of Hydrophobicity (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
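To make the table concrete, here is a minimal sketch that computes the counting-based properties and GRAVY for a single CDR3. It uses the standard Kyte-Doolittle values and the Ikai aliphatic index formula (A + 2.9 V + 3.9 (I + L)); expressing the index with mole fractions rather than mole percent is an assumed normalization, since the exact normalization used by the process is not stated. Bulkiness and polarity are omitted because each needs its own scale table.

```python
# Kyte & Doolittle (1982) hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cdr3_properties(seq: str) -> dict:
    """Length, GRAVY, aliphatic index, and acidic/aromatic fractions for one CDR3."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty CDR3 sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    def frac(residues: str) -> float:
        # Proportion of positions occupied by any of the given residues
        return sum(seq.count(r) for r in residues) / n

    return {
        "length": n,
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Ikai index A + 2.9*V + 3.9*(I + L), using fractions instead of percents
        "aliphatic": frac("A") + 2.9 * frac("V") + 3.9 * frac("IL"),
        "acidic": frac("DE"),
        "aromatic": frac("FWY"),
    }


print(cdr3_properties("CASSLGQAYEQYF"))
```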
Property Calculation Methods
- Default scales: Standard biophysical scales from the peer-reviewed literature
  - GRAVY: Kyte & Doolittle (1982) hydropathy scale
  - Bulkiness: Zimmerman et al. (1968) bulkiness parameters
  - Polarity: Grantham (1974) amino acid difference index
  - Aliphatic index: Ikai (1980) thermodynamic stability scale
  - Charge: Normalized, based on pKa values (EMBOSS database)
  - Acidic/Basic/Aromatic: Proportions from direct residue counting
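The charge property can be approximated with the Henderson-Hasselbalch equation summed over ionizable side chains. This sketch rests on several assumptions: pH 7.4 as "physiological pH", side-chain pKa values as listed in EMBOSS's pK data (verify against the scale the process actually uses), no terminal charges because a CDR3 is an internal stretch of the chain, and division by sequence length as the normalization.

```python
# Side-chain pKa values as listed in EMBOSS's pK data (assumed; check the scale
# the process actually uses). Termini are ignored because a CDR3 is an internal
# stretch of the TCR chain.
EMBOSS_SIDE_CHAIN_PKA = {
    "C": 8.5, "D": 3.9, "E": 4.1, "H": 6.5, "K": 10.8, "R": 12.5, "Y": 10.1,
}
BASIC = {"H", "K", "R"}


def net_charge(seq: str, ph: float = 7.4, normalize: bool = True) -> float:
    """Henderson-Hasselbalch net charge of ionizable side chains at the given pH."""
    if not seq:
        raise ValueError("empty CDR3 sequence")
    charge = 0.0
    for aa in seq.upper():
        pka = EMBOSS_SIDE_CHAIN_PKA.get(aa)
        if pka is None:
            continue  # residue has no ionizable side chain
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - pka))  # protonated fraction, +1 each
        else:
            charge -= 1.0 / (1.0 + 10 ** (pka - ph))  # deprotonated fraction, -1 each
    # Assumed normalization: divide by length so CDR3s of different lengths compare
    return charge / len(seq) if normalize else charge
```

At pH 7.4, Lys and Arg contribute almost +1 each and Asp and Glu almost -1 each, while His contributes only a small positive fraction.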
Regression Analysis
- Performed for each physicochemical property independently
- Compares the two groups at each CDR3 length
- Binary classification: target group (1) vs non-target group (0)
- Output: Statistical significance of the property differences between groups
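The regression setup can be illustrated with a per-length logistic model: within each CDR3 length, the target label (1 vs 0) is regressed on one property value. This is a hedged sketch; the documentation does not state which model the R script fits, so a one-predictor logistic regression fitted by Newton-Raphson is an assumption, and lengths where only one group is present are skipped.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x: List[float], y: List[int], iters: int = 50) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # singular, e.g. all property values equal at this length
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def slope_by_length(records: Iterable[Tuple[int, float, int]]) -> Dict[int, float]:
    """records: (CDR3 length, property value, 1 for target / 0 otherwise)."""
    by_length = defaultdict(lambda: ([], []))
    for length, value, label in records:
        xs, ys = by_length[length]
        xs.append(value)
        ys.append(label)
    slopes = {}
    for length, (xs, ys) in sorted(by_length.items()):
        if len(set(ys)) < 2:
            continue  # both groups are needed at a given length
        slopes[length] = fit_logistic_1d(xs, ys)[1]
    return slopes
```

A positive slope at a given length means that higher values of the property go with a higher probability of belonging to the target group at that length.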
Common Patterns
Pattern 1: Treg vs Tconv (TRB Chain)
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = "" # Analyze all samples together
Pattern 2: Selected Properties Only
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Focus on hydrophobicity (key Treg feature) when reviewing the gravy results
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Set chain to analyze a specific chain separately
chain = "TRB"
Pattern 3: Multi-Chain Analysis
Run separate processes for different chains:
# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]
# Note: Create a separate config for TRA analysis if needed
Pattern 4: Multi-Group Comparisons
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
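group = "CellType"
# Note: comparison is described above as a two-group specification; check that
# more than two groups are supported before relying on this pattern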
comparison = {
Naive = ["CD4 Naive", "CD8 Naive"],
Memory = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM"],
Effector = ["CD4 CTL", "CD8 CTL"]
}
target = "Naive"
chain = "TRB"
Dependencies
- Upstream: ScRepCombiningExpression (required)
- Downstream: Feature analysis, ML model training, publication figures
- Required data: Both TRA and TRB chains in the combined object
Validation Rules
- CDR3 sequence requirements: Must be valid amino acid sequences (no missing or non-standard residues)
- Chain requirement: Data must contain the specified chain (TRA or TRB)
- Group specification: Groups must exist in the metadata
- Minimum cells: Each group needs enough cells for the statistical regression
- Length distribution: The CDR3 length range must be adequate for the regression
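Several of these rules can be checked before launching the pipeline. A minimal sketch, assuming the inputs have already been pulled out of the combined object; the function name and the problem messages are hypothetical and are not the process's actual error output. The cell-count and length-range rules are not covered here.

```python
from typing import Dict, Iterable, List, Optional

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate(
    cdr3s: Iterable[Optional[str]],
    chain: str,
    chains_present: Iterable[str],
    group_col: str,
    metadata_columns: Iterable[str],
    group_values: Iterable[str],
    comparison: Dict[str, List[str]],
) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if chain not in set(chains_present):
        problems.append(f"Missing chain in data: {chain}")
    if group_col not in set(metadata_columns):
        problems.append(f"Group column not found in metadata: {group_col}")
    observed = set(group_values)
    for name, members in comparison.items():
        # Matching is exact and therefore case-sensitive
        missing = [m for m in members if m not in observed]
        if missing:
            problems.append(f"Values of group {name!r} not found in {group_col}: {missing}")
    bad = [s for s in cdr3s if not s or not set(s) <= STANDARD_AA]
    if bad:
        problems.append(f"{len(bad)} CDR3 sequence(s) missing or with non-standard residues")
    return problems
```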
Troubleshooting
Issue: "Missing chain in data"
Cause: Specified chain (TRA/TRB) not found in the combined object
Solution:
# Switch to a chain that is present in the data
[CDR3AAPhyschem.envs]
chain = "TRA"  # or "TRB"
Issue: "Group not found in metadata"
Cause: group column or comparison values don't exist
Solution:
- Check the available metadata columns in the ScRepCombiningExpression output
- Verify that group names match exactly (case-sensitive)
[CDR3AAPhyschem.envs]
group = "seurat_clusters"  # If CellType is not available
comparison = ["0", "1"]  # Use cluster IDs
Issue: "Insufficient cells for regression"
Cause: Too few cells in one or more groups
Solution:
- Pool samples (leave each unset) if per-sample analyses fail, since splitting by each reduces the number of cells per group
- Combine similar cell types in comparison
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}
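To see which groups are too small before choosing between pooling and splitting, cells can be counted per group within each split. A minimal sketch: each cell is represented as a (split value, comparison group) pair, and the min_cells threshold of 10 is a hypothetical value, since the documentation does not state a minimum.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def small_groups(
    cells: Iterable[Tuple[str, str]],
    groups: Iterable[str],
    min_cells: int = 10,
) -> Dict[str, Dict[str, int]]:
    """Report, per split, the comparison groups with fewer than min_cells cells.

    cells: one (split value, group name) pair per cell; use a constant split
    value such as "all" when each is not set.
    min_cells: hypothetical threshold, not a documented default.
    """
    groups = list(groups)
    counts = Counter(cells)
    report = {}
    for split in sorted({split for split, _ in counts}):
        low = {g: counts[(split, g)] for g in groups if counts[(split, g)] < min_cells}
        if low:
            report[split] = low
    return report


cells = [("S1", "Treg")] * 12 + [("S1", "Tconv")] * 3 + [("S2", "Treg")] * 15 + [("S2", "Tconv")] * 20
print(small_groups(cells, ["Treg", "Tconv"]))
```

Relabeling the same cells with a single split value such as "all" raises every group's count, which is why pooling can rescue a comparison that fails per sample.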
Issue: "No significant property differences"
Cause: Groups may not differ in physicochemical properties
Solution:
- Check whether the comparison groups are biologically distinct
- Consider a different group column (e.g., gene expression clusters)
- Verify that the CDR3 sequences are high-quality
Scientific Context
Key Publications
- Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
- Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
- Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research
Interpretation Guidelines
- High GRAVY: More hydrophobic CDR3 (associated with self-reactivity and Treg fate)
- High charge: Electrostatic potential may affect binding affinity
- High aromaticity: Increased π-π interactions and structural stability
- Length distribution: Longer CDR3s may provide broader specificity
Feature Engineering Applications
Use properties as features for:
- TCR specificity prediction models
- T cell fate classification (Treg vs Tconv)
- Antigen binding affinity estimation
- Cross-reactivity assessment
Output Format
- Directory: {{in.scrfile | stem}}.cdr3aaphyschem/
- Files:
  - Regression plots per property (hydrophobicity, volume, pI)
  - Statistical tables comparing groups
  - CDR3 length distributions
  - Property correlation matrices
- Visualizations:
  - Property vs length scatter plots
  - Group-wise property boxplots
  - Regression curves with confidence intervals
Advanced Usage
Custom Property Scales
If using non-default scales (requires modifying underlying R script):
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"
Length-Based Stratification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use a metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
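CDR3_Length_Bin is not described as an output of any process here; it would be a metadata column added before this step. A minimal sketch of one way to derive it from a cell's CDR3 sequence, with hypothetical bin edges (<=12, 13-15, >=16 residues) chosen only for illustration.

```python
from typing import Optional, Sequence


def length_bin(cdr3: Optional[str], edges: Sequence[int] = (12, 15)) -> Optional[str]:
    """Label a CDR3 by length using ascending upper edges (hypothetical defaults)."""
    if not edges:
        raise ValueError("edges must contain at least one value")
    if not cdr3:
        return None  # cells without this chain get no bin
    n = len(cdr3)
    lower = None
    for edge in edges:
        if n <= edge:
            return f"<={edge}" if lower is None else f"{lower + 1}-{edge}"
        lower = edge
    return f">={lower + 1}"


for seq in ["CASSLGQYF", "CASSLGQAYEQYF", "CASSPGTGGRNEQFFV"]:
    print(seq, length_bin(seq))
```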
Publication-Ready Plots
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"