cdr3aaphyschem

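Analyzes the physicochemical properties of CDR3 amino acid sequences to characterize the biochemistry of T-cell receptor repertoires. For each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.), a regression between two cell groups is performed at each CDR3 length.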

SKILL.md
--- frontmatter
name: cdr3aaphyschem
description: Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).

CDR3AAPhyschem Process Configuration

Purpose

Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).

When to Use

  • To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
  • For feature engineering in TCR machine learning models
  • To identify sequence features that distinguish cell subsets
  • After ScRepCombiningExpression (requires combined TCR + RNA data)
  • When investigating T cell fate determination (regulatory vs conventional T cells)

Configuration Structure

Process Enablement

toml
[CDR3AAPhyschem]
cache = true

Input Specification

toml
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
  • scrfile: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
  • Must contain both TRA and TRB chains
  • Generated by scRepertoire::combineExpression()

Environment Variables

toml
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"

# Chain selection
chain = "TRB"

Key Parameters:

  • group: Column name in metadata defining groups to compare (e.g., CellType, seurat_clusters)
  • comparison: Two-group specification for regression analysis
    • Format 1 (dict): Group1 = ["cell1", "cell2"], Group2 = "cell3"
    • Format 2 (list): ["Group1", "Group2"] (when groups exist in column)
  • target: Which group to label as 1 in regression (default: first group in comparison)
  • each: Column(s) to split data for separate analyses
    • Single column: "Sample"
    • Multiple columns: ["Sample", "Patient"]
    • Comma-separated: "Sample,Patient"
    • If not provided, all cells used together
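
The accepted shapes of comparison, target and each can be pictured with a small normalizer. This is only a sketch of the rules listed above, not the process's own code; the function names are hypothetical.

```python
def normalize_comparison(comparison):
    """Return {group_name: [metadata labels]} for either documented format."""
    if isinstance(comparison, dict):
        # Format 1: a value may be a single label or a list of labels
        return {name: [v] if isinstance(v, str) else list(v)
                for name, v in comparison.items()}
    # Format 2: each name is both the group name and the metadata label
    return {name: [name] for name in comparison}


def normalize_each(each):
    """Accept None or "", "Sample", "Sample,Patient" or ["Sample", "Patient"]."""
    if not each:
        return []  # no split: all cells are used together
    if isinstance(each, str):
        return [col.strip() for col in each.split(",") if col.strip()]
    return list(each)


def label_cell(value, groups, target=None):
    """1 for the target group, 0 for the other group, None if in neither."""
    target = target or next(iter(groups))  # default: first group in comparison
    for name, labels in groups.items():
        if value in labels:
            return 1 if name == target else 0
    return None
```

For example, `normalize_each("Sample,Patient")` and `normalize_each(["Sample", "Patient"])` both yield the same two-column split.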

Configuration Examples

Minimal Configuration

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]

Standard Treg vs Tconv Analysis

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"

Multi-Sample Analysis

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"

Custom Group Definition

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare
comparison = { HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"] }
target = "HighQuality"
chain = "TRB"
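
TOML 1.0 requires an inline table to sit on a single line, so parsing a config before a run catches layout mistakes early. The sketch below uses Python's standard tomllib; `check_envs` is a hypothetical helper, and its rules (two groups, target among them, chain TRA or TRB) are read off this document rather than taken from the process itself.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters need the third-party tomli package
    import tomli as tomllib

CONFIG = '''
[CDR3AAPhyschem.envs]
group = "Cluster"
comparison = { HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"] }
target = "HighQuality"
chain = "TRB"
'''


def check_envs(text):
    """Parse the config and list problems found in the envs table."""
    envs = tomllib.loads(text)["CDR3AAPhyschem"]["envs"]
    problems = []
    names = list(envs.get("comparison", []))  # dict keys or list items
    if len(names) != 2:
        problems.append(f"expected 2 groups, found {len(names)}")
    if "target" in envs and envs["target"] not in names:
        problems.append(f"target {envs['target']!r} is not one of {names}")
    if "chain" in envs and envs["chain"] not in ("TRA", "TRB"):
        problems.append("chain should be TRA or TRB")
    return problems


print(check_envs(CONFIG))  # prints []
```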

Physicochemical Properties

Available Properties

The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:

| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand average of hydropathy (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |

Property Calculation Methods

  • Default scales: Standard biophysical scales from peer-reviewed literature
  • GRAVY: Kyte & Doolittle (1982) hydropathy scale
  • Bulkiness: Zimmerman et al. (1968) bulkiness parameters
  • Polarity: Grantham (1974) polarity values
  • Aliphatic index: Ikai (1980) aliphatic index, a measure of thermostability
  • Charge: Normalized based on pKa values (EMBOSS database)
  • Acidic/Basic/Aromatic: Direct residue counting proportions
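
Several of these properties are simple enough to compute by hand, which helps when checking the process's output. The sketch below covers GRAVY, the aliphatic index and the residue proportions. It is an independent illustration, not the underlying R implementation; dividing the aliphatic index by sequence length is an assumption about what "normalized" means, and the function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean hydropathy over all residues of the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def proportion(seq, residues):
    """Fraction of positions occupied by any of the given residues."""
    return sum(aa in residues for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980) weights on A, V, I, L counts, divided by length (assumed normalization)."""
    n = seq.count
    return (n("A") + 2.9 * n("V") + 3.9 * (n("I") + n("L"))) / len(seq)


def cdr3_properties(seq):
    return {
        "length": len(seq),
        "gravy": gravy(seq),
        "aliphatic": aliphatic_index(seq),
        "acidic": proportion(seq, "DE"),     # D, E
        "aromatic": proportion(seq, "FWY"),  # F, W, Y as listed in the table
    }


props = cdr3_properties("CASSLGQGAEAFF")  # an example CDR3β-like sequence
```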

Regression Analysis

  • Performed for each physicochemical property independently
  • Compares properties across CDR3 length distributions
  • Binary classification: target group (1) vs non-target (0)
  • Output: Statistical significance of property differences
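
The shape of this analysis can be sketched as one fit per CDR3 length. The source does not name the model; logistic regression is assumed here because the outcome is a binary label, and the plain gradient-ascent fit is a stand-in for whatever the R script actually uses. Function names and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ sigmoid(b0 + b1 * x) by gradient ascent; returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, b0 + b1 * x))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def slopes_by_length(records):
    """records: (cdr3_length, property_value, label) with label 1 = target group.

    Returns {length: slope}; a positive slope means higher property values
    are associated with the target group at that length.
    """
    by_len = defaultdict(list)
    for length, value, label in records:
        by_len[length].append((value, label))
    slopes = {}
    for length, rows in sorted(by_len.items()):
        if {lab for _, lab in rows} != {0, 1}:
            continue  # both groups must be present at this length
        xs = [v for v, _ in rows]
        ys = [lab for _, lab in rows]
        slopes[length] = fit_logistic(xs, ys)[1]
    return slopes
```

A real analysis would also report standard errors and p-values, which this sketch omits.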

Common Patterns

Pattern 1: Treg vs Tconv (TRB Chain)

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = ""  # Analyze all samples together

Pattern 2: Selected Properties Only

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# No property-selection option is listed under Key Parameters, so every property
# is analyzed; read the gravy results to focus on hydrophobicity (key Treg feature)
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"

Pattern 3: Multi-Chain Analysis

Run separate processes for different chains:

toml
# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]

# Note: Create separate config for TRA analysis if needed

Pattern 4: Multi-Group Comparisons

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
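# comparison is documented as a two-group specification; confirm that the process
# accepts a third group before relying on this layout
comparison = { Naive = ["CD4 Naive", "CD8 Naive"], Memory = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM"], Effector = ["CD4 CTL", "CD8 CTL"] }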
target = "Naive"
chain = "TRB"

Dependencies

  • Upstream: ScRepCombiningExpression (required)
  • Downstream: Feature analysis, ML model training, publication figures
  • Required data: Both TRA and TRB chains in combined object

Validation Rules

  • CDR3 sequence requirements: Must have valid amino acid sequences (standard residues only; no ambiguous X or stop codon * characters)
  • Chain requirement: Data must contain specified chain (TRA or TRB)
  • Group specification: Groups must exist in metadata
  • Minimum cells: Sufficient cells per group for statistical regression
  • Length distribution: CDR3 length range must be adequate for regression

Troubleshooting

Issue: "Missing chain in data"

Cause: Specified chain (TRA/TRB) not found in the combined object.

Solution:

toml
# Change to available chain
[CDR3AAPhyschem.envs]
chain = "TRA"  # or "TRB"

Issue: "Group not found in metadata"

Cause: The group column or the comparison values do not exist in the metadata.

Solution:

  1. Check available metadata columns in ScRepCombiningExpression output
  2. Verify group names match exactly (case-sensitive)

toml
[CDR3AAPhyschem.envs]
group = "seurat_clusters"  # If CellType not available
comparison = ["0", "1"]  # Use cluster IDs

Issue: "Insufficient cells for regression"

Cause: Too few cells in one or more groups.

Solution:

  1. Remove each so that all samples are pooled; splitting by sample further reduces the cells available per group
  2. Combine similar cell types in comparison

toml
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}
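
To see which group or sample runs short, cells can be tallied per comparison group before choosing between pooling and splitting. A minimal sketch, assuming metadata exported as a list of dictionaries; `cells_per_group` is a hypothetical helper.

```python
from collections import Counter


def cells_per_group(cells, group_col, groups, each_col=None):
    """Count cells per comparison group, optionally within each value of each_col."""
    label_to_group = {lab: name for name, labels in groups.items() for lab in labels}
    counts = Counter()
    for cell in cells:
        name = label_to_group.get(cell.get(group_col))
        if name is not None:  # cells outside both groups are ignored
            split = cell.get(each_col) if each_col else "all"
            counts[(split, name)] += 1
    return counts
```

Comparing the pooled counts with the per-sample counts (`each_col="Sample"`) shows whether a split leaves any group too small.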

Issue: "No significant property differences"

Cause: Groups may not differ in physicochemical properties.

Solution:

  1. Check if comparison groups are biologically distinct
  2. Consider different group column (e.g., gene expression clusters)
  3. Verify CDR3 sequences are high-quality

Scientific Context

Key Publications

  1. Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
  2. Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
  3. Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research

Interpretation Guidelines

  • High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
  • High charge: Electrostatic potential may affect binding affinity
  • High aromaticity: Increased π-π interactions, structural stability
  • Length distribution: Longer CDR3s may provide broader specificity

Feature Engineering Applications

Use properties as features for:

  • TCR specificity prediction models
  • T cell fate classification (Treg vs Tconv)
  • Antigen binding affinity estimation
  • Cross-reactivity assessment

Output Format

  • Directory: {{in.scrfile | stem}}.cdr3aaphyschem/
  • Files:
    • Regression plots per property (hydrophobicity, volume, pI)
    • Statistical tables comparing groups
    • CDR3 length distributions
    • Property correlation matrices
  • Visualizations:
    • Property vs length scatter plots
    • Group-wise property boxplots
    • Regression curves with confidence intervals

Advanced Usage

Custom Property Scales

Non-default scales require modifying the underlying R script; the keys below are illustrative and take effect only if the script is changed to read them:

toml
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"

Length-Based Stratification

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
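
A column such as CDR3_Length_Bin has to exist in the metadata before it can be used in each. One way to derive such labels is sketched below; the function name and the bin edges are illustrative only, not recommended cut-offs.

```python
def length_bin(cdr3_aa, edges=(12, 15)):
    """Label a CDR3 sequence as short, middle or long by its length."""
    n = len(cdr3_aa)
    low, high = edges
    if n < low:
        return f"<{low}"
    if n <= high:
        return f"{low}-{high}"
    return f">{high}"


print(length_bin("CASSLGQGAEAFF"))  # 13 residues, prints 12-15
```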

Publication-Ready Plots

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters (not listed under Key Parameters; confirm they are supported)
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"