AgentSkillsCN

bio-variant-calling-structural-variant-calling

利用 Manta、Delly 与 LUMPY 从短读长测序中呼叫结构变异(SV)。这些工具能够检测超出常规 SNV 呼叫工具检测范围的大片段缺失、插入、倒位、重复与易位。在从短读长数据中检测结构变异时使用。

SKILL.md
--- frontmatter
name: bio-variant-calling-structural-variant-calling
description: Call structural variants (SVs) from short-read sequencing using Manta, Delly, and LUMPY. Detects deletions, insertions, inversions, duplications, and translocations that are too large for standard SNV callers. Use when detecting structural variants from short-read data.
tool_type: cli
primary_tool: manta

Structural Variant Calling (Short Reads)

Manta (Recommended)

bash
# Configure Manta run (creates runWorkflow.py)
configManta.py \
    --bam sample.bam \
    --referenceFasta reference.fa \
    --runDir manta_run

# Execute
manta_run/runWorkflow.py -j 8

# Output: manta_run/results/variants/
# - diploidSV.vcf.gz (germline SVs)
# - candidateSV.vcf.gz (all candidates)
# - candidateSmallIndels.vcf.gz (small indels)

Manta Tumor-Normal Mode

bash
# Somatic SV calling
configManta.py \
    --tumorBam tumor.bam \
    --normalBam normal.bam \
    --referenceFasta reference.fa \
    --runDir manta_somatic

manta_somatic/runWorkflow.py -j 8

# Output includes:
# - somaticSV.vcf.gz (somatic SVs)
# - diploidSV.vcf.gz (germline SVs)

Manta Options

bash
# WES mode (for exome data)
configManta.py \
    --bam sample.bam \
    --referenceFasta reference.fa \
    --exome \                          # Use exome settings
    --callRegions regions.bed.gz \     # Restrict to regions
    --runDir manta_exome

# RNA-seq mode
configManta.py \
    --bam rnaseq.bam \
    --referenceFasta reference.fa \
    --rna \                            # RNA-seq mode
    --runDir manta_rna

Delly

bash
# Call SVs
delly call \
    -g reference.fa \
    -o sv_calls.bcf \
    sample.bam

# Convert to VCF
bcftools view sv_calls.bcf > sv_calls.vcf

# Multiple samples (joint calling)
delly call \
    -g reference.fa \
    -o joint_svs.bcf \
    sample1.bam sample2.bam sample3.bam

Delly Somatic Mode

bash
# Call with tumor-normal
delly call \
    -g reference.fa \
    -o svs.bcf \
    tumor.bam normal.bam

# Create sample file
echo -e "tumor\ttumor\nnormal\tcontrol" > samples.tsv

# Filter for somatic
delly filter \
    -f somatic \
    -o somatic_svs.bcf \
    -s samples.tsv \
    svs.bcf

Delly SV Types

bash
# Call specific SV type
delly call -t DEL -g ref.fa -o deletions.bcf sample.bam
delly call -t DUP -g ref.fa -o duplications.bcf sample.bam
delly call -t INV -g ref.fa -o inversions.bcf sample.bam
delly call -t BND -g ref.fa -o translocations.bcf sample.bam
delly call -t INS -g ref.fa -o insertions.bcf sample.bam

LUMPY

bash
# Extract split reads and discordant pairs
samtools view -b -F 1294 sample.bam > discordant.bam
samtools view -h sample.bam | \
    /path/to/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | \
    samtools view -Sb - > splitters.bam

# Run LUMPY
lumpyexpress \
    -B sample.bam \
    -S splitters.bam \
    -D discordant.bam \
    -o lumpy_svs.vcf

Smoove (LUMPY Wrapper)

bash
# Simplified LUMPY workflow
smoove call \
    --name sample \
    --fasta reference.fa \
    --outdir smoove_output \
    -p 8 \
    sample.bam

# Output: smoove_output/sample-smoove.genotyped.vcf.gz

Merge Multiple Callers

bash
# Use SURVIVOR to merge callsets
# Create file listing VCFs
ls manta_svs.vcf delly_svs.vcf lumpy_svs.vcf > vcf_list.txt

# Merge with parameters
SURVIVOR merge vcf_list.txt 1000 2 1 1 0 50 merged_svs.vcf

# Parameters: max_dist min_callers type_agree strand_agree estimate_dist min_size

Filter SV Calls

bash
# Filter by quality
bcftools view -i 'QUAL >= 20' svs.vcf > svs.filtered.vcf

# Filter by size
bcftools view -i 'ABS(SVLEN) >= 50' svs.vcf > svs.min50.vcf

# Filter by SV type
bcftools view -i 'SVTYPE="DEL"' svs.vcf > deletions.vcf
bcftools view -i 'SVTYPE="INS"' svs.vcf > insertions.vcf
bcftools view -i 'SVTYPE="INV"' svs.vcf > inversions.vcf
bcftools view -i 'SVTYPE="DUP"' svs.vcf > duplications.vcf
bcftools view -i 'SVTYPE="BND"' svs.vcf > translocations.vcf

# Keep only PASS
bcftools view -f PASS svs.vcf > svs.pass.vcf

Annotate SVs

bash
# AnnotSV annotation
AnnotSV \
    -SVinputFile svs.vcf \
    -genomeBuild GRCh38 \
    -outputFile annotated_svs

# Output includes: genes, DGV, gnomAD-SV, ClinVar

SV Types

TypeCodeDescription
DeletionDELSequence removed
InsertionINSSequence inserted
InversionINVSequence reversed
DuplicationDUPSequence duplicated
TranslocationBNDBreakend (inter-chromosomal)

Comparison: Manta vs Delly vs LUMPY

FeatureMantaDellyLUMPY
SpeedFastMediumMedium
SensitivityHighHighHigh
Small SVsGoodModerateGood
Large SVsGoodGoodGood
RNA-seqYesNoNo
SomaticYesYesLimited

Coverage Guidelines

CoverageDetection Ability
10xLarge SVs (>1kb)
30xMost SVs
50x+Small SVs, better breakpoints

Long-Read SV Callers

For long-read data (ONT/PacBio HiFi), use specialized callers with higher sensitivity:

CallerBest ForNotes
CuteSVONT/HiFiFast, accurate for all SV types
Sniffles2ONT/HiFiPopulation-scale, multisample
PBSVPacBioOfficial PacBio caller

See long-read-sequencing/structural-variants for long-read SV workflows.

Related Skills

  • long-read-sequencing/structural-variants - Long-read SV calling
  • copy-number/cnvkit-analysis - Copy number variants
  • variant-calling/filtering-best-practices - Filter VCF files
  • alignment-files/alignment-filtering - Prepare BAM files