AgentSkillsCN

bio-variant-calling

利用 bcftools mpileup 和 call 从比对后的读长中呼叫 SNPs 和插入/缺失变异。在从 BAM 文件中检测变异,或从比对结果生成 VCF 时使用。

SKILL.md
--- frontmatter
name: bio-variant-calling
description: Call SNPs and indels from aligned reads using bcftools mpileup and call. Use when detecting variants from BAM files or generating VCF from alignments.
tool_type: cli
primary_tool: bcftools

Variant Calling

Call SNPs and indels from aligned reads using bcftools.

Basic Workflow

code
BAM file + Reference FASTA
         |
         v
   bcftools mpileup (generate pileup)
         |
         v
   bcftools call (call variants)
         |
         v
   VCF file

bcftools mpileup + call

Basic Variant Calling

bash
bcftools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf

Output Compressed VCF

bash
bcftools mpileup -f reference.fa input.bam | bcftools call -mv -Oz -o variants.vcf.gz
bcftools index variants.vcf.gz

Call Specific Region

bash
bcftools mpileup -f reference.fa -r chr1:1000000-2000000 input.bam | \
    bcftools call -mv -o region.vcf

Call from Multiple BAMs

bash
bcftools mpileup -f reference.fa sample1.bam sample2.bam sample3.bam | \
    bcftools call -mv -o variants.vcf

BAM List File

bash
# bams.txt: one BAM path per line
bcftools mpileup -f reference.fa -b bams.txt | bcftools call -mv -o variants.vcf

mpileup Options

Quality Filtering

bash
bcftools mpileup -f reference.fa \
    -q 20 \           # Min mapping quality
    -Q 20 \           # Min base quality
    input.bam | bcftools call -mv -o variants.vcf

Annotate with Read Depth

bash
bcftools mpileup -f reference.fa -a DP,AD input.bam | bcftools call -mv -o variants.vcf

Full Annotation Set

bash
bcftools mpileup -f reference.fa \
    -a FORMAT/DP,FORMAT/AD,FORMAT/ADF,FORMAT/ADR,INFO/AD \
    input.bam | bcftools call -mv -o variants.vcf

Target Regions (BED)

bash
bcftools mpileup -f reference.fa -R targets.bed input.bam | \
    bcftools call -mv -o variants.vcf

Max Depth

bash
bcftools mpileup -f reference.fa -d 1000 input.bam | bcftools call -mv -o variants.vcf

call Options

Calling Models

FlagModelUse Case
-mMultiallelic callerDefault, recommended
-cConsensus callerLegacy, single sample

Output Variants Only

bash
bcftools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf
# -v outputs variant sites only (not reference calls)

Output All Sites

bash
bcftools mpileup -f reference.fa input.bam | bcftools call -m -o all_sites.vcf
# Without -v, outputs all sites including reference

Ploidy

bash
# Haploid calling
bcftools mpileup -f reference.fa input.bam | bcftools call -m --ploidy 1 -o variants.vcf

# Specify ploidy file
bcftools mpileup -f reference.fa input.bam | bcftools call -m --ploidy-file ploidy.txt -o variants.vcf

Prior Probability

bash
# Adjust variant prior (default 1.1e-3)
bcftools mpileup -f reference.fa input.bam | bcftools call -m -P 0.001 -o variants.vcf

Common Pipelines

Standard SNP/Indel Calling

bash
bcftools mpileup -Ou -f reference.fa \
    -q 20 -Q 20 \
    -a FORMAT/DP,FORMAT/AD \
    input.bam | \
bcftools call -mv -Oz -o variants.vcf.gz

bcftools index variants.vcf.gz

Multi-sample Calling

bash
bcftools mpileup -Ou -f reference.fa \
    -a FORMAT/DP,FORMAT/AD \
    sample1.bam sample2.bam sample3.bam | \
bcftools call -mv -Oz -o cohort.vcf.gz

bcftools index cohort.vcf.gz

Calling with Regions

bash
bcftools mpileup -Ou -f reference.fa \
    -R targets.bed \
    -a FORMAT/DP,FORMAT/AD \
    input.bam | \
bcftools call -mv -Oz -o targets.vcf.gz

Parallel by Chromosome

bash
for chr in chr1 chr2 chr3; do
    bcftools mpileup -Ou -f reference.fa -r "$chr" input.bam | \
        bcftools call -mv -Oz -o "${chr}.vcf.gz" &
done
wait

# Concatenate results
bcftools concat -Oz -o all.vcf.gz chr*.vcf.gz
bcftools index all.vcf.gz

Annotation Tags

INFO Tags

TagDescription
DPTotal read depth
ADAllelic depths
MQMapping quality
FSFisher strand bias
SGBSegregation based metric

FORMAT Tags

TagDescription
GTGenotype
DPRead depth per sample
ADAllelic depths per sample
ADFForward strand allelic depths
ADRReverse strand allelic depths
GQGenotype quality
PLPhred-scaled likelihoods

Request Specific Annotations

bash
bcftools mpileup -f reference.fa \
    -a FORMAT/DP,FORMAT/AD,FORMAT/SP,INFO/AD \
    input.bam | bcftools call -mv -o variants.vcf

Performance Options

Multi-threading

bash
bcftools mpileup -f reference.fa --threads 4 input.bam | \
    bcftools call -mv --threads 4 -o variants.vcf

Uncompressed BCF for Speed

bash
bcftools mpileup -Ou -f reference.fa input.bam | bcftools call -mv -Ou | \
    bcftools filter -Oz -o filtered.vcf.gz

Quick Reference

TaskCommand
Basic callingbcftools mpileup -f ref.fa in.bam | bcftools call -mv -o out.vcf
With quality filterbcftools mpileup -f ref.fa -q 20 -Q 20 in.bam | bcftools call -mv
Regionbcftools mpileup -f ref.fa -r chr1:1-1000 in.bam | bcftools call -mv
Multi-samplebcftools mpileup -f ref.fa s1.bam s2.bam | bcftools call -mv
With annotationsbcftools mpileup -f ref.fa -a DP,AD in.bam | bcftools call -mv

Common Errors

ErrorCauseSolution
no FASTA referenceMissing -fAdd -f reference.fa
reference mismatchWrong referenceUse same reference as alignment
no variants calledLow quality/depthLower quality thresholds

Related Skills

  • vcf-basics - View and query resulting VCF
  • filtering-best-practices - Filter variants by quality
  • variant-normalization - Normalize indels
  • alignment-files/pileup-generation - Alternative pileup generation