bio-longread-alignment

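Long-read alignment of Oxford Nanopore and PacBio data with minimap2. Supports multiple presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.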

SKILL.md
--- frontmatter
name: bio-longread-alignment
description: Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.
tool_type: cli
primary_tool: minimap2

Long-Read Alignment with minimap2

Oxford Nanopore Alignment

bash
# Basic ONT alignment
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio HiFi Alignment

bash
# PacBio HiFi reads (high accuracy)
minimap2 -ax map-hifi reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio CLR Alignment

bash
# PacBio CLR (continuous long reads, lower accuracy)
minimap2 -ax map-pb reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

Pre-Build Index for Multiple Runs

bash
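# Build the index once. Indexing parameters such as -k and -w are fixed at build time,
# so build with the same preset you will align with
minimap2 -x map-ont -d reference.mmi reference.fa

# Use index for alignment
minimap2 -ax map-ont reference.mmi reads.fastq.gz | samtools sort -o aligned.bam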

Common Options

bash
# -t 8            Threads
# -R '@RG...'     Read group header line
# --secondary=no  No secondary alignments
# --MD            Generate MD tag for variants
# -Y              Use soft clipping for supplementary alignments
minimap2 -ax map-ont \
    -t 8 \
    -R '@RG\tID:sample\tSM:sample' \
    --secondary=no \
    --MD \
    -Y \
    reference.fa reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

Splice-Aware Alignment (RNA)

bash
# For cDNA or direct RNA sequencing; for noisy ONT direct RNA the minimap2 README suggests adding -uf -k14
minimap2 -ax splice reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam

With Junction BED (Known Splice Sites)

bash
# Provide known splice junctions
minimap2 -ax splice --junc-bed junctions.bed \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Assembly to Reference Alignment

bash
# Assembly with ~0.1% divergence
minimap2 -ax asm5 reference.fa assembly.fa > aligned.sam

# Assembly with higher divergence (~5%)
minimap2 -ax asm20 reference.fa assembly.fa > aligned.sam

Output PAF (Faster, No BAM)

bash
# PAF format (faster, for quick analysis)
minimap2 -x map-ont reference.fa reads.fastq.gz > alignments.paf
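
PAF is plain tab-separated text, so quick summaries need nothing beyond awk. The sketch below works on two synthetic records (the read names and numbers are made up for illustration) and reports query coverage, identity (column 10 divided by column 11), and MAPQ (column 12). Without `-c`, minimap2 performs no base-level alignment, so columns 10 and 11 are approximate and the identity is only a rough estimate.

```bash
# Two synthetic PAF records; columns are
# qname qlen qstart qend strand tname tlen tstart tend matches blocklen mapq
paf=$(mktemp)
printf 'read1\t5000\t10\t4990\t+\tchr1\t248956422\t100000\t104980\t4800\t5000\t60\n' > "$paf"
printf 'read2\t8000\t0\t4000\t-\tchr2\t242193529\t5000\t9000\t3000\t4000\t12\n' >> "$paf"

# Per alignment: fraction of the read aligned, matches / block length, MAPQ
summary=$(LC_ALL=C awk '{
    printf "%s\t%.3f\t%.3f\t%d\n", $1, ($4 - $3) / $2, $10 / $11, $12
}' "$paf")

printf 'read\tquery_cov\tidentity\tmapq\n%s\n' "$summary"
rm -f "$paf"
```

On real data, replace the temporary file with `alignments.paf`.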

Keep Secondary and Supplementary

bash
# Keep all alignments (for SV calling)
# -N 5 caps the number of secondary alignments reported per read
minimap2 -ax map-ont \
    --secondary=yes \
    -N 5 \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Filter Alignments

bash
# During alignment pipeline: keep only alignments with mapping quality >= 10
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools view -b -q 10 | \
    samtools sort -o aligned.bam
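
To make explicit what a MAPQ threshold of 10 keeps, the same rule can be mimicked on SAM text: header lines start with `@`, and column 5 of each alignment record is the mapping quality. This is an illustrative sketch on synthetic records, not a replacement for `samtools view -q` in a real pipeline.

```bash
# Synthetic SAM: one header line and three records with MAPQ 60, 3 and 10
sam=$(mktemp)
{
    printf '@SQ\tSN:chr1\tLN:248956422\n'
    printf 'read1\t0\tchr1\t100001\t60\t50M\t*\t0\t0\t*\t*\n'
    printf 'read2\t16\tchr1\t200001\t3\t50M\t*\t0\t0\t*\t*\n'
    printf 'read3\t0\tchr1\t300001\t10\t50M\t*\t0\t0\t*\t*\n'
} > "$sam"

# Keep header lines, plus records whose MAPQ (column 5) is at least 10
kept=$(awk -F '\t' '/^@/ || $5 >= 10' "$sam")

printf '%s\n' "$kept"
rm -f "$sam"
```

Only `read2` (MAPQ 3) is dropped; the threshold is inclusive, so `read3` with MAPQ exactly 10 is kept.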

Multiple FASTQ Files

bash
# Pass several input files directly
minimap2 -ax map-ont reference.fa reads1.fastq.gz reads2.fastq.gz | \
    samtools sort -o aligned.bam

# Or read the paths from a file list (one per line, no spaces in paths)
xargs minimap2 -ax map-ont reference.fa < file_list.txt | \
    samtools sort -o aligned.bam
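
Plain `xargs` splits its input on whitespace, so a path containing a space would be passed as two arguments. One possible workaround, sketched below, reads the list line by line into the positional parameters. The list contents are hypothetical, and the command is only printed (a dry run) so the snippet works without minimap2 installed.

```bash
# Hypothetical file list; the first path contains a space on purpose
list=$(mktemp)
printf '%s\n' 'run1/reads a.fastq.gz' 'run2/reads.fastq.gz' > "$list"

# Collect one path per line, preserving embedded spaces
set --
while IFS= read -r path; do
    if [ -n "$path" ]; then
        set -- "$@" "$path"
    fi
done < "$list"
n_inputs=$#

# Dry run: print the command; drop the leading echo to execute it
echo minimap2 -ax map-ont reference.fa "$@"
rm -f "$list"
```

Because `"$@"` expands each collected path as a single argument, the real command would receive exactly two input files here.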

Output Statistics

bash
# Get alignment statistics
samtools flagstat aligned.bam

# Detailed stats
samtools stats aligned.bam | grep ^SN
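
The `SN` lines are tab-separated label and value pairs, which makes it easy to derive a single number such as the mapping rate. The sketch below parses a synthetic excerpt; the labels `raw total sequences:` and `reads mapped:` are written as they typically appear in `samtools stats` output, and the counts are invented for illustration.

```bash
# Synthetic excerpt in the shape of `samtools stats aligned.bam | grep ^SN`
stats=$(mktemp)
{
    printf 'SN\traw total sequences:\t1000\n'
    printf 'SN\treads mapped:\t950\n'
    printf 'SN\treads unmapped:\t50\n'
} > "$stats"

# Mapping rate in percent = reads mapped / raw total sequences
rate=$(LC_ALL=C awk -F '\t' '
    $2 == "raw total sequences:" { total = $3 }
    $2 == "reads mapped:"        { mapped = $3 }
    END { if (total > 0) printf "%.1f", 100 * mapped / total }' "$stats")

echo "mapped: ${rate}%"
rm -f "$stats"
```

On real data, pipe `samtools stats aligned.bam | grep ^SN` into the same awk program instead of reading the temporary file.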

Convert PAF to BED

bash
# Extract alignments to BED (chrom, start, end, read name, MAPQ as score, strand)
awk 'BEGIN { OFS = "\t" } { print $6, $8, $9, $1, $12, $5 }' alignments.paf > alignments.bed
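
As a worked example of the column mapping, the sketch below converts one synthetic PAF record (values invented for illustration). PAF target coordinates are already 0-based with an open end, the same convention BED uses, so columns 8 and 9 can be copied without adjustment.

```bash
# One synthetic PAF record: read1 aligned to chr1:100000-104980 on the + strand
paf=$(mktemp)
printf 'read1\t5000\t10\t4990\t+\tchr1\t248956422\t100000\t104980\t4800\t5000\t60\n' > "$paf"

# BED columns: target name, target start, target end, read name, MAPQ, strand
bed_line=$(awk 'BEGIN { OFS = "\t" } { print $6, $8, $9, $1, $12, $5 }' "$paf")

printf '%s\n' "$bed_line"
rm -f "$paf"
```

The resulting line holds `chr1`, `100000`, `104980`, `read1`, `60` and `+`, separated by tabs.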

Key Presets

| Preset | Description | Best For |
|--------|-------------|----------|
| map-ont | ONT reads | Nanopore genomic |
| map-hifi | PacBio HiFi | PacBio genomic |
| map-pb | PacBio CLR | PacBio CLR |
| splice | Long RNA reads | cDNA, direct RNA |
| asm5 | Low divergence | Same species assembly |
| asm20 | High divergence | Cross-species assembly |
| sr | Short reads | Illumina (basic) |
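
In a pipeline, the table can be encoded as a small lookup so the preset is chosen from a read-type label. This is a minimal sketch: the function name `pick_preset` and the labels (`ont`, `hifi`, `clr`, ...) are hypothetical conventions rather than minimap2 features; only the preset names come from the table.

```bash
# Map a read-type label to a minimap2 preset name.
# pick_preset and its labels are illustrative, not part of minimap2.
pick_preset() {
    case "$1" in
        ont)       echo "map-ont" ;;
        hifi)      echo "map-hifi" ;;
        clr)       echo "map-pb" ;;
        rna|cdna)  echo "splice" ;;
        illumina)  echo "sr" ;;
        *)         echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Example: print the command that would be run for HiFi reads
preset=$(pick_preset hifi)
echo "minimap2 -ax $preset reference.fa reads.fastq.gz"
```

Returning a non-zero status for unknown labels lets a calling script stop before it launches an alignment with the wrong preset.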

Key Parameters

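| Parameter | Default | Description |
|-----------|---------|-------------|
| -t | 3 | CPU threads |
| -k | 15 | K-mer size |
| -w | 10 | Minimizer window |
| -a | off | Output SAM |
| -x | none | Preset |
| --secondary | yes | Output secondary |
| -N | 5 | Max secondary alignments |
| --MD | off | Generate MD tag |
| -R | none | Read group header |
| -Y | off | Soft clipping for supplementary |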

Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| PAF | (default) | Pairwise Alignment Format |
| SAM | -a | Sequence Alignment Map |
| BAM | -a \| samtools | Binary SAM |

Related Skills

  • medaka-polishing - Polish consensus with medaka
  • structural-variants - Call SVs from alignments
  • alignment-files/sam-bam-basics - BAM manipulation