miRDeep2 Analysis

Name: bio-small-rna-seq-mirdeep2-analysis
Rating: 78
Author: GPTomics

Workflow Overview

code

Collapsed reads (FASTA)
    |
    v
mapper.pl ---------> Align to genome, create ARF file
    |
    v
miRDeep2.pl -------> Predict novel miRNAs, quantify known
    |
    v
quantifier.pl -----> Quantify known miRNAs only (optional)

Step 1: Prepare Genome Index

bash

# Build bowtie index for miRDeep2 mapper
bowtie-build genome.fa genome_index

Step 2: Map Reads with mapper.pl

bash

# Collapse reads and map to genome
mapper.pl reads.fastq \
    -e \
    -h \
    -i \
    -j \
    -k TGGAATTCTCGGGTGCCAAGG \
    -l 18 \
    -m \
    -p genome_index \
    -s reads_collapsed.fa \
    -t reads_vs_genome.arf \
    -v

# Key options:
# -e: Input is FASTQ
# -h: Parse Illumina headers
# -k: Clip 3' adapter
# -l 18: Discard reads < 18 nt
# -m: Collapse reads
# -p: Bowtie index prefix
# -s: Output collapsed FASTA
# -t: Output ARF alignment file

Step 3: Run miRDeep2 Prediction

bash

# Predict novel miRNAs
miRDeep2.pl \
    reads_collapsed.fa \
    genome.fa \
    reads_vs_genome.arf \
    mature_ref.fa \
    mature_other.fa \
    hairpin_ref.fa \
    -t Human \
    2> report.log

# Arguments:
# 1. Collapsed reads FASTA
# 2. Genome FASTA
# 3. Alignment ARF file
# 4. Known mature miRNAs (same species)
# 5. Known mature miRNAs (other species, for conservation)
# 6. Known hairpin precursors
# -t: Species for miRBase lookup

Prepare miRBase References

bash

# Download from miRBase
wget https://www.mirbase.org/download/mature.fa
wget https://www.mirbase.org/download/hairpin.fa

# Extract species-specific sequences
grep -A1 ">hsa-" mature.fa > mature_human.fa
grep -A1 ">hsa-" hairpin.fa > hairpin_human.fa

Step 4: Quantify Known miRNAs Only

bash

# If not doing novel discovery
quantifier.pl \
    -p hairpin_human.fa \
    -m mature_human.fa \
    -r reads_collapsed.fa \
    -t hsa

# Output: miRNAs_expressed_all_samples.csv

Output Files

File	Description
result_*.html	Interactive results report
result_*.csv	Predicted novel miRNAs with scores
miRNAs_expressed_all_samples*.csv	Expression quantification
pdfs_*.pdf	Secondary structure plots

Interpret miRDeep2 Scores

code

Score interpretation:
>10: High confidence novel miRNA
5-10: Medium confidence
1-5: Low confidence, needs validation
<1: Likely false positive

Key metrics:
- miRDeep2 score: Overall confidence
- Total read count: Expression level
- Mature/star ratio: Strand bias (expect asymmetry)
- Randfold p-value: Structural stability

Parse Results in Python

python

import pandas as pd

def parse_mirdeep2_results(csv_path):
    '''Parse miRDeep2 novel miRNA predictions'''
    df = pd.read_csv(csv_path, sep='\t', skiprows=1)

    # Filter high-confidence predictions
    # Score > 10 indicates high confidence novel miRNA
    high_conf = df[df['miRDeep2 score'] > 10]

    return high_conf

# Parse quantification results
def parse_quantifier_output(csv_path):
    '''Parse quantifier.pl expression matrix'''
    df = pd.read_csv(csv_path, sep='\t')
    return df

Related Skills

•smrna-preprocessing - Prepare reads for miRDeep2
•mirge3-analysis - Faster quantification alternative
•differential-mirna - DE analysis of miRNA counts