bio-read-qc-quality-reports

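Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use this skill when you need initial QC on raw sequencing data or want to validate preprocessing results.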

SKILL.md
--- frontmatter
name: bio-read-qc-quality-reports
description: Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.
tool_type: cli
primary_tool: fastqc

Quality Reports

Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.

FastQC - Single Sample Reports

Basic Usage

bash
# Single file
fastqc sample.fastq.gz

# Multiple files
fastqc *.fastq.gz

# Specify output directory (FastQC does not create it, so make it first)
mkdir -p qc_reports
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz

# Set threads (each thread processes one file at a time)
fastqc -t 4 *.fastq.gz

Output Files

FastQC produces two files per input:

  • sample_fastqc.html - Interactive HTML report
  • sample_fastqc.zip - Data files and images

Key Modules

| Module | What It Shows | Warning Signs |
|--------|---------------|---------------|
| Per base sequence quality | Quality scores across read | Drop below Q20 at 3' end |
| Per sequence quality scores | Quality score distribution | Bimodal distribution |
| Per base sequence content | Nucleotide composition | Imbalance beyond the first bases (bias at the read start is normal) |
| Per sequence GC content | GC distribution | Secondary peak (contamination) |
| Per base N content | Unknown bases | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication levels | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Common sequences | Adapter contamination |
| Adapter content | Adapter sequences | Visible adapter curves |

Extract Data from ZIP

bash
# Unzip to access raw data
unzip sample_fastqc.zip

# View summary
cat sample_fastqc/summary.txt

# Get the per-base quality module (from its header to the module end marker)
sed -n '/^>>Per base sequence quality/,/^>>END_MODULE/p' sample_fastqc/fastqc_data.txt
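The same extraction can be sketched in Python when the values are needed as data rather than text. This is a minimal sketch, assuming the usual fastqc_data.txt layout (a `>>Module name<TAB>status` line, `#`-prefixed header lines, tab-separated rows, and a `>>END_MODULE` marker). The function name `parse_fastqc_module` and the sample text are hypothetical and only for illustration.

```python
def parse_fastqc_module(text, module_name):
    """Return (status, header, rows) for one module of fastqc_data.txt content."""
    status, header, rows = None, [], []
    inside = False
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            if inside:
                break  # reached the end of the requested module
            continue
        if line.startswith(">>"):
            # Module header lines look like ">>Module name<TAB>pass"
            name, _, state = line[2:].partition("\t")
            if name == module_name:
                inside = True
                status = state.strip()
            continue
        if not inside:
            continue
        if line.startswith("#"):
            # Some modules have several '#' lines; the last one is kept as the header
            header = line[1:].split("\t")
        elif line.strip():
            rows.append(line.split("\t"))
    return status, header, rows


# Tiny hand-written sample in the fastqc_data.txt layout (not real data)
SAMPLE = (
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\tsample.fastq.gz\n"
    ">>END_MODULE\n"
    ">>Per base sequence quality\twarn\n"
    "#Base\tMean\tMedian\n"
    "1\t32.1\t33.0\n"
    "2\t31.8\t33.0\n"
    ">>END_MODULE\n"
)

status, header, rows = parse_fastqc_module(SAMPLE, "Per base sequence quality")
print(status, header, rows)
```

With a real report, the text would come from something like `open('sample_fastqc/fastqc_data.txt').read()`. Values are returned as strings, so convert them with `float()` before doing arithmetic.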

MultiQC - Aggregate Reports

Basic Usage

bash
# Aggregate all FastQC reports in current directory
multiqc .

# Specify input and output
multiqc qc_reports/ -o multiqc_output/

# Custom report name
multiqc . -n my_project_qc

# Force overwrite
multiqc . -f

Common Options

bash
# Use flat (static image) plots instead of interactive ones
multiqc . --flat

# Also export plots as static image files
multiqc . --export

# Only specific modules
multiqc . -m fastqc

# Ignore input files matching a pattern
multiqc . --ignore '*_trimmed*'

# Exclude samples whose names match a pattern
multiqc . --ignore-samples '*negative*'

Output Files

  • multiqc_report.html - Interactive HTML report
  • multiqc_data/ - Directory with data tables
    • multiqc_fastqc.txt - FastQC metrics
    • multiqc_general_stats.txt - Summary statistics
    • multiqc_sources.txt - Source files used

Extract Data Programmatically

python
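import pandas as pd

# One row per sample; metric column names vary between MultiQC versions
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns)

# FastQC-specific metrics for each sample
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')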
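Once the table is loaded, a common next step is flagging samples that exceed a threshold. The sketch below uses only the standard library so it runs without pandas. It assumes the first column holds the sample name and that the metric column can be found by a name suffix, because the exact column names depend on the MultiQC version. The function name, the suffix `percent_duplicates`, and the demo table are illustrative assumptions, not guaranteed MultiQC output.

```python
import csv
import io


def samples_over_threshold(handle, column_suffix, threshold):
    """Return {sample: value} for rows whose matching column exceeds threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    # Locate the metric column by suffix, since prefixes differ between versions
    column = next((c for c in fields if c.endswith(column_suffix)), None)
    if column is None:
        raise KeyError(f"no column ending with {column_suffix!r}")
    sample_column = fields[0]  # assumption: first column is the sample name
    flagged = {}
    for row in reader:
        value = row.get(column) or ""
        if not value:
            continue  # metric missing for this sample
        if float(value) > threshold:
            flagged[row[sample_column]] = float(value)
    return flagged


# Made-up table in the same tab-separated shape (not real data)
DEMO = (
    "Sample\tfastqc-percent_duplicates\tfastqc-percent_gc\n"
    "A\t12.5\t48\n"
    "B\t61.0\t51\n"
)

flagged = samples_over_threshold(io.StringIO(DEMO), "percent_duplicates", 50.0)
print(flagged)
```

For a real run, pass `open('multiqc_data/multiqc_general_stats.txt', newline='')` as the handle, and check the printed column names first to pick a suffix that exists in your file.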

Batch Processing

Process Multiple Samples

bash
# All FASTQ files, up to 8 at a time (create the output directory first)
mkdir -p qc_reports
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz

# Then aggregate
multiqc qc_reports/ -o multiqc_output/

Before and After Trimming

bash
# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed

# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz

# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz

# Compare with MultiQC (-d prepends directory names so raw and trimmed samples stay distinct)
multiqc -d qc_reports/ -o qc_comparison/
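To check quickly whether trimming changed any module status, the summary.txt files from the raw and trimmed reports can be compared. This is a minimal sketch, assuming each summary.txt line has the form `STATUS<TAB>module<TAB>filename`. The helper names and the sample contents are hypothetical.

```python
def read_summary(text):
    """Map module name -> status from summary.txt content."""
    statuses = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            statuses[parts[1]] = parts[0]
    return statuses


def changed_modules(raw_text, trimmed_text):
    """Return {module: (raw_status, trimmed_status)} for modules whose status changed."""
    raw = read_summary(raw_text)
    trimmed = read_summary(trimmed_text)
    return {
        module: (raw[module], trimmed[module])
        for module in raw
        if module in trimmed and raw[module] != trimmed[module]
    }


# Hand-written examples in the summary.txt layout (not real data)
RAW = (
    "PASS\tBasic Statistics\ts.fastq.gz\n"
    "FAIL\tAdapter Content\ts.fastq.gz\n"
    "WARN\tPer base sequence quality\ts.fastq.gz\n"
)
TRIMMED = (
    "PASS\tBasic Statistics\ts.fastq.gz\n"
    "PASS\tAdapter Content\ts.fastq.gz\n"
    "WARN\tPer base sequence quality\ts.fastq.gz\n"
)

changes = changed_modules(RAW, TRIMMED)
print(changes)
```

In practice the two texts would be read from the unzipped reports under qc_reports/raw/ and qc_reports/trimmed/.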

Interpretation Guide

Quality Scores

| Phred Score | Error Rate | Interpretation |
|-------------|------------|----------------|
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |
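The table follows from the Phred definition Q = -10 * log10(P), where P is the probability that a base call is wrong. The sketch below converts in both directions and decodes a FASTQ quality string. The decoder assumes Phred+33 encoding (Sanger and Illumina 1.8+), and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 (ASCII offset 33)."""
    return [ord(ch) - 33 for ch in quality_string]


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}")

print(decode_phred33("I5+!"))
```

Because the scale is logarithmic, averaging Phred scores directly understates the error rate. Averaging the error probabilities and converting back gives a more faithful mean.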

Common Issues

| Issue | Likely Cause | Action |
|-------|--------------|--------|
| Low quality at 3' end | Normal degradation | Trim 3' end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented seqs | Adapters, primers | Check sequences |

Configuration

Custom Adapters

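Create a tab-separated file of adapter names and sequences (for example custom_adapters.txt) and pass it to FastQC with -a/--adapters. The default list is adapter_list.txt in the Configuration directory of the FastQC installation: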

code
Custom_Adapter_Name    ACGTACGTACGT

Custom Limits

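To customize thresholds, copy limits.txt from the Configuration directory of the FastQC installation, edit the copy, and pass it to FastQC with -l/--limits:

code
# Per sequence quality scores: warn if the most frequent mean read quality is below 25, fail below 20
quality_sequence    warn    25
quality_sequence    error   20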

Related Skills

  • adapter-trimming - Remove adapters detected by FastQC
  • fastp-workflow - All-in-one QC and trimming
  • sequence-io/read-sequences - FASTQ file reading/writing