Peak Calling
Overview
This skill automatically performs core peak calling with MACS2 for ChIP-seq and ATAC-seq data, based on the BAM files in the current directory. It includes automatic experiment recognition and parameter selection.
Main steps include:
- •Refer to the Inputs & Outputs section to check inputs and build the output architecture. All the output file should located in
${proj_dir}in Step 0. - •Always prompt user for
genome_sizeto use (e.g. hs or mm). Never decide by yourself. - •Always prompt user if required control files are missing for ChIP-seq data.
- •Always prompt user for the q value cutoff for peak calling.
- •Detect experiment type (TF, histone mark, or ATAC-seq).
- •Automatically decide whether to call narrow or broad peaks.
- •Always use filtered BAM file (
filtered.bam) if available. - •Detect sequencing type (single-end or paired-end) using SAM/BAM flags.
- •Perform MACS3 peak calling accordingly.
- •Generate a parameter log file (
${sample}_used_parameters.txt) with justification for each chosen option.
Inputs & Outputs
Inputs
bash
${sample}.bam # filtered bam files
Outputs
bash
all_peak_calling/
peaks/
${sample}.narrowPeak # or ${sample}.broadPeak
temp/
logs/
${sample}_used_parameters.txt
Decision Tree
Step 0: Initialize Project
Call:
- •
mcp__project-init-tools__project_init
with:
- •
sample: all - •
task: peak_calling - •
genome: provided by user
The tool will:
- •Create
${sample}_peak_callingdirectory. - •Return the full path of the
${sample}_peak_callingdirectory, which will be used as${proj_dir}.
Step 1. Identify and Classify BAM Files
Command Example
bash
find . -name "*.bam" | sort
- •Treatment BAMs: filenames contain TF names or histone marks (e.g.,
CTCF,H3K27me3,ATAC). - •Control BAMs: filenames contain “input”, “control”, or “IgG”.
- •Prefer filtered BAMs (
*.filtered.bam) if available.
Step 2. Detect Sequencing Type (Single-End or Paired-End)
Command Example
bash
samtools flagstat sample.bam | egrep "properly paired|singletons"
- •If
properly paired > 0→ Paired-end (-f BAMPE) - •If
singletons ≈ total→ Single-end (-f BAM)
Step 3. Detect Experiment Type and Choose Peak Mode
| Detected Pattern | Experiment Type | Peak Type | Parameter Key Options |
|---|---|---|---|
| TF name (CTCF, GATA1, MYC, TP53…) | TF ChIP-seq | Narrow | --call-summits -q 0.01 |
| Active histone marks (H3K4me3, H3K27ac, H3K9ac) | Histone (sharp) | Narrow | --call-summits -q 0.05 |
| Broad histone marks (H3K27me3, H3K9me3, H3K36me3) | Histone (broad) | Broad | --broad --broad-cutoff 0.1 -q 0.05 |
| H3K4me1 | Intermediate | Narrow | --call-summits -q 0.05 (optional --broad) |
| ATAC | ATAC-seq | Narrow | --nomodel --shift -100 --extsize 200 -q 0.05 |
Step 4. Execute MACS3 with Auto Parameters
Call:
- •mcp__macs2-tools__run_macs2
with:
- •
treatment_file: Path to treatment BAM file. - •
control_file: Path to control/input BAM file. Required for ChIP-seq data. Prompt the user for the required file if not provided. - •
genome_size: Always provided by user. - •
name: Experiment name (prefix for output files). - •
out_dir: ${proj_dir}/peaks - •
broad: If True, call broad peaks (for histone marks). - •
broad_cutoff: Cutoff for broad region calling. - •
qvalue: Q-value cutoff for peak detection. Prompt the user for the q value cutoff. - •
format: use BAMPE for pair-end data, BAM for single-end data. - •
nomodel: True for ATAC-seq, False for ChIP-seq. - •
shift: Shift size in bp (e.g., -100 for ATAC-seq). - •
extsize:"Extension size in bp (e.g., 200 for ATAC-seq).
Step 5. Generate Parameter Log File
After auto-selection, the skill writes a log file:
Example content:
code
Genome detected: Experiment type: H3K27me3 (broad histone) Sequencing type: paired-end Control used: input_control.bam MACS3 mode: --broad --broad-cutoff 0.1 -q 0.05 Reasoning: - Broad mark (H3K27me3) requires domain-level detection - Control detected and applied - Genome identified as <*>; using -g <*> - Paired-end library; use -f BAMPE