Replicates Incorporation Skill
Overview
This skill provides two modes for incorporating replicates:
- •Pre-Peak Calling (BAM Mode): If provided with more than 2 biological replicates, it merges all BAMs into a single pooled BAM file for track generation and, only if the user requests it, splits the pooled BAM into 2 balanced "pseudo-replicates" for peak calling.
- •Post-Peak Calling (Peak Mode): If provided with peak files (exactly two replicates are supported, derived from either 2 true replicates or 2 pseudo-replicates), it performs IDR (Irreproducible Discovery Rate) analysis, filters out non-reproducible peaks, and generates a final "conservative" or "optimal" consensus peak set.
In both modes:
- •Refer to the Inputs & Outputs section to check inputs and build the output architecture. All output files should be located in the ${proj_dir} returned in Step 0.
- •Always use the filtered BAM file (*.filtered.bam) if available.
- •Always ask the user whether to generate pseudo-replicates if more than 2 replicates are provided.
Decision Tree
Step 0: Initialize Project
Call:
- •mcp__project-init-tools__project_init with:
  - •sample: all
  - •task: rep_merge
The tool will:
- •Create the all_rep_merge directory.
- •Return the full path of the all_rep_merge directory, which will be used as ${proj_dir}.
Pre-Peak Calling (BAM Mode)
Call:
- •mcp__bw_tools__pool_bams with:
  - •bam_files: [${rep1_bam}, ${rep2_bam}, ${rep3_bam}] (add as many as needed)
  - •output_bam: ${proj_dir}/temp/${sample}.pooled.bam
Call (only when more than two replicates are provided and the user has asked for pseudo-replicates):
- •mcp__bw_tools__split_pseudo_replicates with:
  - •bam_file: ${proj_dir}/temp/${sample}.pooled.bam
  - •output_rep1: ${proj_dir}/temp/${sample}.pseudo1.bam
  - •output_rep2: ${proj_dir}/temp/${sample}.pseudo2.bam
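The idea behind a balanced pseudo-replicate split can be illustrated with a minimal sketch. It is not the implementation of mcp__bw_tools__split_pseudo_replicates, which this document does not describe; it assumes pooled reads are identified by name, and the function name, the seed parameter, and the toy read names are all hypothetical. Splitting by read name keeps paired-end mates in the same pseudo-replicate, because mates share a name.

```python
import random


def split_read_ids(read_ids, seed=0):
    """Shuffle pooled read IDs and cut them into two halves.

    Illustrative only: the halves differ in size by at most one read,
    and a fixed seed makes the split reproducible.
    """
    shuffled = list(read_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Toy pooled input: 3 replicates with 5 reads each (hypothetical names).
pooled = [f"rep{r}_read{i}" for r in (1, 2, 3) for i in range(5)]
pseudo1, pseudo2 = split_read_ids(pooled)
print(len(pseudo1), len(pseudo2))
```

Every pooled read lands in exactly one pseudo-replicate, so the two outputs together reproduce the pooled input.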
Post-Peak Calling (Peak Mode)
A. Narrow Peaks / ATAC (IDR)
Use this to combine reproducible peaks. Ideally, run IDR on:
- •True Replicates
- •Pseudo-Replicates
Call:
- •mcp__bw_tools__filter_idr_peaks with:
  - •peak_file_a: Path to Replicate 1 narrowPeak file.
  - •peak_file_b: Path to Replicate 2 narrowPeak file.
  - •output_optimal: ${proj_dir}/peaks/${sample}.idr.narrowPeaks
  - •output_raw_idr: ${proj_dir}/temp/${sample}_idr_results.tsv
  - •input_file_type: narrowPeak
  - •rank_measure: q.value
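As a rough illustration of what an IDR cutoff means in practice, the sketch below filters raw IDR rows at IDR ≤ 0.05. It assumes the raw results file follows the column layout written by the idr command-line tool, where column 12 holds -log10(global IDR); whether output_raw_idr uses that layout is an assumption, and the function names and example rows are hypothetical.

```python
import math

IDR_THRESHOLD = 0.05
# IDR <= 0.05 is equivalent to -log10(IDR) >= about 1.301.
MIN_NEG_LOG10_IDR = -math.log10(IDR_THRESHOLD)


def passes_idr(line, global_idr_col=11):
    """Return True if a tab-separated row meets the IDR cutoff.

    global_idr_col is the 0-based index of the -log10(global IDR)
    column (column 12 in the assumed layout).
    """
    fields = line.rstrip("\n").split("\t")
    return float(fields[global_idr_col]) >= MIN_NEG_LOG10_IDR


def filter_idr_lines(lines):
    """Keep only non-empty rows that pass the IDR cutoff."""
    return [ln for ln in lines if ln.strip() and passes_idr(ln)]


# Two made-up rows: the first has -log10(IDR) = 2.0, the second 0.3.
row_keep = "chr1\t100\t200\t.\t1000\t.\t5.0\t-1\t-1\t50\t2.5\t2.0"
row_drop = "chr1\t300\t400\t.\t200\t.\t1.0\t-1\t-1\t40\t0.4\t0.3"
kept = filter_idr_lines([row_keep, row_drop])
print(len(kept))
```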
B. Broad Peaks (Consensus)
Call:
- •mcp__bw_tools__merge_consensus_peaks with:
  - •peak_file_a: Path to Replicate 1 broadPeak file.
  - •peak_file_b: Path to Replicate 2 broadPeak file.
  - •output_peak: ${proj_dir}/peaks/${sample}.consensus.broadPeaks
  - •overlap_fraction: 0.5
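One plausible reading of overlap_fraction: 0.5 is sketched below. The document does not say whether mcp__bw_tools__merge_consensus_peaks measures overlap against the shorter peak, either peak, or both, nor whether it outputs the merged span or the intersection. This sketch assumes the shorter peak and the merged span, and all function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter interval."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    shorter = min(a_end - a_start, b_end - b_start)
    return overlap / shorter if shorter > 0 else 0.0


def consensus_peaks(peaks_a, peaks_b, min_fraction=0.5):
    """Return merged spans for same-chromosome peak pairs meeting the cutoff.

    Peaks are (chrom, start, end) tuples. The pairwise scan is quadratic,
    which is fine for a toy example but not for genome-scale peak sets.
    """
    merged = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            frac = overlap_fraction((start_a, end_a), (start_b, end_b))
            if frac >= min_fraction:
                merged.append((chrom_a, min(start_a, start_b), max(end_a, end_b)))
    return merged


rep1 = [("chr1", 100, 300), ("chr1", 1000, 1200)]
rep2 = [("chr1", 200, 400), ("chr1", 1150, 1500)]
consensus = consensus_peaks(rep1, rep2)
print(consensus)
```

In this toy input the first pair overlaps by 100 bp of a 200 bp peak (fraction 0.5) and is kept, while the second pair overlaps by only 50 bp of 200 bp (fraction 0.25) and is dropped.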
Best Practices
- •Use pooled tracks for visualization and differential analysis.
- •Keep individual replicate tracks for QC and reproducibility evaluation.
- •Use IDR ≤ 0.05 for reproducible narrow ChIP-seq and ATAC-seq peaks.
- •Use overlap ≥ 50% for broad histone mark peaks.