Motif Analysis
Identify DNA sequence motifs enriched in ChIP-seq or ATAC-seq peaks to discover transcription factor binding sites.
Tool Comparison
| Tool | Strengths | Use Case |
|---|---|---|
| HOMER | Fast, comprehensive, built-in databases | General motif analysis |
| MEME-ChIP | Multiple algorithms, web interface | Publication-quality |
| MEME | De novo discovery only | Simple discovery |
| FIMO | Known motif scanning | Genome-wide scanning |
HOMER
Installation
bash
conda install -c bioconda homer

# Configure genome (required once)
perl /path/to/homer/configureHomer.pl -install hg38
perl /path/to/homer/configureHomer.pl -install mm10
De Novo Motif Discovery
bash
# Basic motif finding
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200

# With background regions
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -bg background.bed

# Specify motif lengths to search
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -len 8,10,12
Key Options
| Option | Description |
|---|---|
| -size <#> | Region size used for analysis (default 200) |
| -size given | Use actual peak sizes |
| -bg <file> | Background regions (BED) |
| -len <#,#,...> | Motif lengths to search |
| -mask | Mask repeats |
| -p <#> | Number of CPUs |
| -S <#> | Number of motifs to find (default 25) |
| -mis <#> | Mismatches allowed (default 2) |
| -noweight | Don't adjust for GC content |
Output Files
code
output_dir/
├── homerResults.html       # Main results page
├── knownResults.html       # Known motif enrichment
├── homerMotifs.all.motifs  # All discovered motifs
├── knownResults.txt        # Known motif statistics
└── motif1.motif            # Individual motif files
Known Motif Enrichment Only
bash
# Skip de novo, only check known motifs
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -nomotif
Scan for Specific Motifs
bash
# Find instances of motif in peaks
annotatePeaks.pl peaks.bed hg38 -m motif.motif > annotated.txt

# Scan genome for motif occurrences (-bed writes BED-formatted output)
scanMotifGenomeWide.pl motif.motif hg38 -bed > motif_sites.bed
Motif Comparison
bash
# Compare discovered motifs to HOMER's default known-motif library
# (pass -known <file> to compare against a different motif file)
compareMotifs.pl motifs.motif output_dir/
Create Custom Motif
bash
# From consensus sequence (arguments: consensus, mismatches allowed, motif name)
seq2profile.pl CACGTG 0 MYC > MYC.motif

# From aligned sequences
cat aligned_seqs.txt | alignAndConvert.pl - > custom.motif
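A HOMER motif file is plain text: a header line starting with `>` that holds the consensus, motif name, and log-odds detection threshold (tab-separated), followed by one row per position giving A, C, G, T probabilities. The sketch below reads that layout. The helper name `read_homer_motifs` and the example values are hypothetical, chosen only for illustration.

```python
def read_homer_motifs(text):
    """Parse HOMER motif text into a list of dicts (hypothetical helper)."""
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: consensus, name, log-odds threshold (further fields ignored)
            fields = line[1:].split("\t")
            parsed.append({
                "consensus": fields[0],
                "name": fields[1] if len(fields) > 1 else fields[0],
                "threshold": float(fields[2]) if len(fields) > 2 else None,
                "matrix": [],
            })
        elif parsed:
            # Matrix row: probabilities for A, C, G, T at one position
            parsed[-1]["matrix"].append([float(x) for x in line.split()])
    return parsed


# Made-up E-box motif in HOMER layout (values are illustrative only)
strong = {"A": "0.997\t0.001\t0.001\t0.001", "C": "0.001\t0.997\t0.001\t0.001",
          "G": "0.001\t0.001\t0.997\t0.001", "T": "0.001\t0.001\t0.001\t0.997"}
example = ">CACGTG\tMYC_example\t6.0\n" + "\n".join(strong[b] for b in "CACGTG")

motif = read_homer_motifs(example)[0]
print(motif["name"], len(motif["matrix"]))
```

Files holding several motifs parse the same way, since each `>` header starts a new entry.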
MEME Suite
Installation
bash
conda install -c bioconda meme
Extract Sequences from Peaks
bash
# Get FASTA sequences under peaks
bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa
# Extend each peak by 100 bp on both sides (slop pads intervals; it does not re-center them)
bedtools slop -i peaks.bed -g genome.sizes -b 100 | \
bedtools getfasta -fi genome.fa -bed - -fo peaks_centered.fa
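`bedtools slop` pads intervals rather than re-centering them. If fixed-width windows around each peak midpoint are wanted instead, one option is to rewrite the coordinates before running `bedtools getfasta`. The following is a minimal sketch under that assumption; the function name `resize_to_center` is hypothetical, and it clamps only at position 0, so windows near chromosome ends still need clipping against a chromosome sizes file.

```python
def resize_to_center(bed_lines, half_width=100):
    """Yield (chrom, start, end) windows of 2 * half_width bp around each midpoint."""
    for line in bed_lines:
        # Skip blank lines and common BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        center = (int(start) + int(end)) // 2
        # Clamp at 0 so windows near the chromosome start stay valid
        yield chrom, max(0, center - half_width), center + half_width


peaks = ["chr1\t1000\t1500\tpeak1\n", "chr1\t10\t50\tpeak2\n"]
for chrom, start, end in resize_to_center(peaks):
    print(f"{chrom}\t{start}\t{end}")
```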
MEME (De Novo Discovery)
bash
# Basic de novo discovery
meme peaks.fa -dna -oc meme_output -mod zoops -nmotifs 10 -minw 6 -maxw 20

# With Markov background
fasta-get-markov peaks.fa > background.model
meme peaks.fa -dna -oc meme_output -bfile background.model -mod zoops -nmotifs 10
MEME Options
| Option | Description |
|---|---|
| -mod zoops | Zero or one per sequence (default for ChIP) |
| -mod oops | Exactly one per sequence |
| -mod anr | Any number of repeats |
| -nmotifs <#> | Number of motifs to find |
| -minw <#> | Minimum motif width |
| -maxw <#> | Maximum motif width |
| -revcomp | Search both strands |
| -bfile <file> | Background model file |
MEME-ChIP (Comprehensive Pipeline)
bash
# All-in-one ChIP-seq motif analysis
meme-chip -oc meme_chip_output -db motif_database.meme peaks.fa
MEME-ChIP runs:
- MEME - De novo discovery (run on the central region of each sequence)
- DREME - Short motif discovery (recent MEME Suite releases use STREME for this step)
- CentriMo - Central enrichment analysis
- TOMTOM - Compare to known motifs
- FIMO - Find motif instances
DREME (Short Motifs)
bash
# Find short enriched motifs
dreme -oc dreme_output -p peaks.fa -n background.fa
CentriMo (Central Enrichment)
bash
# Test for central enrichment of known motifs (CentriMo options take a double dash)
centrimo --oc centrimo_output peaks.fa motif_database.meme
TOMTOM (Motif Comparison)
bash
# Compare discovered motifs to database
tomtom -oc tomtom_output discovered.meme database.meme
FIMO (Motif Scanning)
bash
# Scan sequences for motif matches
fimo --oc fimo_output motif.meme sequences.fa

# Scan genome
fimo --oc fimo_output --max-stored-scores 1000000 motif.meme genome.fa
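FIMO writes its hits to `fimo.tsv` in the output directory, a tab-separated table followed by comment lines starting with `#`. A minimal sketch for loading that table and keeping hits below a q-value cutoff is shown here. The column names follow the `fimo.tsv` header as produced by MEME Suite 5.x, which is an assumption worth checking against your own output; the function name `read_fimo_hits`, the 0.05 cutoff, and the example rows are all hypothetical.

```python
import csv
import io


def read_fimo_hits(handle, max_qvalue=0.05):
    """Return FIMO hits with q-value <= max_qvalue as a list of dicts."""
    # Drop blank lines and the trailing '#' comment lines before parsing
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    hits = []
    for rec in csv.DictReader(rows, delimiter="\t"):
        qvalue = rec.get("q-value") or ""
        if not qvalue:
            continue  # skip rows without a q-value
        if float(qvalue) <= max_qvalue:
            hits.append({
                "motif": rec["motif_id"],
                "sequence": rec["sequence_name"],
                "start": int(rec["start"]),
                "stop": int(rec["stop"]),
                "strand": rec["strand"],
                "score": float(rec["score"]),
                "q_value": float(qvalue),
            })
    return hits


# Made-up rows in the fimo.tsv column layout
example = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "motif1\texample\tchr1:100-300\t51\t62\t+\t15.2\t1.1e-06\t0.01\tGGCCACGTGGCC\n"
    "motif1\texample\tchr1:900-1100\t10\t21\t-\t8.4\t2.0e-04\t0.50\tAGCCACGTGATC\n"
    "\n# made-up footer comment\n"
)
hits = read_fimo_hits(io.StringIO(example))
print(hits)
```

For a real run, pass an open file handle, for example `open('fimo_output/fimo.tsv')`.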
Motif Databases
HOMER Built-in
bash
# List available motif sets
ls /path/to/homer/data/knownTFs/

# Vertebrate, known motifs (default); -mknown takes a path to a motif file
findMotifsGenome.pl peaks.bed hg38 output/ -mknown /path/to/homer/data/knownTFs/vertebrates/known.motifs
JASPAR
bash
# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2024/CORE/JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt

# Use with MEME suite
meme-chip -db JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt peaks.fa
HOCOMOCO
bash
# Download HOCOMOCO
wget https://hocomoco11.autosome.org/final_bundle/hocomoco11/core/HUMAN/mono/HOCOMOCOv11_core_HUMAN_mono_meme_format.meme

# Use with MEME suite
tomtom discovered.meme HOCOMOCOv11_core_HUMAN_mono_meme_format.meme
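The JASPAR and HOCOMOCO downloads above are in MEME motif format, whereas HOMER writes its own matrix layout. To pass a HOMER-style probability matrix to TOMTOM or FIMO, it can be written out in minimal MEME format. The sketch below assumes a uniform background and an arbitrary `nsites` value, both placeholders, and the function name `to_minimal_meme` is hypothetical.

```python
def to_minimal_meme(name, matrix, nsites=20):
    """Format rows of A, C, G, T probabilities as minimal MEME motif text."""
    header = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # uniform background (assumption)
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {len(matrix)} nsites= {nsites} E= 0",
    ]
    rows = []
    for row in matrix:
        if len(row) != 4 or sum(row) <= 0:
            raise ValueError("each row needs four non-negative values (A, C, G, T)")
        total = sum(row)
        # Renormalize so every row sums to 1 despite rounding in the input
        rows.append(" ".join(f"{p / total:.6f}" for p in row))
    return "\n".join(header + rows) + "\n"


# Made-up two-position matrix, for illustration only
print(to_minimal_meme("example_motif", [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]))
```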
Python: Parse HOMER Results
python
import pandas as pd


def parse_homer_known(results_file):
    '''Parse HOMER knownResults.txt into a DataFrame, most significant first.'''
    df = pd.read_csv(results_file, sep='\t')
    # Replace HOMER's verbose headers with short names (9 columns expected)
    df.columns = ['Motif', 'Consensus', 'P-value', 'Log P-value',
                  'q-value', 'Targets', 'Target%', 'Background', 'Background%']
    df['P-value'] = df['P-value'].astype(float)
    # Very small P-values underflow to 0.0, so sort on the log P-value instead
    return df.sort_values('Log P-value')


known = parse_homer_known('output_dir/knownResults.txt')
print(known[['Motif', 'P-value', 'Target%']].head(20))
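The `Target%` and `Background%` columns are read as strings such as `25.00%`, so they need converting before any arithmetic. The sketch below turns them into fractions and computes a fold enrichment. The helper names and the small pseudocount, which guards against division by zero, are assumptions made for illustration.

```python
def percent_to_fraction(value):
    """Convert a string such as '25.00%' to the fraction 0.25."""
    return float(str(value).strip().rstrip("%")) / 100.0


def fold_enrichment(target_pct, background_pct, pseudo=1e-4):
    """Ratio of target to background motif frequency, with a small pseudocount."""
    target = percent_to_fraction(target_pct)
    background = percent_to_fraction(background_pct)
    return (target + pseudo) / (background + pseudo)


# Made-up values in the format HOMER writes
print(round(fold_enrichment("20.00%", "5.00%"), 2))
```

With pandas this can be applied row by row, for example `known.apply(lambda r: fold_enrichment(r['Target%'], r['Background%']), axis=1)`.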
Python: Parse MEME Results
python
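from Bio import motifs


def parse_meme_file(meme_file):
    '''Parse MEME output with Biopython.'''
    with open(meme_file) as f:
        record = motifs.parse(f, 'meme')
    return record


# Recent Biopython releases expect MEME's XML output (meme.xml) for the
# 'meme' format; older releases read meme.txt instead.
record = parse_meme_file('meme_output/meme.xml')
for m in record:
    print(f'{m.name}: {m.consensus}')
    print(m.counts)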
Complete Workflows
ChIP-seq Motif Analysis
bash
#!/bin/bash
set -euo pipefail

PEAKS=$1    # narrowPeak or BED file
GENOME=$2   # hg38, mm10, etc.
OUTDIR=$3

mkdir -p "$OUTDIR"

# HOMER analysis
echo "Running HOMER..."
findMotifsGenome.pl "$PEAKS" "$GENOME" "${OUTDIR}/homer" \
    -size 200 -p 8 -mask

# Extract 200 bp windows centered on each peak midpoint for MEME
# (assumes ${GENOME}.fa is in the working directory)
echo "Extracting sequences..."
awk 'BEGIN{OFS="\t"} {center=int(($2+$3)/2); start=center-100; if (start<0) start=0; print $1,start,center+100}' "$PEAKS" | \
    bedtools getfasta -fi "${GENOME}.fa" -bed - -fo "${OUTDIR}/peaks.fa"

# MEME-ChIP analysis
echo "Running MEME-ChIP..."
meme-chip -oc "${OUTDIR}/meme_chip" \
    -db /path/to/JASPAR.meme \
    "${OUTDIR}/peaks.fa"

echo "Done. Results in ${OUTDIR}/"
ATAC-seq Footprint Motifs
bash
# Analyze motifs in footprint regions
findMotifsGenome.pl footprints.bed hg38 footprint_motifs/ \
    -size given -mask -p 8

# Compare to accessible regions background
# (separate output directory so the first run is not overwritten)
findMotifsGenome.pl footprints.bed hg38 footprint_motifs_bg/ \
    -size given -bg accessible_peaks.bed -mask -p 8
Visualization
HOMER Logo
bash
# Generate sequence logo
motif2Logo.pl motif.motif > logo.eps
Plot with Python
python
import logomaker
import pandas as pd
import matplotlib.pyplot as plt


def plot_motif(pwm_file):
    '''Plot sequence logo from a single-motif HOMER file.'''
    # Skip the '>' header line; remaining rows are A, C, G, T probabilities
    pwm = pd.read_csv(pwm_file, sep='\t', skiprows=1, header=None)
    pwm.columns = ['A', 'C', 'G', 'T']
    logo = logomaker.Logo(pwm, shade_below=0.5, fade_below=0.5)
    plt.show()
    return logo
Quality Metrics
| Metric | Good | Concerning |
|---|---|---|
| P-value | < 1e-10 | > 1e-5 |
| Target % | > 20% | < 5% |
| Background % | < Target/2 | Similar to Target |
| Bit score | > 10 | < 5 |
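The thresholds in the table can be applied programmatically when screening many motifs. This is a rough sketch, not a standard scoring scheme: the function name `assess_motif` is hypothetical, the bit score is left out, and "similar to Target" is interpreted here as a background percentage of at least 80% of the target percentage, which is an assumption.

```python
def assess_motif(p_value, target_pct, background_pct, similar_ratio=0.8):
    """Label each metric 'good', 'borderline', or 'concerning'."""

    def grade(is_good, is_concerning):
        if is_good:
            return "good"
        return "concerning" if is_concerning else "borderline"

    return {
        "p_value": grade(p_value < 1e-10, p_value > 1e-5),
        "target_pct": grade(target_pct > 20, target_pct < 5),
        # similar_ratio is an assumed reading of "similar to Target"
        "background_pct": grade(background_pct < target_pct / 2,
                                background_pct >= similar_ratio * target_pct),
    }


# Made-up values: percentages are plain numbers, e.g. 30.0 for 30%
print(assess_motif(1e-50, 30.0, 5.0))
```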
Common Issues
No Significant Motifs
- Check peak quality (too few peaks?)
- Try different peak sizes (-size)
- Ensure genome build matches
- Check for repeat masking issues
Too Many Motifs
- Use a more stringent significance threshold
- Use -S to limit number of motifs
- Filter by target percentage
Wrong Background
- Use matched GC content background
- Consider using input/control peaks
- Try shuffled sequences
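As a sketch of the shuffled-sequence option, the snippet below shuffles each sequence's letters and checks GC content. A plain letter shuffle keeps base composition, and therefore GC content, but not dinucleotide frequencies, so it is a simpler background than a dinucleotide-preserving shuffle. The function names are hypothetical.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def shuffle_sequence(seq, rng):
    """Return seq with its letters in random order (composition preserved)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


rng = random.Random(0)  # fixed seed so the background is reproducible
original = "ACGTGGCCATATCACGTG"
shuffled = shuffle_sequence(original, rng)
print(gc_fraction(original) == gc_fraction(shuffled))
```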
Related Skills
- peak-calling - Generate input peaks
- peak-annotation - Annotate peaks with genes
- atac-seq/footprinting - TF footprint analysis
- genome-intervals - BED file operations