Functional Enrichment (HOMER + R)
Overview
- •Validate input: Accept BED/peak files with genomic coordinates or gene lists; check format and genome assembly.
- •Map regions to genes: Convert regions to a unique gene set using HOMER
annotatePeaks.pl. - •Run GO enrichment: Use HOMER
findGO.pl(orannotatePeaks.pl -go) for BP/MF/CC. - •Run KEGG enrichment: Use HOMER
findGO.pl -kegg(orannotatePeaks.pl -kegg). - •Collect outputs: Save tidy tables for downstream plotting and a compact summary of top terms.
- •Visualize in R: Create barplots and dotplots (GO/KEGG) with
ggplot2from standardized outputs. - •QC & troubleshooting: Provide checks for genome mismatch, chromosome naming, and low-signal inputs.
Inputs & Outputs
Inputs (choose one):
Option 1: Input is a genomic region file (BED/narrowPeak/broadPeak)
Genomic region formats supported:
- •BED files: Standard genomic interval format
- •narrowPeak: narrow peak format
- •broadPeak: broad peak format
Option 2: Input is a gene list (txt)
- •
gene_list.txtwith one official gene symbol per line (no header). And an optionalgene_list_background.txtwith one official gene symbol per line (no header).
Outputs (directory layout):
${sample}_functional_enrichment/
results/
${sample}.anno_genomic_features.txt
${sample}.anno_genomic_features_stats.txt
biological_process.txt
cellular_component.txt
molecular_function.txt
kegg.txt
biocyc.txt
chromosome.txt
cosmic.txt
interactions.txt
interpro.txt
gene3d.txt
pathwayInteractionDB.txt
pfam.txt
prints.txt
prosite.txt
reactome.txt
smpdb.txt
wikipathways.txt
gwas.txt
lipidmaps.txt
msigdb.txt
smart.txt
tables/
${sample}.gene_list.txt
go_bp.tsv
go_mf.tsv
go_cc.tsv
kegg.tsv
logs/
${sample}.anno_genomic_features.log # if genome region file is provided
findGO.log
Decision Tree
Step 0 — Gather Required Information from the User
Before calling any tool, ask the user:
- •Sample name (
sample): used as prefix and for the output directory${sample}_functional_enrichment. - •Genome assembly (
genome): e.g.hg38,mm10,danRer11.- •Never guess or auto-detect.
Step 1: Initialize Project
- •Make director for this project:
Call:
- •
mcp__project-init-tools__project_init
with:
- •
sample: the user-provided sample name - •
task: de_novo_motif_discovery
The tool will:
- •Create
${sample}_functional_enrichmentdirectory. - •Get the full path of the
${sample}_functional_enrichmentdirectory, which will be used as${proj_dir}.
Step 2: Prepare genome file for homer
Call:
- •
mcp__homer-tools__check_genome_installation
With:
- •
genome: the user-provided genome assembly, e.g.hg38,mm10,danRer11
The tool will:
- •Check if the genome is installed in HOMER.
- •If not, install the genome.
Step 3 (Optional): Standardize chromosome names for BED files
This step is optional. Only perform this step if the input file is a BED file. If the input file is a gene list, skip this step.
From 1 format to chr1 format
From MT format to chrM format
Call:
- •
mcp__file-format-tools__standardize_bed_chrom_names
with:
- •
input_bed: the user-provided BED file - •
output_bed: the path to save the standardized BED file
The tool will:
- •Standardize the chromosome names in the BED file.
- •Return the path of the standardized BED file.
Step 4 (Optional): Convert gene ID to gene symbol
This step is optional. Only perform this step if the input file is a gene list file. If the input file is a BED file, skip this step.
Call:
- •
mcp__mygene-tools__convert_gene_ids_mygene
With:
- •
input_ids_file: the user-provided gene list file. May end with.txt. - •
scopes: the source ID type for mygene (e.g., 'ensembl.gene', 'symbol', 'entrezgene', 'uniprot', or a comma-separated list). - •
fields: the comma-separated target fields to retrieve from mygene (e.g., 'symbol,ensembl.gene,uniprot,entrezgene'). - •
species: the species for mygene (e.g., 'human', 'mouse', 'zebrafish', or NCBI taxon ID like '9606'). - •
out_file: the path to save the converted gene list file. In this skill, it is the full path of the${sample}_functional_enrichmentdirectory returned bymcp__project-init-tools__project_init - •
batch_size: the batch size for mygene.querymany (default 1000).
The tool will:
- •Convert the gene ID to gene symbol.
- •Return the path of the converted gene list file.
Step 5: GO enrichment analysis
Option 1: from genomic regions file
Only if the input file is a BED file. If the input file is a gene list, call tools in Option 2.
- •annotate the genomic regions using Homer's
annotatePeaks.plwith-gooption. If user also provides a background genome region file, like a control peak file, also call this tool for the background genome region file. Use a different${sample}as the sample name for the background sample.
Call:
mcp__homer-tools__annotate_genomic_features
With:
- •
sample: the user-provided sample name - •
proj_dir: directory to save the genomic feature annotation results. In this skill, it is the full path of the${sample}_functional_enrichmentdirectory returned bymcp__project-init-tools__project_init - •
regions_bed: the user-provided regions file in BED format. May end with.bed,.narrowPeak,.broadPeak, etc. - •
genome: the user-provided genome assembly, e.g.hg38,mm10,danRer11 - •
ann: "custom homer annotation file (created by assignGenomeAnnotation.pl), (default: None). - •
size_given: keep original region sizes (default: True) - •
cpg: include CpG information (default: False) - •
go:Trueto perform GO enrichment analysis.
The tool will:
- •Annotate the genomic regions using Homer's
annotatePeaks.pl. - •Return the path of the annotated regions file under
${proj_dir}/results/directory, and the path to the log file under${proj_dir}/logs/directory.- •
${proj_dir}/results/${sample}.anno_genomic_features.txt - •
${proj_dir}/results/${sample}.anno_genomic_features_stats.txt - •
${proj_dir}/logs/${sample}.anno_genomic_features.log
- •
- •(optional) extract the genes from the annotated regions file if neccessary for future analysis or the target gene list is requested by user. If not requested, skip this step.
Call:
mcp__file-format-tools__extract_gene_list
With:
- •
sample: the user-provided sample name - •
proj_dir: directory to save the genomic feature annotation results. In this skill, it is the full path of the${sample}_functional_enrichmentdirectory returned bymcp__project-init-tools__project_init
The tool will:
- •Extract the genes from the annotated regions file.
- •Return the path of the gene list file under
${proj_dir}/tables/directory.- •
${proj_dir}/tables/${sample}.gene_list.txt
- •
Option 2: from gene list file
Only if the input file is a gene list file. If the input file is a BED file, call tools in Option 1.
Call:
mcp__homer-tools__gene_function_enrichment
With:
- •
sample: the user-provided sample name - •
proj_dir: directory to save the GO & KEGG enrichment results. In this skill, it is the full path of the${sample}_functional_enrichmentdirectory returned bymcp__project-init-tools__project_init - •
gene_list_file: the user-provided gene list file. May end with.txt. - •
organism: the user-provided organism name, e.g.human,mouse,zebrafish, etc. - •
background_gene_list_file: the user-provided background gene list file. May end with.txt. If not provided, set this parameter toNone.
The tool will:
- •Find the GO enrichment for the gene list.
- •Return the path of the GO & KEGG enrichment results under
${proj_dir}/results/directory.- •
${proj_dir}/results/biological_process.txt - •
${proj_dir}/results/kegg.txt - •... other GO and KEGG enrichment results files.
- •
- •Return the path of the log file under
${proj_dir}/logs/directory.- •
${proj_dir}/logs/${sample}.find_go_and_kegg_enrichment.log
- •
Alternative direct from BED
annotatePeaks.pl peaks.bed hg38 -go results/{run}/tables/go_dir -genomeOntology
annotatePeaks.pl peaks.bed hg38 -kegg results/{run}/tables/kegg_dir
Notes & Best Practices
- •Genome & naming: Ensure the HOMER genome key matches the species; chromosome naming must be consistent (
chr1vs1). - •BED format: Tab-delimited, ≥3 columns, 0-based coordinates, no header.
- •Multiple testing: Prefer FDR (BH) if provided; otherwise fallback to P-value.
- •Background set:
-bghelps reduce bias; choose a reasonable universe (e.g., all expressed or all accessible regions → genes). - •Direct-from-BED:
annotatePeaks.pl -go/-keggis convenient; the gene-list route yields uniform TSVs for plotting.
Troubleshooting
- •Many NAs after annotation: Check genome version, chromosome naming, BED formatting, and headers.
- •Empty/weak enrichment: Ensure sufficient genes (suggest ≥50), verify species of symbols, tune thresholds or background.
- •Column name drift: HOMER versions may differ; adjust R column mappings if needed.