Pathway Enrichment Analysis

Name: gcell-pathway
Rating: 92
Author: GET-Foundation

Quick Enrichment with gprofiler

python

from gcell.ontology.pathway import gprofiler_enrichment

# Basic enrichment analysis
gene_list = ['TP53', 'BRCA1', 'MYC', 'EGFR', 'KRAS']
results = gprofiler_enrichment(gene_list, organism='hsapiens')

# Specify data sources
results = gprofiler_enrichment(
    gene_list,
    organism='hsapiens',
    sources=['GO:BP', 'GO:MF', 'GO:CC', 'KEGG', 'REAC']
)

# Sources available:
# - GO:BP (Biological Process)
# - GO:MF (Molecular Function)
# - GO:CC (Cellular Component)
# - KEGG (KEGG pathways)
# - REAC (Reactome)
# - WP (WikiPathways)
# - TF (Transcription factors)
# - MIRNA (microRNA targets)
# - HPA (Human Protein Atlas)
# - CORUM (Protein complexes)
# - HP (Human Phenotype Ontology)

Working with Results

python

# Results is a pandas DataFrame
print(results.columns)
# ['source', 'term_id', 'term_name', 'p_value', 'significant',
#  'term_size', 'query_size', 'intersection_size', 'intersections']

# Filter significant results
significant = results[results['p_value'] < 0.05]

# Sort by p-value
top_terms = results.sort_values('p_value').head(20)

# Get genes in each term
for _, row in top_terms.iterrows():
    print(f"{row['term_name']}: {row['intersections']}")

Mouse and Other Organisms

python

# Mouse
results = gprofiler_enrichment(gene_list, organism='mmusculus')

# Rat
results = gprofiler_enrichment(gene_list, organism='rnorvegicus')

# Other organisms: use Ensembl species codes

Custom Pathways from GMT Files

python

from gcell.ontology.pathway import Pathways

# Load custom gene sets from GMT file
pathways = Pathways.from_gmt('custom_pathways.gmt')

# Run enrichment against custom pathways
background_genes = [...]  # All expressed genes
enriched = pathways.enrichment(gene_list, background_genes)

Key Functions and Classes

Name	Purpose
`gprofiler_enrichment()`	Quick enrichment via g:Profiler
`Pathways`	Custom pathway collections
`Pathways.from_gmt()`	Load GMT format gene sets
`Pathways.enrichment()`	Run enrichment analysis

Tips

•Always use appropriate background genes when possible
•Multiple testing correction is applied automatically
•Use specific sources (e.g., just 'GO:BP') to reduce multiple testing burden
•Gene symbols should match the organism (human: HUGO symbols)