Bulk RNA-seq differential expression with omicverse
Overview
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
Instructions
- Set up the session
  - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
  - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
- Prepare ID mapping assets
  - When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
  - Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
- Load the raw counts
  - Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`; `header=1` skips the comment line that featureCounts writes above the column header.
  - Clean the column names with a list comprehension that strips the trailing `.bam` suffix (and any leading directory path) so sample IDs are clean.
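
As a rough picture of what `header=1`, `index_col=0` and the column cleanup accomplish, here is a standard-library sketch. It is not omicverse code: it assumes the usual featureCounts layout (one `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length` and one count column per BAM file), and the embedded file text and gene names are made up. Because raw featureCounts files carry those five annotation columns, the sketch drops them; if your own table still contains them after `read_csv`, you will probably want to drop them before building the DEG object.

```python
import csv
import os

# Illustrative sketch only, not omicverse's loader. Assumes the standard
# featureCounts layout: one '#' comment line, then Geneid + 5 annotation
# columns + one count column per BAM file.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}

# Hypothetical file contents; real column names are the BAM paths you passed in.
EXAMPLE = (
    "# Program:featureCounts (hypothetical example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t/data/ctrl_1.bam\t/data/treat_1.bam\n"
    "gene_a\tchr1\t100\t900\t+\t801\t10\t25\n"
    "gene_b\tchr2\t50\t450\t-\t401\t0\t3\n"
)


def clean_sample_name(column: str) -> str:
    """Drop any directory prefix and a trailing '.bam' suffix."""
    name = os.path.basename(column)
    return name[: -len(".bam")] if name.endswith(".bam") else name


def read_featurecounts(text: str) -> tuple[list[str], dict[str, list[int]]]:
    """Return (clean sample names, {gene_id: [count per sample]})."""
    lines = text.splitlines()
    if lines and lines[0].startswith("#"):
        lines = lines[1:]  # same effect as header=1: skip the comment line
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Column 0 (Geneid) becomes the key, like index_col=0; annotation columns are dropped.
    keep = [i for i, col in enumerate(header[1:], start=1) if col not in ANNOTATION_COLUMNS]
    samples = [clean_sample_name(header[i]) for i in keep]
    counts = {row[0]: [int(row[i]) for i in keep] for row in reader if row}
    return samples, counts


samples, counts = read_featurecounts(EXAMPLE)
print(samples)  # ['ctrl_1', 'treat_1']
```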
- Map gene identifiers
  - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
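
Conceptually, the mapping is a dictionary lookup. The sketch below is a hypothetical stand-in for `ov.bulk.Matrix_ID_mapping`, not its implementation: it assumes the pair file is tab-separated with the gene ID in the first column and the symbol in the second and no header row, and it keeps IDs without a symbol unchanged (omicverse may handle unmapped IDs differently). It returns a list of rows rather than a dict because several IDs can map to the same symbol.

```python
import csv
import io

# Hypothetical stand-in for ov.bulk.Matrix_ID_mapping, for illustration only.
# Assumed pair-file layout: gene_id<TAB>symbol, no header row.


def load_id_pairs(handle) -> dict[str, str]:
    """Read gene_id -> symbol pairs from a tab-separated text handle."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


def map_ids(counts: dict[str, list[int]], pairs: dict[str, str]) -> list[tuple[str, list[int]]]:
    """Swap IDs for symbols; IDs missing from the table are kept as they are."""
    return [(pairs.get(gene_id, gene_id), values) for gene_id, values in counts.items()]


pairs = load_id_pairs(io.StringIO("gene_a\tLef1\ngene_b\tLef1\n"))
rows = map_ids({"gene_a": [1, 2], "gene_b": [30, 40], "gene_c": [5, 6]}, pairs)
print(rows)  # [('Lef1', [1, 2]), ('Lef1', [30, 40]), ('gene_c', [5, 6])]
```

The two IDs that collapse onto `Lef1` in the output are exactly the situation that duplicate-symbol handling has to resolve.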
- Initialise the DEG object
  - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
  - Handle duplicate gene symbols with `dds.drop_duplicates_index()`, which keeps the most highly expressed entry for each symbol.
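
To illustrate what "keep the most highly expressed entry" means, here is a small sketch that collapses duplicate symbols. Ranking rows by their mean raw count, and letting the first row win ties, are assumptions made for this example; `drop_duplicates_index()` may use a different expression measure or tie-break.

```python
# Illustrative only: collapse duplicate symbols, keeping the row with the
# highest mean count (the ranking rule is an assumption, not omicverse's code).


def drop_duplicate_symbols(rows: list[tuple[str, list[int]]]) -> dict[str, list[int]]:
    best: dict[str, tuple[float, list[int]]] = {}
    for symbol, values in rows:
        mean = sum(values) / len(values)
        # Strict '>' means the first row seen wins when two means are equal.
        if symbol not in best or mean > best[symbol][0]:
            best[symbol] = (mean, values)
    return {symbol: values for symbol, (_, values) in best.items()}


rows = [("Lef1", [1, 2]), ("Lef1", [30, 40]), ("gene_c", [5, 6])]
print(drop_duplicate_symbols(rows))  # {'Lef1': [30, 40], 'gene_c': [5, 6]}
```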
- Normalise and estimate size factors
  - Execute `dds.normalize()` to calculate DESeq2 size factors, correcting for differences in library size (sequencing depth) between samples.
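
For intuition about what the size factors are, the sketch below writes out DESeq2's median-of-ratios estimator with the standard library. Each gene's counts are compared with that gene's geometric mean across samples, and each sample's size factor is the median of those ratios; genes with any zero count are skipped. This follows the published DESeq2 method rather than omicverse's source, and the counts are made up.

```python
import math
import statistics

# Sketch of DESeq2's median-of-ratios size factors (not omicverse's own code).


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v == 0 for v in values):
            continue  # geometric mean would be 0, so the gene is skipped
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # exp(median(log ratio)) per sample, as in DESeq2's estimator.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each sample's counts by its size factor."""
    return {g: [v / f for v, f in zip(values, factors)] for g, values in counts.items()}


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = {"g1": [10, 20], "g2": [50, 100], "g3": [5, 10], "g4": [0, 7]}
factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
```

Because a size factor rescales a whole sample, it addresses depth and composition differences between libraries; it does not model batch effects.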
- Run differential testing
  - Collect treatment and control replicate labels into lists.
  - Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
  - Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
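
For reference, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for a single gene from plain lists of replicate values. It is the textbook formula, not omicverse's implementation, and which value scale omicverse tests on (raw, normalised or log-transformed) is not covered here. Turning t into a p-value needs the t distribution (for example `scipy.stats.t.sf`), which is left out to keep the example dependency-free.

```python
import math
import statistics

# Textbook Welch's t-test statistic (unequal variances); illustration only.


def welch_t(treatment: list[float], control: list[float]) -> tuple[float, float]:
    """Return (t, degrees of freedom) for treatment vs control."""
    n1, n2 = len(treatment), len(control)
    # statistics.variance is the sample variance (divides by n - 1); needs n >= 2.
    se1 = statistics.variance(treatment) / n1
    se2 = statistics.variance(control) / n2
    t = (statistics.mean(treatment) - statistics.mean(control)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    return t, df


# Hypothetical normalised values for one gene, three replicates per group.
t, df = welch_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 3), round(df, 3))  # 3.674 4.0
```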
- Filter and threshold results
  - Note that lowly expressed genes are retained by default; when needed, filter them out with `dds.result = dds.result.loc[dds.result['log2(BaseMean)'] > 1]`.
  - Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects the cutoff from the log2FC distribution).
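
The two cutoffs can be pictured with the sketch below. `keep_expressed` mirrors the `log2(BaseMean) > 1` filter (equivalently BaseMean > 2), and `call_sig` labels a gene from fixed fold-change and p-value cutoffs. The `'up'`/`'down'` label names, the inclusive fold-change comparison and the fixed `fc_threshold` are assumptions for illustration; the automatic threshold that `fc_threshold=-1` selects is not reproduced.

```python
import math

# Illustrative thresholds only; omicverse's automatic fc_threshold=-1 rule is
# not reproduced, and the 'up'/'down' label names are assumed.


def keep_expressed(base_means: dict[str, float], min_log2: float = 1.0) -> set[str]:
    """Genes with log2(BaseMean) > min_log2 (zero means are dropped)."""
    return {g for g, bm in base_means.items() if bm > 0 and math.log2(bm) > min_log2}


def call_sig(log2fc: float, pvalue: float, fc_threshold: float = 1.0,
             pval_threshold: float = 0.05) -> str:
    """Return 'up', 'down' or 'normal' for one gene."""
    if pvalue < pval_threshold and log2fc >= fc_threshold:
        return "up"
    if pvalue < pval_threshold and log2fc <= -fc_threshold:
        return "down"
    return "normal"


print(keep_expressed({"a": 1.5, "b": 4.0, "c": 0.0}))  # {'b'}
print(call_sig(2.0, 0.01), call_sig(-1.5, 0.001), call_sig(0.5, 0.001))  # up down normal
```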
- Visualise differential expression
  - Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes_num=...)` to highlight the top genes, or pass `plot_genes=...` instead of `plot_genes_num` to label specific genes.
  - Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
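
A volcano plot places each gene at (log2 fold change, -log10 p-value). The sketch below computes those coordinates and picks genes to label. Two choices in it are assumptions rather than documented omicverse behaviour: capping the y-value at `logp_max` (the name is borrowed from `foldchange_set`), and ranking label candidates by p-value and then by absolute fold change as a stand-in for however `plot_genes_num` chooses its genes. Drawing the points with matplotlib is left out.

```python
import math

# Illustration of volcano-plot coordinates; the capping and ranking rules are assumptions.


def volcano_points(results: dict[str, tuple[float, float]], logp_max: float = 6.0):
    """Map gene -> (log2FC, -log10 p), with y capped at logp_max."""
    points = {}
    for gene, (log2fc, pvalue) in results.items():
        y = logp_max if pvalue <= 0 else min(-math.log10(pvalue), logp_max)
        points[gene] = (log2fc, y)
    return points


def top_genes(results: dict[str, tuple[float, float]], n: int = 8) -> list[str]:
    """Smallest p-values first; ties broken by larger |log2FC|."""
    return sorted(results, key=lambda g: (results[g][1], -abs(results[g][0])))[:n]


# Hypothetical results: gene -> (log2FC, p-value).
results = {"Lef1": (3.0, 1e-10), "Actb": (0.1, 0.5), "Krtap9-5": (-2.0, 1e-4)}
print(volcano_points(results)["Lef1"])  # (3.0, 6.0)
print(top_genes(results, n=2))  # ['Lef1', 'Krtap9-5']
```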
- Perform pathway enrichment (optional)
  - Download curated pathway libraries through `ov.utils.download_pathway_database()`.
  - Load gene sets with `ov.utils.geneset_prepare(<path>, organism='Mouse')`, or `organism='Human'` and so on for other species.
  - Build the DEG gene list with `deg_genes = dds.result.loc[dds.result['sig'] != 'normal'].index.tolist()`.
  - Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list.
  - Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
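
The statistics inside `ov.bulk.geneset_enrichment` are not spelled out here. As a generic illustration of over-representation analysis, and of why the background gene list matters, the sketch below scores each pathway with a one-sided hypergeometric test: the probability of seeing at least the observed number of DEGs in the pathway if the DEGs were drawn at random from the background. Treat it as a textbook example, not omicverse's method; it also applies no multiple-testing correction, which real analyses need.

```python
import math

# Generic over-representation sketch (one-sided hypergeometric test); not omicverse code.


def hypergeom_sf(k: int, universe: int, in_pathway: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe, in_pathway, drawn)."""
    upper = min(in_pathway, drawn)
    hits = sum(math.comb(in_pathway, i) * math.comb(universe - in_pathway, drawn - i)
               for i in range(k, upper + 1))
    return hits / math.comb(universe, drawn)


def enrich(deg_genes, pathways, background):
    """Return {pathway: (overlap size, p-value)}; all sets are restricted to the background."""
    universe = set(background)
    degs = set(deg_genes) & universe
    out = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(degs & members)
        out[name] = (overlap, hypergeom_sf(overlap, len(universe), len(members), len(degs)))
    return out


background = [f"g{i}" for i in range(10)]  # hypothetical 10-gene universe
res = enrich(["g0", "g1", "g2"], {"pw": ["g0", "g1", "g2"], "other": ["g7", "g8"]}, background)
print(res["pw"][0], round(res["pw"][1], 5))  # 3 0.00833
```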
- Document outputs
  - Suggest exporting `dds.result` and the enrichment tables to CSV for downstream reporting.
  - Encourage users to save matplotlib figures with `plt.savefig(...)` when running outside notebooks.
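
Since `dds.result` is indexed and filtered with `.loc` like a pandas DataFrame, its `to_csv` method is the direct route. For a dependency-free illustration of the table shape, the sketch below writes per-gene results with Python's `csv` module; the column names are hypothetical placeholders, so match them to the columns your result table actually has.

```python
import csv
import io

# Stdlib sketch of exporting per-gene results; column names are hypothetical.


def write_results_csv(handle, results: dict[str, dict[str, object]], columns: list[str]) -> None:
    """Write one header row, then one row per gene in insertion order."""
    writer = csv.writer(handle)
    writer.writerow(["gene", *columns])
    for gene, row in results.items():
        writer.writerow([gene, *(row[c] for c in columns)])


buffer = io.StringIO()  # with a real file, open(path, "w", newline="") is recommended
write_results_csv(buffer, {"Lef1": {"log2FC": 3.0, "pvalue": 1e-10, "sig": "up"}},
                  ["log2FC", "pvalue", "sig"])
print(buffer.getvalue().splitlines())  # ['gene,log2FC,pvalue,sig', 'Lef1,3.0,1e-10,up']
```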
- Troubleshooting tips
  - Ensure sample labels in `treatment_groups`/`control_groups` exactly match the column names after cleanup.
  - Verify that the required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
  - Remind users that internet access is required the first time they download gene mappings or pathway databases.
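
Mismatched labels are easy to catch before calling `deg_analysis`. The helper below is a hypothetical pre-flight check, not part of omicverse: it reports labels missing from the cleaned column names and labels that appear in both groups.

```python
# Hypothetical pre-flight check for group labels; not an omicverse function.


def check_groups(columns: list[str], treatment_groups: list[str],
                 control_groups: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the labels look usable."""
    available = set(columns)
    problems = []
    for name, group in (("treatment", treatment_groups), ("control", control_groups)):
        missing = [label for label in group if label not in available]
        if missing:
            problems.append(f"{name} labels not in columns: {missing}")
    shared = sorted(set(treatment_groups) & set(control_groups))
    if shared:
        problems.append(f"labels in both groups: {shared}")
    return problems


columns = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]  # hypothetical cleaned names
print(check_groups(columns, ["treat_1", "treat_2.bam"], ["ctrl_1", "ctrl_2"]))
# ["treatment labels not in columns: ['treat_2.bam']"]
```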
Examples
- "I have a featureCounts matrix for mouse tumour samples: normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."
References
- Detailed walkthrough notebook: `t_deg.ipynb`
- Sample count matrix for testing: `sample/counts.txt`
- Quick copy/paste commands: `reference.md`