Bulk RNA-seq deconvolution with Bulk2Single
Overview
Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows `t_bulk2single.ipynb`, which demonstrates how to harmonise PDAC bulk replicates, train the beta-VAE generator, and benchmark the output cells against dentate gyrus scRNA-seq.
Instructions
- Load libraries and data
  - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, `anndata`, and `matplotlib.pyplot as plt`, then call `ov.plot_set()` to match omicverse styling.
  - Read the bulk counts table with `ov.read(...)` / `ov.utils.read(...)` and harmonise gene identifiers via `ov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv')`.
  - Load the reference scRNA-seq AnnData (e.g., `scv.datasets.dentategyrus()`) and confirm that the cluster labels are stored in `adata.obs['clusters']`.
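Conceptually, harmonising identifiers means re-keying each row of the bulk table from a gene ID to a gene symbol, dropping IDs that have no match, and merging IDs that map to the same symbol. The plain-Python sketch below illustrates that idea only: `map_gene_ids` is a hypothetical helper, the toy IDs are made up, and summing duplicate rows is an assumption of this sketch, not a description of how `ov.bulk.Matrix_ID_mapping` resolves duplicates.

```python
def map_gene_ids(counts, id_to_symbol):
    """Re-key a bulk count table from gene IDs to gene symbols (hypothetical helper).

    counts: dict of gene ID -> list of per-sample counts (one entry per bulk column).
    id_to_symbol: dict of gene ID -> gene symbol, e.g. parsed from a pair table.
    IDs without a symbol are dropped; IDs sharing a symbol are summed sample by
    sample (an assumption of this sketch, not necessarily what omicverse does).
    """
    merged = {}
    for gene_id, row in counts.items():
        symbol = id_to_symbol.get(gene_id)
        if symbol is None:
            continue  # unmapped ID: drop the row
        if symbol in merged:
            # Several IDs map to one symbol: add their counts column-wise.
            merged[symbol] = [a + b for a, b in zip(merged[symbol], row)]
        else:
            merged[symbol] = list(row)
    return merged


# Toy example with made-up IDs; "id3" has no symbol and is dropped.
counts = {"id1": [10, 5, 0], "id2": [1, 2, 3], "id3": [7, 7, 7]}
mapping = {"id1": "Prox1", "id2": "Prox1"}
print(map_gene_ids(counts, mapping))  # {'Prox1': [11, 7, 3]}
```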
- Initialise the Bulk2Single model
  - Instantiate `ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0)`.
  - Explain GPU selection (`gpu=-1` forces CPU) and that the `bulk_group` names must match column IDs in the bulk matrix.
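Because `bulk_group` must name columns of the bulk matrix exactly, a quick pre-flight check before constructing the model can catch typos early. `check_bulk_group` is a hypothetical helper, not part of omicverse; it accepts any iterable of column names, such as `bulk_df.columns` from a pandas DataFrame, and suggests near-misses that differ only in letter case or whitespace.

```python
def check_bulk_group(columns, bulk_group):
    """Pre-flight check that every bulk_group name is a bulk-matrix column (hypothetical helper).

    columns: iterable of column names, e.g. bulk_df.columns.
    bulk_group: the list intended for Bulk2Single(bulk_group=...).
    Raises ValueError listing missing names and any near-misses that differ only
    in letter case or surrounding whitespace; otherwise returns bulk_group as a list.
    """
    available = [str(c) for c in columns]
    available_set = set(available)
    missing = [name for name in bulk_group if name not in available_set]
    if missing:
        # Map case-folded, stripped names back to the real column names.
        folded = {c.strip().lower(): c for c in available}
        hints = {name: folded.get(name.strip().lower()) for name in missing}
        raise ValueError(
            f"bulk_group names not found in bulk columns: {missing}; "
            f"possible matches: {hints}"
        )
    return list(bulk_group)


print(check_bulk_group(["dg_d_1", "dg_d_2", "dg_d_3"], ["dg_d_1", "dg_d_2", "dg_d_3"]))
```

In a notebook this would run as `check_bulk_group(bulk_df.columns, ['dg_d_1', 'dg_d_2', 'dg_d_3'])` just before instantiating `Bulk2Single`.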
- Estimate cell fractions
  - Call `model.predicted_fraction()` to run the integrated TAPE estimator, then plot a stacked bar chart per sample to validate the proportions.
  - Encourage saving the fraction table for downstream reporting (`df.to_csv(...)`).
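Before plotting or saving the fraction table, it can help to sanity-check each sample. The hypothetical `check_fractions` helper below assumes the table has samples as rows and cell types as columns, so that `DataFrame.to_dict(orient="index")` yields `{sample: {cell_type: fraction}}`; verify the orientation of what `model.predicted_fraction()` actually returns. It also assumes the estimated proportions should be non-negative and sum to roughly one per sample.

```python
import math


def check_fractions(rows, tol=1e-2):
    """Flag samples with implausible estimated cell-type fractions (hypothetical helper).

    rows: mapping of sample -> {cell_type: fraction}; assumes samples are the rows
    of the fraction table, so DataFrame.to_dict(orient="index") has this shape.
    Returns {sample: reason} for samples containing NaN or negative values, or
    whose fractions do not sum to 1 within `tol`.
    """
    problems = {}
    for sample, fractions in rows.items():
        values = [float(v) for v in fractions.values()]
        if any(math.isnan(v) for v in values):
            problems[sample] = "contains NaN"
        elif any(v < 0 for v in values):
            problems[sample] = "contains negative fractions"
        elif abs(sum(values) - 1.0) > tol:
            problems[sample] = f"fractions sum to {sum(values):.3f}"
    return problems


# Toy fractions, not results from the tutorial.
toy = {
    "dg_d_1": {"Granule": 0.7, "Astrocytes": 0.3},
    "dg_d_2": {"Granule": 0.5, "Astrocytes": 0.2},
}
print(check_fractions(toy))  # {'dg_d_2': 'fractions sum to 0.700'}
```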
- Preprocess for the beta-VAE
  - Execute `model.bulk_preprocess_lazy()`, `model.single_preprocess_lazy()`, and `model.prepare_input()` to produce matched feature spaces.
  - Clarify that the lazy preprocessing expects raw counts; if the user has already log-normalised the data, skip it and provide aligned matrices manually instead.
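Since the lazy preprocessing expects raw counts, a cheap heuristic is to confirm that the input values are non-negative integers before running it. `looks_like_raw_counts` is a hypothetical helper; passing `bulk_df.to_numpy().ravel()`, or `adata.X.data` when `adata.X` is a SciPy sparse matrix, are suggested usages rather than steps from the tutorial. A `True` result does not prove the data are unnormalised, but a `False` result is a strong hint that they are not raw counts.

```python
def looks_like_raw_counts(values, max_check=100_000):
    """Heuristic check that values are non-negative integers, as raw counts should be.

    values: any iterable of numbers (e.g. a flattened count matrix). Only the first
    `max_check` entries are inspected so the check stays cheap on large matrices.
    Returns False as soon as a negative, fractional, infinite, or NaN value is seen.
    """
    for i, v in enumerate(values):
        if i >= max_check:
            break
        v = float(v)
        # float.is_integer() is False for fractions, NaN and infinities.
        if v < 0 or not v.is_integer():
            return False
    return True


print(looks_like_raw_counts([0, 3, 12, 0]))       # True
print(looks_like_raw_counts([0.0, 1.386, 2.48]))  # False: values look log-transformed
```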
- Train or load the beta-VAE
  - Train with `model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg')`.
  - Mention early stopping via `patience`, and how to resume by reloading the saved weights with `model.load('.../dg_vae.pth')`.
  - Use `model.plot_loss()` to monitor convergence.
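Patience-based early stopping halts training once the monitored loss has not improved for `patience` consecutive epochs. The function below is a generic, framework-free sketch of that rule, useful for reasoning about loss curves such as the one from `model.plot_loss()`; it is not Bulk2Single's internal training loop, and the `min_delta` threshold is an addition of this sketch.

```python
def early_stop_epoch(losses, patience, min_delta=0.0):
    """Return the epoch index at which patience-based early stopping would halt.

    An epoch counts as an improvement only if its loss beats the best loss so far
    by more than `min_delta`. Training stops after `patience` consecutive epochs
    without improvement; if that never happens, the last epoch index is returned.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses) - 1


losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1]
print(early_stop_epoch(losses, patience=3))  # 5
print(early_stop_epoch(losses, patience=2))  # 4
```

A larger `patience` tolerates noisy plateaus at the cost of extra epochs, which matters when each epoch is slow on CPU.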
- Generate and filter synthetic cells
  - Produce an AnnData with `model.generate()` and reduce noise with `model.filtered(generate_adata, leiden_size=25)`.
  - Store the filtered AnnData (`.write_h5ad`) for reuse, noting that it contains PCA embeddings in `obsm['X_pca']`.
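`leiden_size` appears to act as a minimum cluster size (an assumption here): generated cells are clustered with Leiden and clusters with fewer cells than the threshold are discarded as noise. The stdlib sketch below shows only that filtering rule on a plain list of cluster labels; `drop_small_clusters` is hypothetical, and `model.filtered` may differ in details such as the preprocessing it runs before clustering.

```python
from collections import Counter


def drop_small_clusters(labels, min_size):
    """Keep cells whose cluster has at least `min_size` members (hypothetical helper).

    labels: per-cell cluster labels, e.g. list(adata.obs["leiden"]).
    Returns (indices of cells to keep, sorted list of dropped cluster labels).
    """
    sizes = Counter(labels)
    dropped = sorted(c for c, n in sizes.items() if n < min_size)
    keep = [i for i, c in enumerate(labels) if sizes[c] >= min_size]
    return keep, dropped


labels = ["0", "0", "0", "1", "2", "2"]
print(drop_small_clusters(labels, min_size=2))  # ([0, 1, 2, 4, 5], ['1'])
```

With an AnnData, the kept positions could then be applied as `adata[keep].copy()`.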
- Benchmark against the reference atlas
  - Plot cell-type compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)` for both the generated and the reference data.
  - Assess correlation using `ov.bulk2single.bulk2single_plot_correlation(adata, generate_adata, celltype_key='clusters')`, where `adata` is the reference AnnData.
  - Embed with `generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])` and visualise via `ov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette())`.
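Alongside the composition plots, a numeric summary can make differences easier to report. The hypothetical `composition_gap` helper compares the fraction of each cell type in the generated and reference data from plain label lists, such as `list(generate_adata.obs['clusters'])` and `list(adata.obs['clusters'])`; it is a sketch for reporting, not what `bulk2single_plot_cellprop` computes internally.

```python
from collections import Counter


def composition(labels):
    """Fraction of cells carrying each label (empty input gives an empty dict)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def composition_gap(generated_labels, reference_labels):
    """Generated-minus-reference fraction for every cell type seen in either dataset."""
    gen = composition(generated_labels)
    ref = composition(reference_labels)
    return {ct: gen.get(ct, 0.0) - ref.get(ct, 0.0) for ct in sorted(set(gen) | set(ref))}


print(composition_gap(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # {'A': 0.25, 'B': -0.25}
```

Positive values mean a cell type is over-represented among the generated cells relative to the reference atlas.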
- Troubleshooting tips
  - If marker selection fails, increase `top_marker_num` or provide a curated marker list.
  - Alignment errors typically stem from mismatched `bulk_group` names; double-check the column IDs in the bulk matrix.
  - Training on CPU can take several hours; advise switching `gpu` to an available CUDA device for speed.
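To choose a value for `gpu` automatically, one option is to ask PyTorch whether a CUDA device is visible and fall back to `-1` (CPU) otherwise. This sketch assumes PyTorch is installed in the same environment as the Bulk2Single training backend; `pick_gpu_arg` is a hypothetical helper, and it returns `-1` when PyTorch cannot be imported.

```python
def pick_gpu_arg(preferred_index=0):
    """Suggest a value for Bulk2Single's `gpu` argument (hypothetical helper).

    Returns `preferred_index` if PyTorch is importable and reports that CUDA device,
    otherwise -1, which forces CPU training.
    """
    try:
        import torch  # assumption: PyTorch is installed alongside omicverse
    except ImportError:
        return -1
    if torch.cuda.is_available() and 0 <= preferred_index < torch.cuda.device_count():
        return preferred_index
    return -1


print(pick_gpu_arg())  # 0 when a CUDA device is visible, otherwise -1
```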
Examples
- "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
- "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
- "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."
References
- Tutorial notebook: `t_bulk2single.ipynb`
- Example data and weights: `omicverse_guide/docs/Tutorials-bulk2single/data/`
- Quick copy/paste commands: `reference.md`