Scglue-Complete Skill
Comprehensive assistance with scGLUE (Graph-Linked Unified Embedding) for single-cell multi-omics data integration and analysis.
When to Use This Skill
This skill should be triggered when:
Data Integration & Analysis:
- •Integrating unpaired single-cell multi-omics data (scRNA-seq + scATAC-seq)
- •Building guidance graphs for multi-omics alignment
- •Training GLUE models for cross-modal data integration
- •Working with partially paired multi-omics datasets
Preprocessing & Setup:
- •Preprocessing scRNA-seq data for GLUE integration
- •Preprocessing scATAC-seq data with LSI dimensionality reduction
- •Constructing regulatory guidance graphs using genomic proximity
- •Setting up AnnData objects for multi-omics analysis
Model Operations:
- •Configuring datasets for model training with configure_dataset
- •Fitting SCGLUE and PairedSCGLUE models
- •Extracting cell and feature embeddings from trained models
- •Computing cell type classifications and cross-modal predictions
Evaluation & Metrics:
- •Calculating integration quality metrics (FOSCTTM, silhouette widths, NMI)
- •Evaluating batch correction and alignment performance
- •Computing neighbor conservation and Seurat alignment scores
Advanced Applications:
- •Handling partially paired datasets with obs_names matching
- •Using custom guidance graphs with experimental evidence
- •Implementing metacell-based correlation analysis
- •Working with probabilistic models and custom encoders/decoders
Quick Reference
Common Patterns
Basic Setup
import anndata as ad
import networkx as nx
import scanpy as sc
import scglue
from matplotlib import rcParams
Data Preprocessing
# Backup raw counts
rna.layers["counts"] = rna.X.copy()
# Select highly variable genes
sc.pp.highly_variable_genes(rna, n_top_genes=2000, flavor="seurat_v3")
# Normalize and scale
sc.pp.normalize_total(rna)
sc.pp.log1p(rna)
sc.pp.scale(rna)
sc.tl.pca(rna, n_comps=100)
ATAC-seq LSI Processing
# Apply LSI dimensionality reduction
scglue.data.lsi(atac, n_components=100, n_iter=15)
# Use LSI for neighbors and UMAP
sc.pp.neighbors(atac, use_rep="X_lsi", metric="cosine")
sc.tl.umap(atac)
Guidance Graph Construction
# Get gene annotation
scglue.data.get_gene_annotation(
    rna, gtf="gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz",
    gtf_by="gene_name"
)
# Extract ATAC peak coordinates
split = atac.var_names.str.split(r"[:-]")
atac.var["chrom"] = split.map(lambda x: x[0])
atac.var["chromStart"] = split.map(lambda x: x[1]).astype(int)
atac.var["chromEnd"] = split.map(lambda x: x[2]).astype(int)
# Build guidance graph
guidance = scglue.genomics.rna_anchored_guidance_graph(rna, atac)
scglue.graph.check_graph(guidance, [rna, atac])
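To make the idea of a proximity-based guidance graph concrete, here is a minimal pure-Python sketch. It is not scGLUE's implementation. It links each gene to every peak on the same chromosome that overlaps the gene body extended by an upstream promoter window. The names `parse_peak` and `link_by_proximity`, the 2,000 bp `promoter_len`, and the toy coordinates are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, strand) for genes; (chrom, start, end) for peaks.
Gene = Tuple[str, int, int, str]
Peak = Tuple[str, int, int]


def parse_peak(name: str) -> Peak:
    """Split a 'chrom:start-end' peak name into its coordinates."""
    chrom, span = name.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def link_by_proximity(
    genes: Dict[str, Gene], peak_names: List[str], promoter_len: int = 2000
) -> List[Tuple[str, str, dict]]:
    """Return (gene, peak, attrs) edges for peaks overlapping gene body + promoter."""
    edges = []
    for gene, (g_chrom, g_start, g_end, strand) in genes.items():
        # Extend the interval upstream of the gene to cover the promoter.
        if strand == "+":
            lo, hi = g_start - promoter_len, g_end
        else:
            lo, hi = g_start, g_end + promoter_len
        for name in peak_names:
            p_chrom, p_start, p_end = parse_peak(name)
            # Half-open intervals overlap when each starts before the other ends.
            if p_chrom == g_chrom and p_start < hi and lo < p_end:
                edges.append((gene, name, {"weight": 1.0, "sign": 1}))
    return edges


# Hypothetical toy data: two genes and four peaks.
genes = {
    "GeneA": ("chr1", 10_000, 12_000, "+"),
    "GeneB": ("chr1", 50_000, 51_000, "-"),
}
peaks = ["chr1:8500-9000", "chr1:30000-30500", "chr1:52000-52400", "chr2:10000-10500"]

edges = link_by_proximity(genes, peaks)
for gene, peak, attrs in edges:
    print(gene, peak, attrs)
```

In this toy input only the two peaks that fall inside a promoter window are linked. The distant peak and the peak on another chromosome produce no edge.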
Model Training
# Configure datasets
scglue.models.configure_dataset(
    rna, "NB", use_highly_variable=True,
    use_layer="counts", use_rep="X_pca"
)
scglue.models.configure_dataset(
    atac, "NB", use_highly_variable=True,
    use_rep="X_lsi"
)
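# Restrict the guidance graph to the highly variable features used for training
hvf = [*rna.var.query("highly_variable").index, *atac.var.query("highly_variable").index]
guidance_hvf = guidance.subgraph(hvf).copy()
# Fit GLUE model
glue = scglue.models.fit_SCGLUE(
    {"rna": rna, "atac": atac}, guidance_hvf,
    model=scglue.models.SCGLUEModel,
    fit_kws={"directory": "glue"}
)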
Partially Paired Data
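# Configure both modalities with obs_names matching for paired cells
scglue.models.configure_dataset(
    rna, "NB", use_highly_variable=True,
    use_layer="counts", use_rep="X_pca",
    use_obs_names=True  # Paired cells are identified by identical obs_names
)
scglue.models.configure_dataset(
    atac, "NB", use_highly_variable=True,
    use_rep="X_lsi",
    use_obs_names=True
)
# Restrict the guidance graph to the highly variable features used for training
hvf = [*rna.var.query("highly_variable").index, *atac.var.query("highly_variable").index]
guidance_hvf = guidance.subgraph(hvf).copy()
# Use PairedSCGLUE model
glue = scglue.models.fit_SCGLUE(
    {"rna": rna, "atac": atac}, guidance_hvf,
    model=scglue.models.PairedSCGLUEModel,
    fit_kws={"directory": "glue"}
)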
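Because pairing relies on matching cell names, it can help to check the overlap before training. The sketch below is plain Python and does not call scGLUE. It assumes that a cell counts as paired when the same name appears in both modalities. The function `paired_cells` and the barcodes are hypothetical.

```python
from typing import List, Set


def paired_cells(rna_names: List[str], atac_names: List[str]) -> Set[str]:
    """Cells whose name appears in both modalities are treated as paired."""
    return set(rna_names) & set(atac_names)


# Toy barcodes (hypothetical): two cells were profiled in both modalities.
rna_names = ["cell_1", "cell_2", "cell_3", "rna_only_1"]
atac_names = ["cell_2", "cell_3", "atac_only_1"]

paired = paired_cells(rna_names, atac_names)
paired_fraction = len(paired) / len(rna_names)

print(sorted(paired))    # names shared by both modalities
print(paired_fraction)   # fraction of RNA cells that have an ATAC partner
```

With real AnnData objects the same check would run on `rna.obs_names` and `atac.obs_names`. An empty intersection usually means the names carry modality-specific prefixes or suffixes that need to be harmonized first.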
Embedding Extraction
# Get cell embeddings
rna_emb = glue.encode_data("rna", rna)
atac_emb = glue.encode_data("atac", atac)
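# Get feature embeddings: encode_graph returns one row per graph vertex,
# ordered as glue.vertices (pass the graph the model was trained on)
import pandas as pd
feature_emb = pd.DataFrame(glue.encode_graph(guidance), index=glue.vertices)
rna_features = feature_emb.reindex(rna.var_names).to_numpy()
atac_features = feature_emb.reindex(atac.var_names).to_numpy()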
Integration Metrics
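from scglue.metrics import foscttm, avg_silhouette_width, avg_silhouette_width_batch
# FOSCTTM needs paired cells in the same row order in both embeddings;
# it returns per-cell values for each modality (lower is better)
foscttm_rna, foscttm_atac = foscttm(rna_emb, atac_emb)
foscttm_score = (foscttm_rna.mean() + foscttm_atac.mean()) / 2
# Calculate silhouette widths
cell_type = rna.obs["cell_type"].to_numpy()
batch = rna.obs["batch"].to_numpy()
silhouette_celltype = avg_silhouette_width(rna_emb, cell_type)
# The batch variant also takes the cell type labels
silhouette_batch = avg_silhouette_width_batch(rna_emb, batch, cell_type)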
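FOSCTTM (fraction of samples closer than the true match) is easier to interpret with a worked example. The following pure-Python sketch is an illustration, not the library code. It assumes row i of both embeddings is the same cell and uses Euclidean distance. The function name `foscttm_sketch` and the toy coordinates are hypothetical, and real data would use a vectorized implementation.

```python
import math
from typing import List, Sequence, Tuple


def foscttm_sketch(
    x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Per-cell FOSCTTM for two embeddings whose rows are paired (row i <-> row i)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same paired cells in the same order")
    n = len(x)
    # d[i][j] = Euclidean distance between cell i in x and cell j in y.
    d = [[math.dist(xi, yj) for yj in y] for xi in x]
    # For each x cell: fraction of y cells closer than its true match.
    fx = [sum(d[i][j] < d[i][i] for j in range(n)) / n for i in range(n)]
    # For each y cell: fraction of x cells closer than its true match.
    fy = [sum(d[i][j] < d[j][j] for i in range(n)) / n for j in range(n)]
    return fx, fy


# Toy 2-D embeddings: cells 0 and 1 are well aligned, cell 2 is not.
x = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
y = [(0.1, 0.0), (5.0, 5.2), (0.0, 1.0)]

fx, fy = foscttm_sketch(x, y)
mean_foscttm = (sum(fx) + sum(fy)) / (2 * len(x))
print(fx, fy, mean_foscttm)
```

A value of 0 means the true match is the nearest cell in the other modality. The misaligned third cell has both other cells closer than its true partner, which pulls the mean up.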
Key Concepts
GLUE Framework
- •Graph-Linked Unified Embedding: Uses prior regulatory knowledge to bridge different feature spaces
- •Guidance Graph: Network containing omics features as nodes and regulatory interactions as edges
- •Unpaired Integration: Aligns multi-omics layers measured in different cells from the same population
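A guidance graph of this kind can be sanity-checked before training. The sketch below is plain Python over a dictionary of edges and is not `scglue.graph.check_graph`. It checks that every feature is a node and that each edge has a `weight` in (0, 1] and a `sign` of +1 or -1. The self-loop and symmetry checks are assumptions about what a well-formed guidance graph needs. The function `check_guidance_edges` and the toy graphs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Edges = Dict[Tuple[str, str], Dict[str, float]]


def check_guidance_edges(edges: Edges, features: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    nodes = {node for edge in edges for node in edge}
    for f in features:
        if f not in nodes:
            problems.append(f"feature {f!r} is not a node in the graph")
        elif (f, f) not in edges:
            problems.append(f"feature {f!r} has no self-loop")
    for (u, v), attrs in edges.items():
        weight, sign = attrs.get("weight"), attrs.get("sign")
        if weight is None or not 0 < weight <= 1:
            problems.append(f"edge {u}-{v}: weight must be in (0, 1]")
        if sign not in (1, -1):
            problems.append(f"edge {u}-{v}: sign must be +1 or -1")
        if (v, u) not in edges:
            problems.append(f"edge {u}-{v}: reverse edge {v}-{u} is missing")
    return problems


# A well-formed toy graph: symmetric edges plus self-loops.
good = {
    ("GeneA", "peak1"): {"weight": 1.0, "sign": 1},
    ("peak1", "GeneA"): {"weight": 1.0, "sign": 1},
    ("GeneA", "GeneA"): {"weight": 1.0, "sign": 1},
    ("peak1", "peak1"): {"weight": 1.0, "sign": 1},
}
# A malformed toy graph: bad weight, bad sign, no reverse edge, no self-loops.
bad = {("GeneA", "peak1"): {"weight": 1.5, "sign": 0}}

good_problems = check_guidance_edges(good, ["GeneA", "peak1"])
bad_problems = check_guidance_edges(bad, ["GeneA", "peak1"])
print(good_problems)
for problem in bad_problems:
    print(problem)
```

Returning a list of messages instead of raising on the first failure makes it easier to fix a custom graph in one pass.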
Data Structures
- •AnnData: Standard data format for single-cell data with .X matrix, .obs cell metadata, and .var feature metadata
- •NetworkX Graph: Guidance graph format with required edge attributes: weight in (0, 1] and sign (±1)
- •Layers: Store different data representations (e.g., "counts" for raw UMI counts)
Model Components
- •Encoders: Map data to latent representations
- •Decoders: Reconstruct data from latent space
- •Graph Neural Network: Propagates information through guidance graph
- •Adversarial Components: Align distributions across modalities
Training Process
- •Pretraining: Learn modality-specific representations
- •Alignment: Align representations using guidance graph
- •Joint Training: Optimize reconstruction and alignment simultaneously
Reference Files
This skill includes comprehensive documentation in references/:
api_models.md - API Reference
Pages: 48
- •Complete API documentation for all public functions and classes
- •Model classes: SCGLUEModel, PairedSCGLUEModel, SCCLUEModel
- •Neural network modules and utilities in scglue.models.nn
- •Plugin system for training extensions
- •Probabilistic model registration and configuration
Key sections:
- •Model fitting with fit_SCGLUE()
- •Base classes for custom model development
- •Data encoders/decoders for different data types
- •Training plugins and callbacks
data_management.md - Data Processing & Integration
Pages: 25
- •Comprehensive data preprocessing workflows
- •Guidance graph construction methods
- •Metacell-based correlation analysis
- •Partially paired dataset handling
- •Example datasets and case studies
Key sections:
- •Stage 1 preprocessing pipeline (RNA + ATAC)
- •Genomic coordinate handling and annotation
- •Custom guidance graph construction
- •Paired cell identification via obs_names
getting_started.md - Installation & Tutorials
Pages: 3
- •Installation instructions (conda/pip)
- •Complete preprocessing tutorial with SNARE-seq data
- •Step-by-step guidance graph construction
- •Model training and evaluation workflows
Key sections:
- •Environment setup and optional dependencies
- •End-to-end integration pipeline
- •Data visualization and quality control
Working with This Skill
For Beginners
Start with getting_started.md for:
- •Installation and environment setup
- •Basic data preprocessing concepts
- •Simple integration workflows
- •Understanding AnnData and NetworkX structures
Recommended workflow:
- •Read the installation guide and set up environment
- •Follow the complete preprocessing tutorial
- •Try the basic GLUE model training example
- •Explore embedding extraction and visualization
For Intermediate Users
Use data_management.md for:
- •Advanced preprocessing techniques
- •Custom guidance graph construction
- •Working with partially paired datasets
- •Metacell analysis and correlation methods
Common tasks:
- •Integrating custom multi-omics datasets
- •Building domain-specific guidance graphs
- •Optimizing model parameters for specific data types
- •Implementing quality control metrics
For Advanced Users
Reference api_models.md for:
- •Custom model architecture development
- •Extending the framework with new probabilistic models
- •Implementing custom training plugins
- •Advanced neural network module design
Advanced applications:
- •Developing new encoders/decoders for novel data types
- •Creating custom loss functions and training strategies
- •Integrating external knowledge sources
- •Scaling to large multi-modal datasets
Navigation Tips
- •Use view command to read specific reference sections
- •Search for function names using grep in reference files
- •Code examples include proper syntax highlighting
- •All examples are extracted from official documentation
Resources
references/
Organized documentation extracted from official sources:
- •Detailed explanations of all scGLUE concepts and methods
- •Code examples with language annotations and syntax highlighting
- •Links to original documentation for further reading
- •Structured table of contents for quick navigation
scripts/
Add helper scripts here for:
- •Automated preprocessing pipelines
- •Custom guidance graph construction
- •Batch model training and evaluation
- •Integration quality assessment
assets/
Store templates and examples:
- •Configuration file templates
- •Example datasets in proper format
- •Visualization templates
- •Best practice checklists
Notes
- •Documentation Coverage: 100% coverage of official scGLUE documentation (76 pages across 3 main sections)
- •Real Examples: All code examples extracted from actual tutorials and API documentation
- •Practical Focus: Emphasis on actionable workflows and common use cases
- •Multi-level Support: Guidance available for beginners through advanced users
- •Quality Assurance: All examples tested against official documentation standards
Updating
To refresh this skill with updated documentation:
- •Re-run the documentation scraper with the same configuration
- •The skill will be rebuilt with the latest information from scGLUE official docs
- •All reference files will be updated while preserving skill structure
Installation Prerequisites
Before using this skill, ensure you have scGLUE installed:
# Via conda (recommended)
conda install -c conda-forge -c bioconda scglue  # CPU only
conda install -c conda-forge -c bioconda scglue pytorch-gpu  # With GPU
# Via pip
pip install scglue
# Optional: faiss for speedup with metacell aggregation
# Follow official faiss installation guide
Common Troubleshooting
- •Memory Issues: Reduce dataset size or use metacell aggregation
- •GPU Errors: Install pytorch-gpu version and check CUDA compatibility
- •Graph Construction: Ensure proper genomic coordinates and edge attributes
- •Model Convergence: Check learning rate settings and data preprocessing quality