Monocle3-Truly-Complete Skill

Comprehensive assistance with Monocle 3 for single-cell trajectory analysis, including co-embedding, projection, and advanced visualization techniques.

When to Use This Skill

This skill should be triggered when:

Data Analysis & Processing

•Loading and preprocessing single-cell data - Working with CellDataSet objects, UMI filtering, size factor estimation
•Co-embedding multiple datasets - Combining reference and query datasets for comparative analysis
•Projecting query data onto reference - Using transform models to map new data into existing reference space
•Cell type label transfer - Transferring annotations from reference to query cells using nearest neighbor indexing

Installation & Setup

•Installing Monocle 3 - Setting up R environment, Bioconductor dependencies, GitHub installation
•Troubleshooting installation issues - Resolving gdal, Xcode, gfortran, or reticulate errors
•Testing installation - Verifying that Monocle 3 is properly installed and functional

Visualization & Analysis

•Creating trajectory plots - Generating 2D/3D UMAP visualizations with cell type annotations
•Comparing datasets - Visualizing combined reference and query datasets
•Interactive plotting - Working with plotly for 3D trajectory visualizations

Quick Reference

Essential Code Examples

Example 1: Basic Installation

# Install Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.21")

# Install Monocle 3
devtools::install_github('cole-trapnell-lab/monocle3')

Example 2: Loading Reference and Query Datasets

library(monocle3)
library(Matrix)

# Load reference dataset
cds_ref <- new_cell_data_set(matrix_ref,
                             cell_metadata = cell_ann_ref,
                             gene_metadata = gene_ann_ref)

# Load query dataset
cds_qry <- new_cell_data_set(matrix_qry,
                             cell_metadata = cell_ann_qry,
                             gene_metadata = gene_ann_qry)

Example 3: Gene Filtering and UMI Cutoffs

# Find shared genes
genes_shared <- intersect(row.names(cds_ref), row.names(cds_qry))

# Keep only shared genes
cds_ref <- cds_ref[genes_shared,]
cds_qry <- cds_qry[genes_shared,]

# Apply UMI cutoffs (example: 1000)
cds_ref <- cds_ref[, colData(cds_ref)[['Total_mRNAs']] >= 1000]
cds_qry <- cds_qry[, colData(cds_qry)[['n.umi']] >= 1000]

Example 4: Processing Reference Dataset

# Estimate size factors
cds_ref <- estimate_size_factors(cds_ref)
cds_qry <- estimate_size_factors(cds_qry)

# Process reference with PCA and UMAP
cds_ref <- preprocess_cds(cds_ref, num_dim=100)
cds_ref <- reduce_dimension(cds_ref, build_nn_index=TRUE)

# Save transform models for projection
save_transform_models(cds_ref, 'cds_ref_test_models')

Example 5: Project Query Data into Reference Space

# Load reference transform models
cds_qry <- load_transform_models(cds_qry, 'cds_ref_test_models')

# Apply transformations to query data
cds_qry <- preprocess_transform(cds_qry)
cds_qry <- reduce_dimension_transform(cds_qry)

Example 6: Cell Type Label Transfer

# Transfer cell type labels from reference to query
cds_qry <- transfer_cell_labels(cds_qry,
                                reduction_method='UMAP',
                                ref_coldata=colData(cds_ref),
                                ref_column_name='Main_cell_type',
                                query_column_name='cell_type_xfr',
                                transform_models_dir='cds_ref_test_models')

# Fix any missing labels
cds_qry <- fix_missing_cell_labels(cds_qry,
                                   reduction_method='UMAP',
                                   from_column_name='cell_type_xfr',
                                   to_column_name='cell_type_fix')

Example 7: Combining and Visualizing Datasets

# Label datasets for visualization
colData(cds_ref)[['data_set']] <- 'reference'
colData(cds_qry)[['data_set']] <- 'query'

# Combine datasets
cds_combined <- combine_cds(list(cds_ref, cds_qry),
                            keep_all_genes=TRUE,
                            cell_names_unique=TRUE,
                            keep_reduced_dims=TRUE)

# Plot combined data
plot_cells(cds_combined, color_cells_by='data_set')

Example 8: Basic Visualization

# Plot individual datasets
plot_cells(cds_ref)
plot_cells(cds_qry)

# Color by specific metadata
plot_cells(cds_combined, color_cells_by='Main_cell_type')

Key Concepts

Core Monocle 3 Objects

•CellDataSet (cds) - Primary data structure containing expression matrix, cell metadata, and gene metadata
•Transform Models - Saved PCA/UMAP transformations from reference data for projecting query data
•Nearest Neighbor Index - Spatial index used for efficient cell type label transfer

Analysis Workflow

•Data Loading - Import expression matrices and metadata
•Preprocessing - Filter genes, apply UMI cutoffs, estimate size factors
•Reference Processing - Create PCA/UMAP embeddings with nearest neighbor indexing
•Projection - Transform query data into reference space using saved models
•Label Transfer - Transfer annotations from reference to query cells
•Visualization - Plot trajectories and compare datasets

Key Parameters

•build_nn_index=TRUE - Required for cell type label transfer
•num_dim - Number of PCA dimensions (typically 50-100)
•reduction_method - 'UMAP' or 'PCA' for visualization and label transfer

Reference Files

This skill includes comprehensive documentation in references/:

getting_started.md

•17 pages of detailed installation and projection workflows
•Installation Guide - Complete setup with troubleshooting for gdal, Xcode, gfortran errors
•Projection Tutorial - Step-by-step co-embedding and label transfer workflow
•Code Examples - 26 practical examples covering data loading, processing, and visualization

visualization.md

•Interactive 3D plotting with plotly integration
•Advanced trajectory visualizations showing cell partitions
•Web-based exploration tools for large datasets

Use view to read specific reference files when detailed information is needed.

Working with This Skill

For Beginners

•Start with installation - Follow the getting_started.md installation guide carefully
•Use the projection workflow - The co-embedding tutorial provides a complete end-to-end example
•Master the basics first - Focus on data loading, gene filtering, and basic visualization before attempting advanced projection

For Intermediate Users

•Customize projection parameters - Adjust num_dim, UMI cutoffs, and visualization options
•Batch process multiple datasets - Use the transform model system for efficient analysis of many query datasets
•Troubleshoot common issues - Reference the installation troubleshooting section for gdal, Xcode, and gfortran problems

For Advanced Users

•Optimize performance - Use BPCells for large datasets and tune nearest neighbor indexing
•Custom visualization - Extend plotly visualizations for interactive exploration
•Pipeline integration - Incorporate Monocle 3 into larger single-cell analysis workflows

Navigation Tips

•Search by function name - Quick reference includes the most commonly used functions
•Check examples first - Each concept has multiple code examples with different approaches
•Reference the original URLs - Documentation includes links to official Monocle 3 documentation

Resources

references/

Organized documentation extracted from official sources:

•Step-by-step tutorials with complete code workflows
•Installation troubleshooting with specific error solutions
•Multiple code examples showing different approaches to the same task
•Links to original documentation for further reading

scripts/

Add helper scripts for common automation tasks such as:

•Batch projection of multiple query datasets
•Automated installation scripts
•Custom visualization functions

assets/

Add templates and examples such as:

•Example metadata files showing proper formatting
•Configuration templates for common analysis scenarios
•Boilerplate code for starting new projects

Notes

•This skill covers Monocle 3 version 1.4.25+ with Bioconductor 3.21 and R 4.4.1+
•Projection workflow is the key feature - enabling comparison of large datasets without memory issues
•Transform models can be reused across multiple query datasets for consistent analysis
•Code examples include both simple and advanced approaches for flexibility

Updating

To refresh this skill with updated documentation:

•Re-run the scraper with the same configuration
•The skill will be rebuilt with the latest information from the official Monocle 3 documentation