Archr-Local Skill
Comprehensive assistance with ArchR (Analysis of Regulatory Chromatin in R), an R package for single-cell ATAC-seq (scATAC-seq) data analysis. Generated from the official documentation.
When to Use This Skill
This skill should be triggered when:
Core Analysis Tasks
- Working with scATAC-seq data: Processing fragment files, creating Arrow files, quality control
- Dimensionality reduction and clustering: LSI analysis, UMAP visualization, cell clustering
- Peak analysis: Differential accessibility, co-accessibility, peak-to-gene links
- Motif enrichment: Finding regulatory motifs, transcription factor analysis
- Integration: Multiome analysis (scATAC + scRNA), batch correction
- Visualization: Browser tracks, embedding plots, TSS enrichment plots
- Project management: Creating projects, subsetting, doublet detection
Specific Use Cases
- Setting up new ArchR projects from fragment files or BAM files
- Creating UMAP embeddings and clustering scATAC-seq data
- Identifying cell clusters and markers
- Performing differential accessibility analysis
- Generating browser tracks for genomic regions
- Adding motif annotations and performing motif enrichment
- Integrating scATAC-seq with scRNA-seq data
- Exporting results and creating publication-ready plots
Questions That Need This Skill
- "How do I create an ArchR project from my scATAC-seq data?"
- "How do I identify cell types in my ATAC-seq data?"
- "How do I find differentially accessible peaks between clusters?"
- "How do I add motif information to my peaks?"
- "How do I visualize my data with UMAP plots?"
Quick Reference
Common Patterns
Pattern 1: Setting up ArchR and creating a project
library(ArchR)
addArchRGenome("hg38")
addArchRThreads(8)
addArchRLocking(locking = TRUE)
set.seed(1)
# Create Arrow files from fragment data
ArrowFiles <- createArrowFiles(
  inputFiles = atacFiles,
  sampleNames = names(atacFiles),
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)
# Create ArchR project
proj <- ArchRProj(
  ArrowFiles = ArrowFiles,
  outputDirectory = "ArchR-Output",
  copyArrows = TRUE
)
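Pattern 1 assumes that `atacFiles` already exists as a named character vector of fragment file paths. The sketch below shows one way to build it in base R. It assumes a hypothetical layout in which each sample has a file named `<sample>.fragments.tsv.gz`. The helper name `collect_fragment_files` and the naming convention are illustrative and not part of ArchR.

```r
# Hypothetical helper (not an ArchR function): collect fragment files from a
# directory and name each path by its sample, so that names(atacFiles) can be
# passed as sampleNames.
collect_fragment_files <- function(dir, suffix = ".fragments.tsv.gz") {
  files <- list.files(dir, full.names = TRUE)
  files <- files[endsWith(files, suffix)]
  if (length(files) == 0) {
    stop("No fragment files found in: ", dir)
  }
  base <- basename(files)
  # Strip the suffix to recover the sample name
  sample_names <- substr(base, 1, nchar(base) - nchar(suffix))
  stats::setNames(files, sample_names)
}

# Usage (the directory path is a placeholder):
# atacFiles <- collect_fragment_files("data/fragments")
```

Naming the vector up front keeps sample names and file paths paired, which avoids mismatches when the list of samples changes.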
Pattern 2: Quality control and filtering
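# Plot TSS enrichment for QC
p <- plotTSSEnrichment(ArchRProj = proj, groupBy = "Sample")
plotPDF(p, name = "TSS-Enrich", ArchRProj = proj, addDOC = FALSE)
# Score doublets, then filter them (filterDoublets needs doublet scores)
proj <- addDoubletScores(input = proj)
proj <- filterDoublets(ArchRProj = proj)
# Subset to high-quality cells; `cells` must be a vector of cell names
# (the TSS enrichment threshold of 8 is an example value)
keepCells <- proj$cellNames[proj$TSSEnrichment >= 8]
proj <- subsetArchRProject(
  ArchRProj = proj,
  cells = keepCells,
  outputDirectory = "ArchR-Subset"
)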
Pattern 3: Dimensionality reduction and clustering
# Add Iterative LSI
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  force = TRUE
)
# Add UMAP embedding
proj <- addUMAP(
  ArchRProj = proj,
  reducedDims = "IterativeLSI",
  name = "UMAP",
  nNeighbors = 30,
  minDist = 0.5,
  metric = "cosine"
)
# Add clusters
proj <- addClusters(
  input = proj,
  reducedDims = "IterativeLSI",
  method = "Seurat",
  name = "Clusters",
  resolution = 0.8
)
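To give a feel for what an LSI step does, the sketch below runs a generic TF-IDF transform followed by SVD on a simulated binary matrix, in base R. It is a conceptual illustration only. ArchR's iterative LSI adds iterative feature selection and its own normalization, so this is not its implementation. The function name `lsi_sketch` and the toy data are assumptions.

```r
set.seed(1)
# Toy binarized accessibility matrix: 200 tiles (rows) x 50 cells (columns)
mat <- matrix(rbinom(200 * 50, size = 1, prob = 0.1), nrow = 200)
mat <- mat[rowSums(mat) > 0, colSums(mat) > 0, drop = FALSE]

# Hypothetical helper: generic TF-IDF + SVD, returning a cells x n_dims matrix
lsi_sketch <- function(mat, n_dims = 10) {
  tf <- sweep(mat, 2, colSums(mat), "/")    # term frequency within each cell
  idf <- log(1 + ncol(mat) / rowSums(mat))  # inverse document frequency per tile
  tfidf <- tf * idf                         # scales row i by idf[i]
  sv <- svd(tfidf, nu = 0, nv = n_dims)
  # Scale right singular vectors by singular values to get cell coordinates
  emb <- sweep(sv$v, 2, sv$d[seq_len(n_dims)], "*")
  rownames(emb) <- colnames(mat)
  emb
}

emb <- lsi_sketch(mat, n_dims = 5)
dim(emb)  # one row per cell, one column per LSI dimension
```

The resulting low-dimensional coordinates play the role that `reducedDims = "IterativeLSI"` plays as input to `addUMAP` and `addClusters`.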
Pattern 4: Visualization and plotting
# Plot UMAP colored by sample
p1 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Sample", embedding = "UMAP")
# Plot UMAP colored by clusters
p2 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Clusters", embedding = "UMAP")
# Side-by-side comparison
ggAlignPlots(p1, p2, type = "h")
plotPDF(p1, p2, name = "Plot-UMAP-Sample-Clusters.pdf", ArchRProj = proj, addDOC = FALSE)
Pattern 5: Gene accessibility and marker analysis
# Add gene scores (gene activity inferred from accessibility, not expression).
# force = TRUE overwrites a GeneScoreMatrix already created with the Arrow files.
proj <- addGeneScoreMatrix(
  input = proj,
  matrixName = "GeneScoreMatrix",
  force = TRUE
)
# Plot gene scores on UMAP
p <- plotEmbedding(
  ArchRProj = proj,
  embedding = "UMAP",
  colorBy = "GeneScoreMatrix",
  name = "CD34",
  size = 1
)
Pattern 6: Differential accessibility analysis
# Identify marker peaks (requires a PeakMatrix, i.e. peaks called and addPeakMatrix() run)
markersPeaks <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
  groupBy = "Clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)
# Extract significant marker peaks per cluster
markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1")
# Plot a heatmap of marker peaks directly from the marker result
heatmapPeaks <- plotMarkerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
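The `cutOff` argument is a logical expression written as a string. The sketch below shows what such an expression selects when applied to a table of test statistics. It is a base-R illustration with made-up values. The helper name `apply_cutoff` is hypothetical, and the sketch does not claim to reproduce how ArchR evaluates the string internally.

```r
# Toy marker table for a single cluster (values are made up for illustration)
markers <- data.frame(
  peak = c("chr1:1000-1500", "chr1:9000-9500", "chr2:400-900", "chr3:50-550"),
  Log2FC = c(2.3, 0.4, 1.1, 3.0),
  FDR = c(0.001, 0.02, 0.25, 0.09),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate the cutoff expression against the table columns
apply_cutoff <- function(df, cutOff) {
  keep <- eval(parse(text = cutOff), envir = df)
  df[which(keep), , drop = FALSE]
}

sig <- apply_cutoff(markers, "FDR <= 0.1 & Log2FC >= 1")
sig$peak
# "chr1:1000-1500" "chr3:50-550"
```

Both conditions must hold at once, so a peak with a large fold change but a weak FDR (the third row) is dropped, as is a significant peak with a small fold change (the second row).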
Pattern 7: Co-accessibility analysis
# Add co-accessibility
proj <- addCoAccessibility(
  ArchRProj = proj,
  reducedDims = "IterativeLSI"
)
# Get co-accessibility loops
cA <- getCoAccessibility(
  ArchRProj = proj,
  corCutOff = 0.5,
  resolution = 1000,
  returnLoops = TRUE
)
# Plot browser tracks with co-accessibility loops
p <- plotBrowserTrack(
  ArchRProj = proj,
  groupBy = "Clusters",
  geneSymbol = c("CD14", "CD3D"),
  upstream = 50000,
  downstream = 50000,
  loops = cA
)
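Conceptually, co-accessibility is a correlation between the accessibility of two peaks across groups of cells, and `corCutOff` sets the minimum correlation for a pair to be reported. The base-R sketch below illustrates this on simulated data. The matrix, the peak names, and the use of plain Pearson correlation on a small toy matrix are assumptions made for illustration, not ArchR's procedure.

```r
set.seed(1)
# Toy matrix: 4 peaks (rows) x 30 cell aggregates (columns), simulated values
shared <- rnorm(30)
acc <- rbind(
  peakA = shared + rnorm(30, sd = 0.2),
  peakB = shared + rnorm(30, sd = 0.2),  # tracks peakA closely
  peakC = rnorm(30),
  peakD = rnorm(30)
)

# Pairwise correlation between peaks, keeping each unordered pair once
cors <- cor(t(acc))
pairs <- which(upper.tri(cors), arr.ind = TRUE)
links <- data.frame(
  peak1 = rownames(cors)[pairs[, "row"]],
  peak2 = rownames(cors)[pairs[, "col"]],
  correlation = cors[pairs],
  stringsAsFactors = FALSE
)

# Keep only pairs at or above the correlation cutoff
corCutOff <- 0.5
kept <- links[links$correlation >= corCutOff, ]
kept
```

Raising the cutoff returns fewer, stronger links. Lowering it returns more candidate links at the cost of more spurious ones.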
Pattern 8: Motif enrichment analysis
# Add motif annotations
proj <- addMotifAnnotations(ArchRProj = proj, motifSet = "cisbp", name = "Motif")
# Test marker peaks for motif enrichment; seMarker is the getMarkerFeatures() result
enrichMotifs <- peakAnnoEnrichment(
  seMarker = markersPeaks,
  ArchRProj = proj,
  peakAnnotation = "Motif",
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
# Plot the top enriched motifs per cluster (n = 7 is an example value)
heatmapEM <- plotEnrichHeatmap(enrichMotifs, n = 7, transpose = TRUE)
Pattern 9: Multiome data integration
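# Import scRNA-seq data from 10x feature matrix files
rnaMatrix <- import10xFeatureMatrix(
  input = rnaFiles,
  names = names(rnaFiles)
)
# Add gene expression matrix to ArchR project
proj <- addGeneExpressionMatrix(
  input = proj,
  seRNA = rnaMatrix,
  strictMatch = TRUE
)
# Reduce dimensions of the RNA modality
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "GeneExpressionMatrix",
  depthCol = "Gex_nUMI",
  varFeatures = 2500,
  firstSelection = "variable",
  binarize = FALSE,
  name = "LSI_RNA"
)
# Combine the ATAC and RNA reduced dimensions
proj <- addCombinedDims(
  ArchRProj = proj,
  reducedDims = c("IterativeLSI", "LSI_RNA"),
  name = "CombinedDims"
)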
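With `strictMatch = TRUE`, the cells in the project are expected to have matching entries in the RNA data, so it helps to compare cell names before adding the matrix. The base-R sketch below does this on made-up names. It assumes both modalities use the `SampleName#Barcode` naming scheme. The barcodes shown are invented.

```r
# Made-up cell names in the "SampleName#Barcode" format
atac_cells <- c("PBMC#AAAC-1", "PBMC#AAAG-1", "PBMC#AACT-1")
rna_cells <- c("PBMC#AAAC-1", "PBMC#AACT-1", "PBMC#TTTG-1")

shared <- intersect(atac_cells, rna_cells)     # present in both modalities
atac_only <- setdiff(atac_cells, rna_cells)    # would lack RNA data
rna_only <- setdiff(rna_cells, atac_cells)     # not in the ATAC project

message(length(shared), " shared, ",
        length(atac_only), " ATAC-only, ",
        length(rna_only), " RNA-only cells")
```

If `atac_only` is not empty, one option is to subset the project to `shared` before adding the expression matrix.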
Pattern 10: Exporting results and data
# Export group BigWig files
bw <- getGroupBW(
  ArchRProj = proj,
  groupBy = "Clusters",
  normMethod = "ReadsInTSS",
  tileSize = 100
)
# Export group fragment files
frags <- getGroupFragments(
  ArchRProj = proj,
  groupBy = "Clusters"
)
# Save project to disk without reloading it (result is not reassigned to proj)
saveArchRProject(ArchRProj = proj, outputDirectory = "ArchR-Project-Output", load = FALSE)
Key Concepts
Core ArchR Objects
- ArchRProject: Main container for single-cell ATAC-seq data and analyses
- ArrowFiles: HDF5-based files that store each sample's fragment data and derived matrices on disk
- PeakMatrix: Peak-by-cell accessibility matrix
- TileMatrix: Fixed-size tile-by-cell accessibility matrix
- GeneScoreMatrix: Gene-by-cell accessibility score matrix
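To make the TileMatrix idea concrete, the base-R sketch below bins insertion positions for one cell on one chromosome into fixed-size tiles and then binarizes the counts. The 500 bp tile size, the 1-based indexing, and the helper name `tile_counts` are assumptions for illustration. ArchR's own tiling details may differ.

```r
# Hypothetical helper: count insertions per fixed-size tile along a chromosome
tile_counts <- function(positions, chr_length, tile_size = 500) {
  n_tiles <- ceiling(chr_length / tile_size)
  tile_id <- (positions - 1) %/% tile_size + 1  # 1-based tile index
  tabulate(tile_id, nbins = n_tiles)
}

# Made-up insertion positions for a single cell on a 2,000 bp toy chromosome
insertions <- c(10, 499, 500, 501, 1999)
counts <- tile_counts(insertions, chr_length = 2000)
counts
# 3 1 0 1

# Binarized version: is the tile accessible in this cell at all?
binarized <- as.integer(counts > 0)
binarized
# 1 1 0 1
```

Stacking one such vector per cell, column by column, gives a tile-by-cell matrix of the kind used as input for dimensionality reduction.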
Analysis Workflow
- Data Input: Import fragment files/BAM files → Create Arrow files
- Quality Control: TSS enrichment → Doublet filtering → Cell subsetting
- Dimensionality Reduction: Iterative LSI → UMAP
- Clustering: Identify cell populations
- Downstream Analysis: Differential accessibility, motif analysis, co-accessibility
Important Parameters
- minTSS: Minimum TSS enrichment score for cell retention (usually 4-10)
- minFrags: Minimum fragments per cell (usually 1000-5000)
- resolution: Clustering resolution (higher = more clusters)
- corCutOff: Correlation cutoff for co-accessibility (usually 0.3-0.5)
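The effect of `minTSS` and `minFrags` can be previewed on a table of per-cell QC metrics before committing to thresholds. The base-R sketch below uses made-up values and assumes inclusive thresholds. The helper name `filter_cells` is hypothetical.

```r
# Made-up per-cell QC metrics
qc <- data.frame(
  cell = paste0("cell", 1:5),
  TSSEnrichment = c(2.5, 6.1, 12.0, 4.0, 9.3),
  nFrags = c(5000, 800, 20000, 1000, 3500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep cells that meet both thresholds
filter_cells <- function(qc, minTSS = 4, minFrags = 1000) {
  qc[qc$TSSEnrichment >= minTSS & qc$nFrags >= minFrags, , drop = FALSE]
}

lenient <- filter_cells(qc)                               # minTSS = 4, minFrags = 1000
strict <- filter_cells(qc, minTSS = 8, minFrags = 3000)

lenient$cell
# "cell3" "cell4" "cell5"
strict$cell
# "cell3" "cell5"
```

Comparing the two results shows how many cells a stricter setting would cost, which is useful when choosing values within the usual ranges listed above.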
Reference Files
This skill includes comprehensive documentation in references/:
Core Documentation
- getting_started.md - Installation, basic setup, and introduction to ArchR
- data_preparation.md - Input file formats, project creation, and data import
- dimensionality_reduction.md - LSI, UMAP, and other dimensionality reduction methods
- clustering.md - Cell clustering, group creation, and population identification
- visualization.md - Plotting functions and data visualization
Analysis Functions
- analysis_functions.md - Core analysis functions for modalities and integrative analysis
- peak_analysis.md - Peak calling, differential accessibility, and peak-to-gene linking
- gene_analysis.md - Gene scoring, expression integration, and gene-based analyses
- enrichment_analysis.md - GO analysis, motif enrichment, and pathway analysis
- trajectory_analysis.md - Pseudotime analysis, lineage trajectories, and differentiation
Advanced Topics
- advanced.md - Advanced techniques and specialized analyses
- integration.md - Multi-omics integration and batch correction
- export.md - Data export, result saving, and sharing
Utilities
- project_management.md - Project organization, subsetting, and management
- utility_functions.md - Helper functions and utilities
- visualization_functions.md - Additional plotting and visualization tools
- other.md - Miscellaneous topics and supplementary information
Reference Content Structure
Each reference file contains:
- Detailed explanations of functions and workflows
- Code examples with syntax highlighting
- Parameter descriptions and usage notes
- Best practices and troubleshooting tips
Use file names to navigate specific topics (e.g., view clustering.md for clustering guidance).
Working with This Skill
For Beginners
- Start with getting_started.md - Learn ArchR installation and basic concepts
- Read data_preparation.md - Understand how to format and import your data
- Follow dimensionality_reduction.md and clustering.md - Create your first analyses
- Use visualization.md - Learn to plot and interpret your results
For Intermediate Users
- Review peak_analysis.md and gene_analysis.md - Perform differential analyses
- Explore enrichment_analysis.md - Add regulatory insights to your work
- Check integration.md - Combine multiple modalities or datasets
- Use export.md - Generate publication-ready outputs
For Advanced Users
- Study advanced.md - Implement sophisticated analytical techniques
- Review trajectory_analysis.md - Study cellular differentiation
- Consult utility_functions.md - Build on ArchR's helper functions to extend its functionality
- Use project_management.md - Optimize workflows
Navigation Tips
- Use specific file names in your queries (e.g., "show me clustering.md")
- Ask for specific functions by name (e.g., "how does addIterativeLSI work?")
- Request examples from particular documentation sections
- Use the Quick Reference patterns for common workflows
Resources
Documentation Structure
- references/ - Complete extracted documentation organized by topic
- Quick Reference - Frequently used code patterns and workflows
- Key Concepts - Essential terminology and best practices
Getting Help
- Reference specific files for detailed function descriptions
- Use Quick Reference patterns for common tasks
- Ask about specific parameters or troubleshooting scenarios
Notes
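- ArchR is specifically designed for single-cell ATAC-seq data analysis
- The Arrow file format keeps data on disk, which enables memory-efficient processing of large datasets
- ArchR builds on Bioconductor data structures such as GRanges and SummarizedExperiment, so results can be passed to other Bioconductor packages
- Input is accepted as fragment files or BAM files, which covers output from common platforms (10x, sci-ATAC, etc.)
- The toolkit includes QC metrics such as TSS enrichment, fragment counts per cell, and doublet scores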
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration
- This skill will be rebuilt with the latest information from the ArchR documentation