ArchR-Local Skill

Comprehensive assistance with ArchR (Analysis of Regulatory Chromatin in R), an R package for single-cell ATAC-seq (scATAC-seq) data analysis, generated from the official documentation.

When to Use This Skill

This skill should be triggered when:

Core Analysis Tasks

•Working with scATAC-seq data: Processing fragment files, creating Arrow files, quality control
•Dimensionality reduction and clustering: LSI analysis, UMAP visualization, cell clustering
•Peak analysis: Differential accessibility, co-accessibility, peak-to-gene links
•Motif enrichment: Finding regulatory motifs, transcription factor analysis
•Integration: Multiome analysis (scATAC + scRNA), batch correction
•Visualization: Browser tracks, embedding plots, TSS enrichment plots
•Project management: Creating projects, subsetting, doublet detection

Specific Use Cases

•Setting up new ArchR projects from fragment files or BAM files
•Creating UMAP embeddings and clustering scATAC-seq data
•Identifying cell clusters and markers
•Performing differential accessibility analysis
•Generating browser tracks for genomic regions
•Adding motif annotations and performing motif enrichment
•Integrating scATAC-seq with scRNA-seq data
•Exporting results and creating publication-ready plots

Questions That Need This Skill

•"How do I create an ArchR project from my scATAC-seq data?"
•"How do I identify cell types in my ATAC-seq data?"
•"How do I find differentially accessible peaks between clusters?"
•"How do I add motif information to my peaks?"
•"How do I visualize my data with UMAP plots?"

Quick Reference

Common Patterns

Pattern 1: Setting up ArchR and creating a project

library(ArchR)
addArchRGenome("hg38")
addArchRThreads(8)
addArchRLocking(locking = TRUE)
set.seed(1)

# Create Arrow files from fragment data
ArrowFiles <- createArrowFiles(
  inputFiles = atacFiles,
  sampleNames = names(atacFiles),
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

# Create ArchR project
proj <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "ArchR-Output",
  copyArrows = TRUE
)
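
Pattern 1 assumes that atacFiles is a named character vector mapping sample names to fragment files. A minimal sketch of building one follows; the directory and file names are hypothetical placeholders, not paths taken from the ArchR documentation.

```r
# Hypothetical sample-to-file mapping; replace the paths with your own data.
atacFiles <- c(
  SampleA = "data/SampleA_fragments.tsv.gz",
  SampleB = "data/SampleB_fragments.tsv.gz"
)

# Report inputs that are missing before starting a long Arrow-file build.
missingFiles <- atacFiles[!file.exists(atacFiles)]
if (length(missingFiles) > 0) {
  message("Missing input files: ", paste(names(missingFiles), collapse = ", "))
}
```

The vector names become the sample names passed to createArrowFiles() through names(atacFiles).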

Pattern 2: Quality control and filtering

# Plot TSS enrichment for QC
p <- plotTSSEnrichment(proj, groupBy = "Sample")
plotPDF(p, name = "TSS-Enrich", ArchRProj = proj)

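# Compute doublet scores, then filter predicted doublets
# (filterDoublets() requires addDoubletScores() to have been run first)
proj <- addDoubletScores(input = proj)
proj <- filterDoublets(ArchRProj = proj)

# Subset to high-quality cells; `cells` must be a vector of cell names
# (the TSS enrichment threshold of 8 is illustrative, tune it to your data)
cellsPass <- proj$cellNames[proj$TSSEnrichment >= 8]
proj <- subsetArchRProject(
  ArchRProj = proj,
  cells = cellsPass,
  outputDirectory = "ArchR-Subset"
)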

Pattern 3: Dimensionality reduction and clustering

# Add Iterative LSI
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  force = TRUE
)

# Add UMAP embedding
proj <- addUMAP(
  ArchRProj = proj,
  reducedDims = "IterativeLSI",
  name = "UMAP",
  nNeighbors = 30,
  minDist = 0.5,
  metric = "cosine"
)

# Add clusters
proj <- addClusters(
  input = proj,
  reducedDims = "IterativeLSI",
  method = "Seurat",
  name = "Clusters",
  resolution = 0.8
)
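
After clustering, it helps to check whether any cluster is dominated by a single sample, which can point to batch effects. The sketch below uses toy label vectors in base R; in a real project the two vectors would be proj$Clusters and proj$Sample.

```r
# Toy labels standing in for proj$Clusters and proj$Sample (illustrative only).
clusters <- c("C1", "C1", "C1", "C2", "C2", "C3")
samples  <- c("SampleA", "SampleA", "SampleB", "SampleB", "SampleB", "SampleA")

# Cross-tabulate clusters (rows) against samples (columns).
composition <- table(Cluster = clusters, Sample = samples)

# Fraction of each cluster contributed by each sample (each row sums to 1).
fractions <- prop.table(composition, margin = 1)
print(round(fractions, 2))
```

A cluster made up almost entirely of one sample is worth a closer look before it is interpreted as a distinct cell population.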

Pattern 4: Visualization and plotting

# Plot UMAP colored by sample
p1 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Sample", embedding = "UMAP")

# Plot UMAP colored by clusters
p2 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Clusters", embedding = "UMAP")

# Side-by-side comparison
ggAlignPlots(p1, p2, type = "h")
plotPDF(p1, p2, name = "Plot-UMAP-Sample-Clusters.pdf", ArchRProj = proj, addDOC = FALSE)

Pattern 5: Gene accessibility and marker analysis

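# Add gene activity scores (an accessibility-based proxy for expression)
# Note: this matrix already exists if addGeneScoreMat = TRUE was used in createArrowFiles()
proj <- addGeneScoreMatrix(
  input = proj,
  matrixName = "GeneScoreMatrix",
  force = TRUE
)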

# Plot gene scores on UMAP
p <- plotEmbedding(
  ArchRProj = proj,
  embedding = "UMAP",
  colorBy = "GeneScoreMatrix",
  name = "CD34",
  size = 1
)

Pattern 6: Differential accessibility analysis

# Identify marker peaks
markersPeaks <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
  groupBy = "Clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)

# Extract marker peaks per cluster as a list of GRanges
markerList <- getMarkers(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1",
  returnGR = TRUE
)

# Plot a heatmap of marker peaks
heatmapPeaks <- plotMarkerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1",
  transpose = TRUE
)
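
Pattern 6 assumes the project already contains a PeakMatrix. One way to get there, sketched below, is to build pseudo-bulk coverages, call reproducible peaks, and then add the matrix. The wrapper name addPeaksForClusters is hypothetical, and the sketch assumes that library(ArchR) is attached and that MACS2 is installed where findMacs2() can locate it.

```r
# Hypothetical helper (not part of ArchR): prepares a PeakMatrix grouped by `groupBy`.
# Assumes library(ArchR) is attached and MACS2 is available on the system.
addPeaksForClusters <- function(proj, groupBy = "Clusters") {
  # Pseudo-bulk replicates per group, used as input for peak calling
  proj <- addGroupCoverages(ArchRProj = proj, groupBy = groupBy)

  # Call peaks per group with MACS2 and merge them into a reproducible peak set
  proj <- addReproduciblePeakSet(
    ArchRProj = proj,
    groupBy = groupBy,
    pathToMacs2 = findMacs2()
  )

  # Count insertions in the merged peak set for every cell
  addPeakMatrix(ArchRProj = proj)
}
```

Usage would be proj <- addPeaksForClusters(proj), run after clustering and before getMarkerFeatures() with useMatrix = "PeakMatrix".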

Pattern 7: Co-accessibility analysis

# Add co-accessibility
proj <- addCoAccessibility(
  ArchRProj = proj,
  reducedDims = "IterativeLSI"
)

# Get co-accessibility loops
cA <- getCoAccessibility(
  ArchRProj = proj,
  corCutOff = 0.5,
  resolution = 1000,
  returnLoops = TRUE
)

# Plot browser tracks with co-accessibility loops
p <- plotBrowserTrack(
  ArchRProj = proj,
  groupBy = "Clusters",
  geneSymbol = c("CD14", "CD3D"),
  upstream = 50000,
  downstream = 50000,
  loops = cA
)

Pattern 8: Motif enrichment analysis

# Add motif annotations
proj <- addMotifAnnotations(ArchRProj = proj, motifSet = "cisbp", name = "Motif")

# Perform motif enrichment in marker peaks
# (seMarker takes the getMarkerFeatures() result directly)
enrichMotifs <- peakAnnoEnrichment(
  seMarker = markersPeaks,
  ArchRProj = proj,
  peakAnnotation = "Motif",
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)

# Plot motif enrichment (top 7 motifs per group)
heatmapEM <- plotEnrichHeatmap(enrichMotifs, n = 7, transpose = TRUE)

Pattern 9: Multiome data integration

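# Import scRNA-seq data from 10x feature-matrix HDF5 files
# (names must match the sample names used in the ArchR project)
rnaMatrix <- import10xFeatureMatrix(
  input = rnaFiles,
  names = names(rnaFiles)
)

# Add gene expression matrix to ArchR project
proj <- addGeneExpressionMatrix(
  input = proj,
  seRNA = rnaMatrix,
  strictMatch = TRUE
)

# Run LSI on the RNA modality so that both modalities have a reducedDims object
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "GeneExpressionMatrix",
  depthCol = "Gex_nUMI",
  firstSelection = "variable",
  binarize = FALSE,
  name = "LSI_RNA"
)

# Combine the ATAC and RNA reduced dimensions
proj <- addCombinedDims(
  ArchRProj = proj,
  reducedDims = c("IterativeLSI", "LSI_RNA"),
  name = "CombinedDims"
)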

Pattern 10: Exporting results and data

# Export group BigWig files
bw <- getGroupBW(
  ArchRProj = proj,
  groupBy = "Clusters",
  normMethod = "ReadsInTSS",
  tileSize = 100
)

# Export group fragment files
frags <- getGroupFragments(
  ArchRProj = proj,
  groupBy = "Clusters"
)

# Save project to disk (set load = TRUE and assign the result to keep working from the saved copy)
saveArchRProject(ArchRProj = proj, outputDirectory = "ArchR-Project-Output", load = FALSE)
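
A saved project can be restored in a later R session with loadArchRProject(). The sketch below wraps that call in a small hypothetical helper, reloadProject, that fails early when the directory is missing; it assumes library(ArchR) is attached.

```r
# Hypothetical helper (not part of ArchR): reload a project written by saveArchRProject().
reloadProject <- function(path = "ArchR-Project-Output") {
  # Fail with a clear message before calling into ArchR
  if (!dir.exists(path)) {
    stop("No saved project directory found at: ", path)
  }
  loadArchRProject(path = path)
}
```

Usage would be proj <- reloadProject("ArchR-Project-Output").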

Key Concepts

Core ArchR Objects

•ArchRProject: Main in-memory container that tracks Arrow files, cell metadata, and analysis results
•ArrowFiles: HDF5-based on-disk files (one per sample) that store fragments and derived matrices
•PeakMatrix: Peak-by-cell accessibility matrix
•TileMatrix: Fixed-width genomic tile-by-cell accessibility matrix (500-bp tiles by default)
•GeneScoreMatrix: Gene-by-cell matrix of gene activity scores inferred from accessibility

Analysis Workflow

•Data Input: Import fragment files/BAM files → Create Arrow files
•Quality Control: TSS enrichment → Doublet filtering → Cell subsetting
•Dimensionality Reduction: Iterative LSI → UMAP
•Clustering: Identify cell populations
•Downstream Analysis: Differential accessibility, motif analysis, co-accessibility

Important Parameters

•minTSS: Minimum TSS enrichment score for cell retention (usually 4-10)
•minFrags: Minimum fragments per cell (usually 1000-5000)
•resolution: Clustering resolution (higher = more clusters)
•corCutOff: Correlation cutoff for co-accessibility (usually 0.3-0.5)
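
To make the effect of minTSS and minFrags concrete, the sketch below applies both thresholds to a toy per-cell QC table in base R. The values are invented for illustration; ArchR applies the equivalent filter internally during createArrowFiles().

```r
# Toy per-cell QC metrics (invented values, not real data).
qc <- data.frame(
  cell = c("cell1", "cell2", "cell3", "cell4"),
  TSSEnrichment = c(12.5, 3.1, 8.0, 6.2),
  nFrags = c(5400, 8000, 900, 2500),
  stringsAsFactors = FALSE
)

minTSS <- 4
minFrags <- 1000

# A cell is kept only if it passes both thresholds.
keep <- qc$TSSEnrichment >= minTSS & qc$nFrags >= minFrags
passing <- qc$cell[keep]
print(passing)
```

Here cell2 fails on TSS enrichment and cell3 fails on fragment count, so only cell1 and cell4 are retained.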

Reference Files

This skill includes comprehensive documentation in references/:

Core Documentation

•getting_started.md - Installation, basic setup, and introduction to ArchR
•data_preparation.md - Input file formats, project creation, and data import
•dimensionality_reduction.md - LSI, UMAP, and other dimensionality reduction methods
•clustering.md - Cell clustering, group creation, and population identification
•visualization.md - Plotting functions and data visualization

Analysis Functions

•analysis_functions.md - Core analysis functions for modalities and integrative analysis
•peak_analysis.md - Peak calling, differential accessibility, and peak-to-gene linking
•gene_analysis.md - Gene scoring, expression integration, and gene-based analyses
•enrichment_analysis.md - GO analysis, motif enrichment, and pathway analysis
•trajectory_analysis.md - Pseudotime analysis, lineage trajectories, and differentiation

Advanced Topics

•advanced.md - Advanced techniques and specialized analyses
•integration.md - Multi-omics integration and batch correction
•export.md - Data export, result saving, and sharing

Utilities

•project_management.md - Project organization, subsetting, and management
•utility_functions.md - Helper functions and utilities
•visualization_functions.md - Additional plotting and visualization tools
•other.md - Miscellaneous topics and supplementary information

Reference Content Structure

Each reference file contains:

•Detailed explanations of functions and workflows
•Code examples with syntax highlighting
•Parameter descriptions and usage notes
•Best practices and troubleshooting tips

Use file names to navigate specific topics (e.g., view clustering.md for clustering guidance).

Working with This Skill

For Beginners

•Start with getting_started.md - Learn ArchR installation and basic concepts
•Read data_preparation.md - Understand how to format and import your data
•Follow dimensionality_reduction.md and clustering.md - Create your first analyses
•Use visualization.md - Learn to plot and interpret your results

For Intermediate Users

•Review peak_analysis.md and gene_analysis.md - Perform differential analyses
•Explore enrichment_analysis.md - Add regulatory insights to your work
•Check integration.md - Combine multiple modalities or datasets
•Use export.md - Generate publication-ready outputs

For Advanced Users

•Study advanced.md - Implement sophisticated analytical techniques
•Review trajectory_analysis.md - Study cellular differentiation
•Consult utility_functions.md - Use helper functions to extend ArchR workflows
•Optimize workflows using project_management.md

Navigation Tips

•Use specific file names in your queries (e.g., "show me clustering.md")
•Ask for specific functions by name (e.g., "how does addIterativeLSI work?")
•Request examples from particular documentation sections
•Use the Quick Reference patterns for common workflows

Resources

Documentation Structure

•references/ - Complete extracted documentation organized by topic
•Quick Reference - Frequently used code patterns and workflows
•Key Concepts - Essential terminology and best practices

Getting Help

•Reference specific files for detailed function descriptions
•Use Quick Reference patterns for common tasks
•Ask about specific parameters or troubleshooting scenarios

Notes

•ArchR is specifically designed for single-cell ATAC-seq data analysis
•The Arrow file format enables memory-efficient processing of large datasets
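•ArchR builds on Bioconductor data structures (e.g., GenomicRanges, SummarizedExperiment) and interoperates with Bioconductor packages
•Input is accepted as fragment files or BAM files, which covers most scATAC-seq platforms (10x, sci-ATAC, etc.)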
•The toolkit includes extensive QC metrics and validation steps

Updating

To refresh this skill with updated documentation:

•Re-run the scraper with the same configuration
•This skill will be rebuilt with the latest information from the ArchR documentation