ChromHMM Chromatin State Inference
Overview
This skill runs chromatin state analysis with ChromHMM on histone modification ChIP-seq data. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.
Main steps include:
- Refer to Inputs & Outputs to verify that the necessary files are present.
- Always prompt the user if required files are missing.
- Always prompt the user for the genome assembly used.
- Always prompt the user for the bin size used to generate the binarized files.
- Always prompt the user for the number of states the ChromHMM model should learn.
- Run the ChromHMM workflow: Binarization → Learning.
When to use this skill
Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using ChromHMM.
Inputs & Outputs
Inputs
(1) Option 1: BED files of aligned reads
  <mark1>.bed
  <mark2>.bed
  ... # Other marks
(1) Option 2: BAM files of aligned reads
  <mark1>.bam
  <mark2>.bam
  ... # Other marks
Outputs
chromhmm_output/
  binarized/
    *.txt
  model/
    *.txt
    ... # other files output by ChromHMM
Decision Tree
Step 0: Initialize Project
Call:
- mcp__project-init-tools__project_init
with:
- sample: all
- task: chromhmm
Step 1: Prepare the cellmarkfile (skip this step if signal files are provided)
- Prepare a tab-delimited .txt file (without a header) containing the following columns:
  - sample name
  - mark name
  - name of the BED/BAM file
  - control file of the sample (optional; include only if an input/control file is available)
- Example cellmark.txt file (columns separated by tabs):
  cell1    mark1    cell1_mark1.bam    cell1_control.bam
  cell1    mark2    cell1_mark2.bam    cell1_control.bam
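The cellmarkfile can also be generated from the file names. The following is a minimal sketch, assuming files are named `<cell>_<mark>.bam` with a control named `<cell>_control.bam`. The naming convention and the helper names `build_cellmark_rows` and `write_cellmarkfile` are assumptions for illustration, not part of the skill's tools.

```python
from pathlib import Path


def build_cellmark_rows(filenames, control_tag="control"):
    """Build tab-delimited cellmarkfile rows from names like '<cell>_<mark>.bam'.

    Assumes the cell name contains no underscore (hypothetical convention).
    """
    names = sorted(Path(f).name for f in filenames)

    # First pass: remember which control file belongs to each cell.
    controls = {}
    for name in names:
        cell, _, mark = Path(name).stem.partition("_")
        if mark == control_tag:
            controls[cell] = name

    # Second pass: one row per (cell, mark) file; the control column is optional.
    rows = []
    for name in names:
        cell, _, mark = Path(name).stem.partition("_")
        if not mark or mark == control_tag:
            continue  # skip controls and names that do not match the pattern
        row = [cell, mark, name]
        if cell in controls:
            row.append(controls[cell])
        rows.append("\t".join(row))
    return rows


def write_cellmarkfile(input_dir, out_path="cellmark.txt", suffix=".bam"):
    """Scan input_dir for files with the given suffix and write the cellmarkfile."""
    rows = build_cellmark_rows(p.name for p in Path(input_dir).glob(f"*{suffix}"))
    Path(out_path).write_text("\n".join(rows) + "\n")
    return rows


example = build_cellmark_rows(
    ["cell1_mark1.bam", "cell1_mark2.bam", "cell1_control.bam"]
)
print("\n".join(example))
```

For BED inputs, the same sketch applies with `suffix=".bed"`.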
Step 2: Data Binarization
- For BAM inputs:
  Call:
  - mcp__chromhmm-tools__binarize_bam
  with:
  - path_chrom_sized: Provided by the user or detected from the working directory
  - input_dir: Directory containing the BAM files
  - cellmarkfile: Cell mark file defining the histone modifications
  - output_dir: (e.g. binarized/)
  - bin_size: Provided by the user
- For BED inputs:
  Call mcp__chromhmm-tools__binarize_bed instead.
- For Signal inputs:
  Call:
  - mcp__chromhmm-tools__binarize_signal
  with:
  - input_dir: Directory containing the signal files
  - output_dir: (e.g. binarized/)
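For orientation, the binarization step corresponds to ChromHMM's `BinarizeBam` and `BinarizeBed` commands. The sketch below only assembles the equivalent command line. How the MCP tools invoke ChromHMM internally is not described here, so treat the mapping as an assumption. The jar path, memory setting, and file names are placeholders.

```python
import shlex


def binarize_command(kind, chrom_sizes, input_dir, cellmarkfile, output_dir,
                     bin_size=200, jar="ChromHMM.jar", java_mem="4g"):
    """Assemble a ChromHMM BinarizeBam/BinarizeBed command as an argument list."""
    subcommand = {"bam": "BinarizeBam", "bed": "BinarizeBed"}[kind]
    return [
        "java", f"-Xmx{java_mem}", "-jar", jar, subcommand,
        "-b", str(bin_size),   # bin size in base pairs
        chrom_sizes,           # chromosome length file
        input_dir,             # directory with the BAM or BED files
        cellmarkfile,          # cell mark file table
        output_dir,            # directory for the binarized output
    ]


cmd = binarize_command("bam", "hg38.chrom.sizes", "bam/", "cellmark.txt", "binarized/")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps paths with spaces safe when it is later passed to `subprocess.run`.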
Step 3: Model Learning
Call:
- mcp__chromhmm-tools__learn_model
with:
- binarized_dir: Directory containing the binarized files
- num_states: Provided by the user (e.g. 15)
- output_model_dir: (e.g. model_15_states/)
- genome: Provided by the user (e.g. hg38)
- threads: Provided by the user (e.g. 16)
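The parameters above line up with the arguments of ChromHMM's `LearnModel` command. The sketch below assembles that command line under the assumption that the MCP tool wraps it. The jar path, memory setting, and the function name `learn_model_command` are placeholders for illustration.

```python
def learn_model_command(binarized_dir, output_model_dir, num_states, genome,
                        threads=1, bin_size=200, jar="ChromHMM.jar", java_mem="4g"):
    """Assemble a ChromHMM LearnModel command as an argument list."""
    if num_states < 2:
        raise ValueError("num_states must be at least 2")
    return [
        "java", f"-Xmx{java_mem}", "-jar", jar, "LearnModel",
        "-b", str(bin_size),   # should match the bin size used for binarization
        "-p", str(threads),    # maximum number of processors
        binarized_dir,         # input directory with binarized files
        output_model_dir,      # output directory for the model
        str(num_states),       # number of chromatin states
        genome,                # genome assembly, e.g. hg38
    ]


cmd = learn_model_command("binarized/", "model_15_states/", 15, "hg38", threads=16)
print(" ".join(cmd))
```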
Parameter Optimization
Number of States
- 8 states: Basic chromatin states
- 15 states: Standard comprehensive states
- 25 states: High-resolution states
- Optimization: Compare candidate models with the Bayesian Information Criterion (BIC); lower values are preferred
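A BIC comparison can be sketched as follows. The parameter count assumes a standard HMM with one independent Bernoulli emission per mark and state; the log-likelihood values are hypothetical placeholders, and how to obtain them from a trained model is not covered here.

```python
import math


def hmm_free_parameters(num_states, num_marks):
    """Free parameters of an HMM with independent Bernoulli emissions per mark."""
    emissions = num_states * num_marks            # one probability per state and mark
    transitions = num_states * (num_states - 1)   # each row sums to 1
    initial = num_states - 1                      # initial distribution sums to 1
    return emissions + transitions + initial


def bic(log_likelihood, num_states, num_marks, num_bins):
    """BIC = k * ln(n) - 2 * ln(L); lower is better."""
    k = hmm_free_parameters(num_states, num_marks)
    return k * math.log(num_bins) - 2.0 * log_likelihood


# Hypothetical log-likelihoods for three candidate models (not real results).
candidates = {8: -1.250e6, 15: -1.235e6, 25: -1.232e6}
scores = {k: bic(ll, k, num_marks=6, num_bins=500_000) for k, ll in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> lowest BIC at", best, "states")
```

BIC is only one guide. A model with a slightly worse score can still be preferable if its states are easier to interpret biologically.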
Bin Size
- 200 bp: Standard resolution (ChromHMM default)
- 100 bp: High resolution (requires more memory)
- 500 bp: Low resolution (faster computation)
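The memory and speed trade-off follows from the number of bins, which scales inversely with bin size. The sketch below estimates the bin count from a chromosome sizes table. The chromosome names and lengths are toy values, and the count ignores any partial bin at the end of a chromosome, so it is an approximation.

```python
def parse_chrom_sizes(text):
    """Parse 'name<whitespace>length' lines into a dict."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


def count_bins(chrom_sizes, bin_size):
    """Approximate number of full bins across all chromosomes."""
    return sum(length // bin_size for length in chrom_sizes.values())


# Toy chromosome sizes for illustration only.
toy = parse_chrom_sizes("chrA\t1000000\nchrB\t500300\n")
for size in (100, 200, 500):
    print(f"{size} bp bins: {count_bins(toy, size)}")
```

Halving the bin size roughly doubles the number of bins, and with it the size of the binarized data held in memory during learning.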
State Interpretation
Common Chromatin States
- Active Promoter: H3K4me3, H3K27ac
- Weak Promoter: H3K4me3
- Poised Promoter: H3K4me3, H3K27me3
- Strong Enhancer: H3K27ac, H3K4me1
- Weak Enhancer: H3K4me1
- Insulator: CTCF
- Transcribed: H3K36me3
- Repressed: H3K27me3
- Heterochromatin: Low signal across marks
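The list above can be turned into a rough first-pass labeling rule. The sketch below is a heuristic, not ChromHMM functionality: the 0.5 threshold, the rule order, and the example emission probabilities are all assumptions, and any label it produces should be checked against the emission heatmap and known biology.

```python
def label_state(emissions, threshold=0.5):
    """Suggest a label from a state's emission probabilities (mark -> probability)."""
    on = {mark for mark, p in emissions.items() if p >= threshold}
    # Rules are checked from most specific to least specific.
    if {"H3K4me3", "H3K27me3"} <= on:
        return "Poised Promoter"
    if {"H3K4me3", "H3K27ac"} <= on:
        return "Active Promoter"
    if "H3K4me3" in on:
        return "Weak Promoter"
    if {"H3K27ac", "H3K4me1"} <= on:
        return "Strong Enhancer"
    if "H3K4me1" in on:
        return "Weak Enhancer"
    if "CTCF" in on:
        return "Insulator"
    if "H3K36me3" in on:
        return "Transcribed"
    if "H3K27me3" in on:
        return "Repressed"
    if not on:
        return "Heterochromatin"
    return "Unassigned"


# Hypothetical emission probabilities for two states.
print(label_state({"H3K4me3": 0.92, "H3K27ac": 0.81, "H3K4me1": 0.10}))
print(label_state({"H3K4me3": 0.01, "H3K27me3": 0.02, "H3K36me3": 0.03}))
```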
Troubleshooting
- Memory errors: Increase the bin size or reduce the number of states
- Convergence problems: Increase the maximum number of iterations or relax the convergence threshold
- Uninterpretable states: Check input data quality and mark combinations
- Missing chromosomes: Verify that chromosome names are consistent between the chromosome sizes file and the input files (e.g. chr1 vs. 1)
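A chromosome naming check can be sketched as below. It compares the names found in the input data with those in the chromosome sizes file and flags the common `chr` prefix mismatch. The function name and the example names are hypothetical, and extracting the names from BED or BAM files is left out.

```python
def chromosome_name_report(chrom_sizes_names, data_names):
    """Report data chromosomes absent from the chromosome sizes file."""
    sizes, data = set(chrom_sizes_names), set(data_names)
    missing = sorted(data - sizes)

    def strip_prefix(name):
        return name[3:] if name.startswith("chr") else name

    # A prefix mismatch is likely if every missing name matches once 'chr' is ignored.
    stripped_sizes = {strip_prefix(n) for n in sizes}
    prefix_mismatch = bool(missing) and all(
        strip_prefix(n) in stripped_sizes for n in missing
    )
    return {
        "missing_from_chrom_sizes": missing,
        "likely_prefix_mismatch": prefix_mismatch,
    }


print(chromosome_name_report(["chr1", "chr2", "chrX"], ["1", "2", "X"]))
```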