kt-experiment Skill
A comprehensive skill for running, analyzing, and comparing Knowledge Tracing experiments using the pyKT toolkit.
When to Use This Skill
Use this skill when you need to:
- Run systematic experiments with multiple models, datasets, and cross-validation folds
- Compare model performance across different configurations
- Parse training logs to extract key metrics (AUC, ACC, best epoch)
- Generate comparison tables for research papers or reports
- Automate hyperparameter tuning workflows
This skill is designed for:
- Researchers comparing KT models
- Students learning KT experimentation
- Practitioners benchmarking models on new datasets
Prerequisites
Before using this skill, ensure you have:
- pyKT installed and working
  git clone https://github.com/pykt-team/pykt-toolkit.git
  cd pykt-toolkit
  pip install -e .
- Data prepared using the dataset-prep skill
  - Data should be in pyKT format (see skills/dataset-prep/references/data_format.md)
  - Preprocessed data located in pykt-toolkit/data/
- Python environment with dependencies:
  - numpy, pandas, torch
  - Standard library: json, csv, argparse
Workflow Overview
1. Prepare Data
   └─ Use dataset-prep skill
2. Run Experiments
   └─ run_cv.py → Train models with CV
3. Parse Results
   └─ parse_logs.py → Extract metrics
4. Compare Models
   └─ compare_results.py → Generate tables
Quick Start
Step 1: Run Cross-Validation
# Run single model on single dataset
python scripts/run_cv.py --models dkt --datasets assist2009 --folds 0,1,2,3,4
# Run multiple models on multiple datasets (sequential)
python scripts/run_cv.py --models dkt,akt,simplekt --datasets assist2009,assist2015
# Run in parallel (faster, more GPU memory)
python scripts/run_cv.py --models dkt,akt --datasets assist2009 --parallel --max_workers 2
Step 2: Parse Training Logs
# Parse all logs and save as JSON
python scripts/parse_logs.py --log_dir ./experiment_results --output parsed_results.json
# Also search subdirectories
python scripts/parse_logs.py --log_dir ./experiment_results --recursive --output results.json
Step 3: Generate Comparison Tables
# Generate Markdown table (default)
python scripts/compare_results.py --input parsed_results.json --output comparison.md
# Generate LaTeX table for papers
python scripts/compare_results.py --input parsed_results.json --format latex --output table.tex
# Compare by test AUC instead of validation AUC
python scripts/compare_results.py --input parsed_results.json --metric testauc --output test_comparison.md
Script Reference
run_cv.py - Cross-Validation Runner
Run experiments systematically across model × dataset × fold combinations.
Arguments:
| Argument | Short | Required | Default | Description |
|---|---|---|---|---|
| --models | -m | Yes | - | Comma-separated model names |
| --datasets | -d | Yes | - | Comma-separated dataset names |
| --folds | -f | No | 0,1,2,3,4 | Comma-separated fold numbers |
| --output_dir | -o | No | ./experiment_results | Output directory |
| --workdir | - | No | ../examples | pyKT examples directory |
| --config_dir | - | No | ../configs | Config files directory |
| --parallel | -p | No | False | Enable parallel execution |
| --max_workers | - | No | 2 | Number of parallel workers |
| --seq_len | - | No | 200 | Maximum sequence length |
| --num_epochs | - | No | 200 | Training epochs |
| --learning_rate | - | No | Auto | Override learning rate |
Examples:
# Basic usage
python run_cv.py --models dkt --datasets assist2009 --folds 0,1,2
# Multiple models and datasets
python run_cv.py --models dkt,dkvmn,akt --datasets assist2009,assist2015 --folds 0,1
# Parallel execution
python run_cv.py --models dkt,akt --datasets assist2009 --parallel --max_workers 2
# Custom parameters
python run_cv.py --models dkt --datasets assist2009 --seq_len 100 --num_epochs 100
Output Structure:
experiment_results/
├── dkt/
│   └── assist2009/
│       ├── fold_0/
│       │   ├── config.json
│       │   ├── qid_model.ckpt
│       │   └── train.log
│       └── fold_1/
└── akt/
    └── assist2009/
        └── fold_0/
            └── ...
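The layout above follows directly from the model × dataset × fold grid. A minimal sketch of that mapping, assuming only the directory naming shown in the tree (the helper name `expected_run_dirs` is hypothetical and not part of run_cv.py):

```python
from itertools import product
from pathlib import Path


def expected_run_dirs(models, datasets, folds, output_dir="./experiment_results"):
    """Return one output directory per (model, dataset, fold) combination."""
    root = Path(output_dir)
    return [
        root / model / dataset / f"fold_{fold}"
        for model, dataset, fold in product(models, datasets, folds)
    ]


# Two models, one dataset, two folds: four runs in total.
runs = expected_run_dirs(["dkt", "akt"], ["assist2009"], [0, 1])
for run in runs:
    print(run.as_posix())
```

Listing the expected directories up front makes it easy to estimate how many training runs a command will launch before starting it.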
Supported Models:
| Model | Batch Size | Notes |
|---|---|---|
| dkt, dkt+, dkt_forget, kqn, atkt, atktfix, hawkes | 256 | Standard |
| dkvmn, deep_irt, sakt, saint, akt, simplekt, etc. | 64 | Memory-heavy |
| dtransformer | 32 | High memory |
| gkt | 16 | Graph-based, highest memory |
See references/models.md for full model documentation.
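The batch-size table can be read as a simple lookup. The sketch below copies the listed values; the fallback of 64 for models covered only by "etc." is an assumption, and the name `batch_size_for` is hypothetical rather than something run_cv.py is known to expose.

```python
# Batch sizes copied from the Supported Models table.
BATCH_SIZES = {
    256: ["dkt", "dkt+", "dkt_forget", "kqn", "atkt", "atktfix", "hawkes"],
    64: ["dkvmn", "deep_irt", "sakt", "saint", "akt", "simplekt"],
    32: ["dtransformer"],
    16: ["gkt"],
}

# Invert the table so each model name maps to its batch size.
MODEL_TO_BATCH = {
    model: size for size, models in BATCH_SIZES.items() for model in models
}


def batch_size_for(model, default=64):
    """Look up a model's batch size; unlisted models fall back to `default` (assumed)."""
    return MODEL_TO_BATCH.get(model.lower(), default)


for name in ["dkt", "akt", "dtransformer", "gkt"]:
    print(name, batch_size_for(name))
```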
parse_logs.py - Log Parser
Extract training metrics from log files.
Arguments:
| Argument | Short | Required | Default | Description |
|---|---|---|---|---|
| --log_dir | -i | Yes | - | Directory containing logs |
| --output | -o | No | parsed_results.json | Output file path |
| --format | - | No | json | Output format (json, csv) |
| --recursive | -r | No | False | Search subdirectories |
Examples:
# Parse logs from default output directory
python parse_logs.py --log_dir ./experiment_results --output results.json
# Search recursively
python parse_logs.py --log_dir ./ --recursive -o all_results.json
# Export as CSV
python parse_logs.py --log_dir ./experiment_results --format csv -o results.csv
Extracted Metrics:
- validauc - Validation AUC
- validacc - Validation accuracy
- testauc - Test AUC
- testacc - Test accuracy
- window_testauc - Windowed test AUC
- window_testacc - Windowed test accuracy
- best_epoch - Epoch with best validation AUC
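To illustrate how such metrics might be pulled out of a log, here is a rough sketch. It assumes the log ends with a tab-separated header line beginning with `fold` and `modelname`, followed by one line of values; that layout is an assumption to verify against your own train.log, and `parse_summary` is a hypothetical helper, not the actual parse_logs.py implementation. The sample values are made up.

```python
def _convert(value):
    """Turn a field into int or float when possible, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_summary(log_text):
    """Return metrics from an assumed header/value line pair, or None if absent."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        header = line.strip().split("\t")
        if header[:2] == ["fold", "modelname"]:
            values = lines[i + 1].strip().split("\t")
            if len(values) != len(header):
                return None  # malformed or truncated summary
            return {key: _convert(val) for key, val in zip(header, values)}
    return None


# Made-up log content in the assumed layout.
sample_log = "\n".join([
    "earlier training output",
    "fold\tmodelname\ttestauc\ttestacc\tvalidauc\tvalidacc\tbest_epoch",
    "0\tdkt\t0.75\t0.72\t0.76\t0.73\t12",
])
metrics = parse_summary(sample_log)
print(metrics)
```

Returning None when no summary is found keeps incomplete runs distinguishable from runs with genuinely low scores.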
compare_results.py - Results Comparator
Generate comparison tables from parsed results.
Arguments:
| Argument | Short | Required | Default | Description |
|---|---|---|---|---|
| --input | -i | Yes | - | Input file (JSON/CSV) |
| --output | -o | No | comparison.md | Output file path |
| --format | - | No | markdown | Output format (markdown, csv, latex, json) |
| --metric | - | No | validauc | Metric to compare |
| --group_by | - | No | dataset | Group results by (dataset, model) |
Examples:
# Generate Markdown comparison table
python compare_results.py --input parsed_results.json --output comparison.md
# Generate LaTeX for papers
python compare_results.py --input parsed_results.json --format latex --output table.tex
# Compare by test AUC
python compare_results.py --input parsed_results.json --metric testauc --output test_comparison.md
# Export as CSV for Excel
python compare_results.py --input parsed_results.json --format csv --output comparison.csv
# Get JSON summary
python compare_results.py --input parsed_results.json --format json --output summary.json
Output Example (Markdown):
# KT Experiment Results Comparison

## Model Comparison by Dataset (VALIDAUC)

### assist2009

| Model | VALIDAUC | Mean±Std | Runs | Rank |
|------|----------|----------|------|------|
| akt | 0.8234±0.012 | 0.8234 | 5 | 1 |
| simplekt | 0.8123±0.015 | 0.8123 | 5 | 2 |
| dkt | 0.7567±0.018 | 0.7567 | 5 | 3 |

## Overall Ranking

| Rank | Model | Avg Rank |
|------|-------|----------|
| 1 | akt | 1.00 |
| 2 | simplekt | 2.00 |
| 3 | dkt | 3.00 |
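The mean, standard deviation, run count, per-dataset rank, and average rank in the example output can be computed along the following lines. This is a sketch under stated assumptions: the record keys (`model`, `dataset`, `validauc`) are a guess at the parsed-results schema, sample standard deviation is used because the document does not say which variant compare_results.py applies, and the numbers are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records, metric="validauc"):
    """Group records by dataset and model, then rank models within each dataset."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["dataset"], rec["model"])].append(rec[metric])

    by_dataset = defaultdict(list)
    for (dataset, model), values in grouped.items():
        std = stdev(values) if len(values) > 1 else 0.0  # sample std (assumed)
        by_dataset[dataset].append(
            {"model": model, "mean": mean(values), "std": std, "runs": len(values)}
        )

    for rows in by_dataset.values():
        rows.sort(key=lambda row: row["mean"], reverse=True)  # higher is better
        for rank, row in enumerate(rows, start=1):
            row["rank"] = rank
    return dict(by_dataset)


def average_rank(summary):
    """Average each model's rank over all datasets; lowest average comes first."""
    ranks = defaultdict(list)
    for rows in summary.values():
        for row in rows:
            ranks[row["model"]].append(row["rank"])
    return sorted((mean(r), model) for model, r in ranks.items())


# Made-up fold results for illustration only.
records = [
    {"model": "dkt", "dataset": "assist2009", "fold": 0, "validauc": 0.70},
    {"model": "dkt", "dataset": "assist2009", "fold": 1, "validauc": 0.72},
    {"model": "akt", "dataset": "assist2009", "fold": 0, "validauc": 0.80},
    {"model": "akt", "dataset": "assist2009", "fold": 1, "validauc": 0.82},
]
summary = summarize(records)
overall = average_rank(summary)
for row in summary["assist2009"]:
    print(f"{row['model']}: {row['mean']:.4f}±{row['std']:.3f} ({row['runs']} runs, rank {row['rank']})")
```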
End-to-End Example
# 1. Run experiments (multiple models, 5-fold CV)
python run_cv.py \
--models dkt,akt,simplekt \
--datasets assist2009 \
--folds 0,1,2,3,4 \
--output_dir ./kt_experiments
# 2. Parse results
python parse_logs.py \
--log_dir ./kt_experiments \
--output ./kt_experiments/results.json
# 3. Generate comparison
python compare_results.py \
--input ./kt_experiments/results.json \
--output ./kt_experiments/comparison.md
# 4. View results
cat ./kt_experiments/comparison.md
Model Selection Guide
By Task Type
| Task | Recommended Models |
|---|---|
| Quick baseline | dkt, kqn |
| Best performance | akt, dtransformer, simplekt |
| Long sequences | sparsekt, ukt |
| With timestamps | hawkes, lpkt, dkt_forget |
| With concept graph | gkt, hcgkt |
| Noisy data | robustkt, atktfix |
| Interpretability | dkvmn, deep_irt |
By Data Type
| Data Available | Suitable Models |
|---|---|
| Concepts only | dkt, dkvmn, kqn, gkt |
| Concepts + Questions | akt, simplekt, saint, rkt |
| With timestamps | dkt_forget, hawkes, lpkt |
| Question-rich | qdkt, qikt, iekt |
See references/models.md for detailed model documentation with hyperparameter settings.
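One way to combine the two tables is to intersect the models recommended for a task with those suited to the available data. The sketch below copies both tables verbatim; the function name `candidates` is hypothetical, and an empty result only means the tables share no entry, not that no model would work.

```python
# Recommendations copied from the "By Task Type" table.
BY_TASK = {
    "Quick baseline": {"dkt", "kqn"},
    "Best performance": {"akt", "dtransformer", "simplekt"},
    "Long sequences": {"sparsekt", "ukt"},
    "With timestamps": {"hawkes", "lpkt", "dkt_forget"},
    "With concept graph": {"gkt", "hcgkt"},
    "Noisy data": {"robustkt", "atktfix"},
    "Interpretability": {"dkvmn", "deep_irt"},
}

# Recommendations copied from the "By Data Type" table.
BY_DATA = {
    "Concepts only": {"dkt", "dkvmn", "kqn", "gkt"},
    "Concepts + Questions": {"akt", "simplekt", "saint", "rkt"},
    "With timestamps": {"dkt_forget", "hawkes", "lpkt"},
    "Question-rich": {"qdkt", "qikt", "iekt"},
}


def candidates(task, data):
    """Return models listed for both the task and the data type, sorted by name."""
    return sorted(BY_TASK[task] & BY_DATA[data])


print(candidates("Best performance", "Concepts + Questions"))
print(candidates("Quick baseline", "Concepts only"))
```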
Troubleshooting
Out of Memory (OOM)
- Reduce batch size:
  # Models that need batch_size=64
  python run_cv.py --models akt,saint --datasets assist2009
- Reduce sequence length:
  python run_cv.py --models dkt --datasets assist2009 --seq_len 100
- Use simpler model:
  python run_cv.py --models dkt,dkvmn --datasets assist2009
Training Instability
- Lower learning rate:
  python run_cv.py --models akt --datasets assist2009 --learning_rate 1e-5
- Use stable model variant:
  python run_cv.py --models stablekt --datasets assist2009
No Results Parsed
- Check log file location:
  python parse_logs.py --log_dir ./kt_experiments --recursive
- Verify training completed successfully:
  cat ./kt_experiments/dkt/assist2009/fold_0/train.log | grep -E "(success|ERROR|failed)"
- Check final output line:
  grep "fold.*modelname" ./kt_experiments/dkt/assist2009/fold_0/train.log
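When many folds are involved, checking each directory by hand gets tedious. A small sketch that lists fold directories without a train.log, assuming the model/dataset/fold_X layout described under Output Structure (the helper `find_missing_logs` is hypothetical, and the demonstration uses a throwaway directory tree):

```python
import tempfile
from pathlib import Path


def find_missing_logs(output_dir):
    """Return fold directories (relative paths) that contain no train.log."""
    root = Path(output_dir)
    missing = []
    for fold_dir in sorted(root.glob("*/*/fold_*")):
        if fold_dir.is_dir() and not (fold_dir / "train.log").is_file():
            missing.append(fold_dir.relative_to(root).as_posix())
    return missing


# Build a temporary tree with one finished fold and one fold lacking a log.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp, "dkt", "assist2009", "fold_0")
    pending = Path(tmp, "dkt", "assist2009", "fold_1")
    done.mkdir(parents=True)
    pending.mkdir(parents=True)
    (done / "train.log").write_text("placeholder\n")
    missing = find_missing_logs(tmp)

print(missing)
```

A fold that appears in this list either never started or failed before any log was written, which narrows down where to look next.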
Slow Training
- Use simpler model:
  python run_cv.py --models dkt,dkvmn --datasets assist2009
- Reduce epochs:
  python run_cv.py --models dkt --datasets assist2009 --num_epochs 50
- Use parallel execution:
  python run_cv.py --models dkt,dkvmn --datasets assist2009 --parallel
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Config directory not found | Wrong --workdir | Ensure wandb_train.py is in workdir |
| Missing config files | Wrong --config_dir | Point to pyKT configs directory |
| CUDA out of memory | Batch size too large | Use model-specific batch sizes |
| No metrics extracted | Training failed | Check train.log for errors |
| Parallel fails on Windows | ProcessPoolExecutor limitation | Use sequential mode (--parallel not set) |
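The first two rows of the table can be checked before launching any run. A minimal sketch, assuming the defaults from the run_cv.py argument table and that wandb_train.py is the file to look for in the working directory, as the table states (the function `preflight` is hypothetical; the demonstration runs against a temporary directory rather than a real pyKT checkout):

```python
import tempfile
from pathlib import Path


def preflight(workdir="../examples", config_dir="../configs"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not Path(workdir, "wandb_train.py").is_file():
        problems.append(f"wandb_train.py not found in {workdir}")
    if not Path(config_dir).is_dir():
        problems.append(f"config directory not found: {config_dir}")
    return problems


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp, "examples")
    config_dir = Path(tmp, "configs")
    workdir.mkdir()
    config_dir.mkdir()
    (workdir / "wandb_train.py").write_text("# placeholder\n")
    ok_problems = preflight(workdir, config_dir)
    bad_problems = preflight(Path(tmp, "missing"), Path(tmp, "also_missing"))

print(ok_problems)
print(len(bad_problems))
```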
Output Files
Generated by run_cv.py:
- model/dataset/fold_X/config.json - Experiment configuration
- model/dataset/fold_X/qid_model.ckpt - Trained model checkpoint
- model/dataset/fold_X/train.log - Training log
Generated by parse_logs.py:
- parsed_results.json - Metrics in JSON format
Generated by compare_results.py:
- comparison.md - Markdown comparison table
- table.tex - LaTeX table for papers
- comparison.csv - CSV for spreadsheet analysis
Advanced Usage
Custom Hyperparameters
python run_cv.py \
--models dkt \
--datasets assist2009 \
--learning_rate 0.0005 \
--seq_len 150 \
--num_epochs 150
Specific Folds Only
python run_cv.py \
--models dkt \
--datasets assist2009 \
--folds 0,1 # Only first 2 folds
Multiple Output Runs
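# Repeat the run into one output directory per seed label.
# Note: the loop value only names the directory; run_cv.py lists no seed option above.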
for seed in 42 123 456; do
python run_cv.py \
--models dkt \
--datasets assist2009 \
--folds 0 \
--output_dir ./seeds/$seed
done
Related Skills
- dataset-prep: Prepare pyKT-compatible datasets
- code-review: Review your KT model implementations
- kt-model-dev: Templates for developing new KT models (future)
- kt-hyperopt: Hyperparameter optimization (future)
References
- pyKT Toolkit: https://github.com/pykt-team/pykt-toolkit
- Model Documentation: references/models.md
- Data Format: skills/dataset-prep/references/data_format.md
- Dataset Config: skills/dataset-prep/references/datasets_config.md
Last updated: 2026-02-02