SKILL.md

kt-experiment Skill

A comprehensive skill for running, analyzing, and comparing Knowledge Tracing experiments using the pyKT toolkit.

When to Use This Skill

Use this skill when you need to:

  • Run systematic experiments with multiple models, datasets, and cross-validation folds
  • Compare model performance across different configurations
  • Parse training logs to extract key metrics (AUC, ACC, best epoch)
  • Generate comparison tables for research papers or reports
  • Automate hyperparameter tuning workflows

This skill is designed for:

  • Researchers comparing KT models
  • Students learning KT experimentation
  • Practitioners benchmarking models on new datasets

Prerequisites

Before using this skill, ensure you have:

  1. pyKT installed and working

    ```bash
    git clone https://github.com/pykt-team/pykt-toolkit.git
    cd pykt-toolkit
    pip install -e .
    ```
  2. Data prepared using the dataset-prep skill

    • Data should be in pyKT format (see skills/dataset-prep/references/data_format.md)
    • Preprocessed data located in pykt-toolkit/data/
  3. Python environment with dependencies:

    • numpy, pandas, torch
    • Standard library: json, csv, argparse
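A quick way to confirm the second prerequisite is to check that each dataset has a non-empty directory under pykt-toolkit/data/. The function below is a minimal sketch. `check_data_dirs` is a hypothetical helper, not one of the skill's scripts. It only checks that the directory exists and contains files; it does not check that the files are in pyKT format.

```bash
# Hypothetical helper: verify that <root>/data/<dataset> exists and is non-empty
# for each comma-separated dataset name. Exits non-zero if any check fails.
# The body runs in a subshell, so the IFS and set -f changes stay local.
check_data_dirs() (
    root=$1      # path to pykt-toolkit
    datasets=$2  # comma-separated dataset names, e.g. assist2009,assist2015
    status=0
    set -f       # disable globbing while splitting the list
    IFS=','
    for ds in $datasets; do
        dir="$root/data/$ds"
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
            echo "ok: $dir"
        else
            echo "missing or empty: $dir" >&2
            status=1
        fi
    done
    exit "$status"
)
```

For example, `check_data_dirs ./pykt-toolkit assist2009,assist2015` prints one line per dataset. It returns non-zero if any directory is missing or empty.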

Workflow Overview

```text
1. Prepare Data
   └─ Use dataset-prep skill

2. Run Experiments
   └─ run_cv.py → Train models with CV

3. Parse Results
   └─ parse_logs.py → Extract metrics

4. Compare Models
   └─ compare_results.py → Generate tables
```

Quick Start

Step 1: Run Cross-Validation

```bash
# Run single model on single dataset
python scripts/run_cv.py --models dkt --datasets assist2009 --folds 0,1,2,3,4

# Run multiple models on multiple datasets (sequential)
python scripts/run_cv.py --models dkt,akt,simplekt --datasets assist2009,assist2015

# Run in parallel (faster, but uses more GPU memory)
python scripts/run_cv.py --models dkt,akt --datasets assist2009 --parallel --max_workers 2
```

Step 2: Parse Training Logs

```bash
# Parse all logs and save as JSON
python scripts/parse_logs.py --log_dir ./experiment_results --output parsed_results.json

# Also search subdirectories
python scripts/parse_logs.py --log_dir ./experiment_results --recursive --output results.json
```

Step 3: Generate Comparison Tables

```bash
# Generate Markdown table (default)
python scripts/compare_results.py --input parsed_results.json --output comparison.md

# Generate LaTeX table for papers
python scripts/compare_results.py --input parsed_results.json --format latex --output table.tex

# Compare by test AUC instead of validation AUC
python scripts/compare_results.py --input parsed_results.json --metric testauc --output test_comparison.md
```

Script Reference

run_cv.py - Cross-Validation Runner

Run experiments systematically across model × dataset × fold combinations.
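Because the runner covers every model × dataset × fold combination, the number of runs grows multiplicatively. The hypothetical helper below, which is not part of run_cv.py, expands the same comma-separated strings passed to --models, --datasets and --folds. It prints the model/dataset/fold_X paths shown under Output Structure, so you can count runs before launching them.

```bash
# Hypothetical helper: print one model/dataset/fold_X path per run.
# Arguments are the comma-separated strings you would pass to
# --models, --datasets and --folds.
list_runs() (
    set -f        # no glob expansion of the names
    IFS=','       # split each argument on commas
    for model in $1; do
        for dataset in $2; do
            for fold in $3; do
                echo "$model/$dataset/fold_$fold"
            done
        done
    done
)
```

For example, `list_runs dkt,akt,simplekt assist2009,assist2015 0,1,2,3,4 | wc -l` counts 3 × 2 × 5 = 30 runs.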

Arguments:

| Argument | Short | Required | Default | Description |
|----------|-------|----------|---------|-------------|
| --models | -m | Yes | - | Comma-separated model names |
| --datasets | -d | Yes | - | Comma-separated dataset names |
| --folds | -f | No | 0,1,2,3,4 | Comma-separated fold numbers |
| --output_dir | -o | No | ./experiment_results | Output directory |
| --workdir | - | No | ../examples | pyKT examples directory |
| --config_dir | - | No | ../configs | Config files directory |
| --parallel | -p | No | False | Enable parallel execution |
| --max_workers | - | No | 2 | Number of parallel workers |
| --seq_len | - | No | 200 | Maximum sequence length |
| --num_epochs | - | No | 200 | Training epochs |
| --learning_rate | - | No | Auto | Override learning rate |

Examples:

```bash
# Basic usage
python run_cv.py --models dkt --datasets assist2009 --folds 0,1,2

# Multiple models and datasets
python run_cv.py --models dkt,dkvmn,akt --datasets assist2009,assist2015 --folds 0,1

# Parallel execution
python run_cv.py --models dkt,akt --datasets assist2009 --parallel --max_workers 2

# Custom parameters
python run_cv.py --models dkt --datasets assist2009 --seq_len 100 --num_epochs 100
```

Output Structure:

```text
experiment_results/
├── dkt/
│   └── assist2009/
│       ├── fold_0/
│       │   ├── config.json
│       │   ├── qid_model.ckpt
│       │   └── train.log
│       └── fold_1/
└── akt/
    └── assist2009/
        └── fold_0/
            └── ...
```

Supported Models:

| Model | Batch Size | Notes |
|-------|------------|-------|
| dkt, dkt+, dkt_forget, kqn, atkt, atktfix, hawkes | 256 | Standard |
| dkvmn, deep_irt, sakt, saint, akt, simplekt, etc. | 64 | Memory-heavy |
| dtransformer | 32 | High memory |
| gkt | 16 | Graph-based, highest memory |

See references/models.md for full model documentation.
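When planning --parallel runs, the batch sizes in the table give a rough sense of which models are memory-heavy. Below is a minimal lookup sketch that encodes only the rows above. `batch_size_for` is hypothetical. Because the 64 row ends in "etc.", unlisted models print `unknown` instead of a guessed value.

```bash
# Hypothetical helper: print the batch size listed for a model in the
# Supported Models table, or "unknown" for models the table does not name.
batch_size_for() {
    case "$1" in
        dkt|dkt+|dkt_forget|kqn|atkt|atktfix|hawkes) echo 256 ;;
        dkvmn|deep_irt|sakt|saint|akt|simplekt)      echo 64 ;;
        dtransformer)                                echo 32 ;;
        gkt)                                         echo 16 ;;
        *)                                           echo unknown ;;
    esac
}
```

For example, `batch_size_for akt` prints 64 and `batch_size_for gkt` prints 16.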


parse_logs.py - Log Parser

Extract training metrics from log files.

Arguments:

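| Argument | Short | Required | Default | Description |
|----------|-------|----------|---------|-------------|
| --log_dir | -i | Yes | - | Directory containing logs |
| --output | -o | No | parsed_results.json | Output file path |
| --format | - | No | json | Output format (json, csv) |
| --recursive | -r | No | False | Search subdirectories |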

Examples:

```bash
# Parse logs from default output directory
python parse_logs.py --log_dir ./experiment_results --output results.json

# Search recursively
python parse_logs.py --log_dir ./ --recursive -o all_results.json

# Export as CSV
python parse_logs.py --log_dir ./experiment_results --format csv -o results.csv
```

Extracted Metrics:

  • validauc - Validation AUC
  • validacc - Validation accuracy
  • testauc - Test AUC
  • testacc - Test accuracy
  • window_testauc - Windowed test AUC
  • window_testacc - Windowed test accuracy
  • best_epoch - Epoch with best validation AUC
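parse_logs.py extracts these metrics from each train.log. The sketch below assumes a specific log layout: the metrics appear in a tab-separated summary header, which is the line the Troubleshooting section finds with `grep "fold.*modelname"`, followed by one tab-separated line of values. Under that assumption, the awk program pairs each header field with its value. `log_metrics` is a hypothetical helper and does not reproduce parse_logs.py's implementation. Check the layout against one of your own train.log files before relying on it.

```bash
# Hypothetical helper: print "name=value" pairs from the summary in a train.log.
# Assumes a tab-separated header line matching /fold.*modelname/ that is
# immediately followed by a tab-separated line of values.
log_metrics() {
    awk -F '\t' '
        header_seen {                  # line right after the header: values
            for (i = 1; i <= NF; i++) print names[i] "=" $i
            header_seen = 0
            next
        }
        /fold.*modelname/ {            # header line: remember the field names
            for (i = 1; i <= NF; i++) names[i] = $i
            header_seen = 1
        }
    ' "$1"
}
```

For example, `log_metrics ./experiment_results/dkt/assist2009/fold_0/train.log` prints one `name=value` line per header field.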

compare_results.py - Results Comparator

Generate comparison tables from parsed results.
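Each row of the comparison table summarizes one model's runs on one dataset as mean ± standard deviation. Below is a minimal awk sketch of that aggregation. It assumes whitespace-separated input lines of the form `dataset model value`, one line per fold. `fold_stats` is hypothetical. It uses the sample standard deviation (dividing by n - 1), which may not be the convention compare_results.py follows.

```bash
# Hypothetical helper: read "dataset model value" lines on stdin and print
# "dataset model mean std runs", with std as the sample standard deviation.
fold_stats() {
    awk '
        NF >= 3 {
            key = $1 " " $2
            n[key]++
            sum[key] += $3
            sq[key] += $3 * $3
        }
        END {
            for (key in n) {
                mean = sum[key] / n[key]
                var = 0
                if (n[key] > 1)
                    var = (sq[key] - n[key] * mean * mean) / (n[key] - 1)
                if (var < 0) var = 0   # guard against floating-point round-off
                printf "%s %.4f %.4f %d\n", key, mean, sqrt(var), n[key]
            }
        }
    ' | sort
}
```

Output lines have the form `dataset model mean std runs`. A model with a single run reports a standard deviation of 0.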

Arguments:

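| Argument | Short | Required | Default | Description |
|----------|-------|----------|---------|-------------|
| --input | -i | Yes | - | Input file (JSON/CSV) |
| --output | -o | No | comparison.md | Output file path |
| --format | - | No | markdown | Output format (markdown, csv, latex, json) |
| --metric | - | No | validauc | Metric to compare |
| --group_by | - | No | dataset | Group results by dataset or model |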

Examples:

```bash
# Generate Markdown comparison table
python compare_results.py --input parsed_results.json --output comparison.md

# Generate LaTeX for papers
python compare_results.py --input parsed_results.json --format latex --output table.tex

# Compare by test AUC
python compare_results.py --input parsed_results.json --metric testauc --output test_comparison.md

# Export as CSV for Excel
python compare_results.py --input parsed_results.json --format csv --output comparison.csv

# Get JSON summary
python compare_results.py --input parsed_results.json --format json --output summary.json
```

Output Example (Markdown):

```markdown
# KT Experiment Results Comparison

## Model Comparison by Dataset (VALIDAUC)

### assist2009

| Model | VALIDAUC | Mean±Std | Runs | Rank |
|------|----------|----------|------|------|
| akt | 0.8234±0.012 | 0.8234 | 5 | 1 |
| simplekt | 0.8123±0.015 | 0.8123 | 5 | 2 |
| dkt | 0.7567±0.018 | 0.7567 | 5 | 3 |

## Overall Ranking

| Rank | Model | Avg Rank |
|------|-------|----------|
| 1 | akt | 1.00 |
| 2 | simplekt | 2.00 |
| 3 | dkt | 3.00 |
```
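The Avg Rank column in the Overall Ranking is presumably each model's per-dataset rank averaged over datasets. The sketch below computes that. It assumes input lines of the form `dataset model score`, with one aggregated score per model and dataset, where higher scores rank first. `avg_rank` is hypothetical. Equal scores receive arbitrary consecutive ranks here, which may differ from how compare_results.py breaks ties.

```bash
# Hypothetical helper: read "dataset model score" lines on stdin, rank models
# within each dataset (1 = highest score), then print "model avg_rank"
# sorted from best to worst average rank.
avg_rank() {
    sort -k1,1 -k3,3nr |
    awk '
        NF < 3 { next }                     # skip blank or short lines
        $1 != prev { rank = 0; prev = $1 }  # new dataset: restart ranking
        {
            rank++
            total[$2] += rank
            count[$2]++
        }
        END {
            for (m in total) printf "%s %.2f\n", m, total[m] / count[m]
        }
    ' |
    sort -k2,2n
}
```

Feeding it one mean score per model and dataset yields one `model avg_rank` line per model, best first.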

End-to-End Example

```bash
# 1. Run experiments (multiple models, 5-fold CV)
python run_cv.py \
    --models dkt,akt,simplekt \
    --datasets assist2009 \
    --folds 0,1,2,3,4 \
    --output_dir ./kt_experiments

# 2. Parse results
python parse_logs.py \
    --log_dir ./kt_experiments \
    --output ./kt_experiments/results.json

# 3. Generate comparison
python compare_results.py \
    --input ./kt_experiments/results.json \
    --output ./kt_experiments/comparison.md

# 4. View results
cat ./kt_experiments/comparison.md
```

Model Selection Guide

By Task Type

| Task | Recommended Models |
|------|--------------------|
| Quick baseline | dkt, kqn |
| Best performance | akt, dtransformer, simplekt |
| Long sequences | sparsekt, ukt |
| With timestamps | hawkes, lpkt, dkt_forget |
| With concept graph | gkt, hcgkt |
| Noisy data | robustkt, atktfix |
| Interpretability | dkvmn, deep_irt |

By Data Type

| Data Available | Suitable Models |
|----------------|-----------------|
| Concepts only | dkt, dkvmn, kqn, gkt |
| Concepts + Questions | akt, simplekt, saint, rkt |
| With timestamps | dkt_forget, hawkes, lpkt |
| Question-rich | qdkt, qikt, iekt |

See references/models.md for detailed model documentation with hyperparameter settings.

Troubleshooting

Out of Memory (OOM)

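  1. Check the batch size: the run_cv.py argument table lists no batch-size flag, and batch sizes are model-specific (see Supported Models). Memory-heavy models such as akt and saint already run with batch_size=64:

    ```bash
    # akt and saint run with batch_size=64 (see Supported Models)
    python run_cv.py --models akt,saint --datasets assist2009
    ```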
  2. Reduce sequence length:

    ```bash
    python run_cv.py --models dkt --datasets assist2009 --seq_len 100
    ```
  3. Use a simpler model:

    ```bash
    python run_cv.py --models dkt,dkvmn --datasets assist2009
    ```

Training Instability

  1. Lower the learning rate:

    ```bash
    python run_cv.py --models akt --datasets assist2009 --learning_rate 1e-5
    ```
  2. Use a more stable model variant:

    ```bash
    python run_cv.py --models stablekt --datasets assist2009
    ```

No Results Parsed

  1. Check the log file location (search subdirectories):

    ```bash
    python parse_logs.py --log_dir ./kt_experiments --recursive
    ```
  2. Verify that training completed successfully:

    ```bash
    grep -E "(success|ERROR|failed)" ./kt_experiments/dkt/assist2009/fold_0/train.log
    ```
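  3. Check for the final summary line:

    ```bash
    # -A 1 also prints the line after the header (expected to hold the metric values)
    grep -A 1 "fold.*modelname" ./kt_experiments/dkt/assist2009/fold_0/train.log
    ```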

Slow Training

  1. Use a simpler model:

    ```bash
    python run_cv.py --models dkt,dkvmn --datasets assist2009
    ```
  2. Reduce the number of epochs:

    ```bash
    python run_cv.py --models dkt --datasets assist2009 --num_epochs 50
    ```
  3. Use parallel execution (runs independent experiments concurrently; each individual run is not faster):

    ```bash
    python run_cv.py --models dkt,dkvmn --datasets assist2009 --parallel
    ```

Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Config directory not found | Wrong --workdir | Point --workdir at the directory containing wandb_train.py |
| Missing config files | Wrong --config_dir | Point --config_dir at the pyKT configs directory |
| CUDA out of memory | Batch size too large | Use model-specific batch sizes (see Supported Models) |
| No metrics extracted | Training failed | Check train.log for errors |
| Parallel fails on Windows | ProcessPoolExecutor limitation | Use sequential mode (omit --parallel) |
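The first two issues can be caught before a long run starts. Below is a minimal preflight sketch. `preflight` is hypothetical. It checks only that wandb_train.py exists in the --workdir directory and that the --config_dir directory exists. It does not validate the contents of the config files.

```bash
# Hypothetical helper: check the --workdir and --config_dir paths before a run.
# Exits non-zero (with a message on stderr) if either check fails.
preflight() (
    workdir=$1
    config_dir=$2
    status=0
    if [ ! -f "$workdir/wandb_train.py" ]; then
        echo "wandb_train.py not found in --workdir: $workdir" >&2
        status=1
    fi
    if [ ! -d "$config_dir" ]; then
        echo "--config_dir does not exist: $config_dir" >&2
        status=1
    fi
    exit "$status"
)
```

For example, `preflight ../examples ../configs && python run_cv.py --models dkt --datasets assist2009` checks the default --workdir and --config_dir values from the run_cv.py argument table before training.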

Output Files

Generated by run_cv.py:

  • model/dataset/fold_X/config.json - Experiment configuration
  • model/dataset/fold_X/qid_model.ckpt - Trained model checkpoint
  • model/dataset/fold_X/train.log - Training log
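You may want to find runs that left nothing to parse (the "No metrics extracted" case in Common Issues). One option is to scan every train.log under the output directory for the summary header that the Troubleshooting section greps for. `find_incomplete` is a hypothetical helper, and it treats any train.log without a line matching `fold.*modelname` as an incomplete run.

```bash
# Hypothetical helper: list train.log files under a results directory that
# lack the final summary header, i.e. runs with no metrics to parse.
find_incomplete() {
    find "$1" -type f -name train.log | sort | while IFS= read -r log; do
        if ! grep -q 'fold.*modelname' "$log"; then
            echo "$log"
        fi
    done
}
```

For example, `find_incomplete ./experiment_results` prints the path of each such log, one per line.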

Generated by parse_logs.py:

  • parsed_results.json - Metrics in JSON format

Generated by compare_results.py:

  • comparison.md - Markdown comparison table
  • table.tex - LaTeX table for papers
  • comparison.csv - CSV for spreadsheet analysis

Advanced Usage

Custom Hyperparameters

```bash
python run_cv.py \
    --models dkt \
    --datasets assist2009 \
    --learning_rate 0.0005 \
    --seq_len 150 \
    --num_epochs 150
```

Specific Folds Only

```bash
python run_cv.py \
    --models dkt \
    --datasets assist2009 \
    --folds 0,1  # Only first 2 folds
```

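Multiple Runs in Separate Directories

```bash
# Repeat a run into separate output directories.
# Note: the run_cv.py argument table has no seed option, so $seed below only
# names the output directory; it does not change the training seed.
for seed in 42 123 456; do
    python run_cv.py \
        --models dkt \
        --datasets assist2009 \
        --folds 0 \
        --output_dir ./seeds/$seed
done
```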

Related Skills

  • dataset-prep: Prepare pyKT-compatible datasets
  • code-review: Review your KT model implementations
  • kt-model-dev: Templates for developing new KT models (planned)
  • kt-hyperopt: Hyperparameter optimization (planned)

References

  • pyKT Toolkit: https://github.com/pykt-team/pykt-toolkit
  • Model Documentation: references/models.md
  • Data Format: skills/dataset-prep/references/data_format.md
  • Dataset Config: skills/dataset-prep/references/datasets_config.md

Last updated: 2026-02-02