AgentSkillsCN

dqmc-run

在HDF5模拟文件上执行DQMC蒙特卡洛扫描。适用于运行模拟、对长时间运行任务进行检查点保存、利用队列系统处理多份文件,或核查模拟运行状态时使用。

SKILL.md
--- frontmatter
name: dqmc-run
description: Execute DQMC Monte Carlo sweeps on HDF5 simulation files. Use when running simulations, checkpointing long runs, using the queue system for multiple files, or checking simulation status.

Run Simulations

Inputs

  • HDF5 simulation file(s) from the dqmc-generate and dqmc-parameter-scans skills
  • Binary at build/dqmc, worker utility from dqmc-util worker

Outputs

  • Updated HDF5 files with measurements in /meas_eqlt/ (and /meas_uneqlt/ if enabled)

Decision Rules (single vs queue)

  • Exactly 1 file: run it directly with build/dqmc sim_0.h5.
  • 2+ files: use the queue/worker system (dqmc-util enqueue + dqmc-util worker) even on a local workstation.
    • This avoids ad-hoc shell loops, makes restarts/checkpointing consistent, and scales to parallel execution safely.

Procedure

Single file:

bash
build/dqmc sim_0.h5

With checkpointing (long runs):

bash
build/dqmc -s 300 -t 3600 sim_0.h5  # checkpoint every 300 seconds, max runtime 3600 seconds

Multiple files (queue system):

bash
# 1. Add files to queue/ (globs are expanded by Python, so quote them)
dqmc-util enqueue queue 'data/run/bin_*.h5'

# 2. Start workers
dqmc-util worker queue ./build/dqmc -n 8 -s 300 -t 3600  # 8 parallel workers

Multiple files (local temperature scan example):

bash
# enqueue all bins across temperatures
dqmc-util enqueue queue 'runs/pi_pi_U8_6x6/T*/bin_*.h5'

# run N workers in parallel (pick N for your machine)
dqmc-util worker queue ./build/dqmc -n 8 -s 300 -t 3600

Validation (single file)

bash
dqmc-util summary sim_0.h5

For multiple files:

bash
dqmc-util print-n data/run/   # directory paths should end with '/'

Failure Modes

SymptomCauseRecovery
"opening... failed"File doesn't existCheck path
NaN in outputNumerical instabilityReduce dt and/or n_matmul. Check parameters
Killed by schedulerExceeded time limitUse -t flag, checkpoint enabled
Large errorbarsSign problemAdd more sweeps and/or bins

Do Not

  • Kill running simulations without checkpointing (-s flag)
  • Run multiple workers on the same file
  • Use manual shell loops for 2+ files (prefer dqmc-util enqueue + dqmc-util worker)