Monitor Slurm Jobs
Monitor currently running Slurm jobs and inspect their logs for errors or issues.
Usage
code
/monitor-slurm
Features
- Lists all running Slurm jobs for your account
- If no jobs are running, inspects the k most recently added job IDs in the logs directory and returns a high-level summary
- Automatically finds matching `.out` and `.err` files in `./<project_name>/logs/`
- Displays the last N lines of each log file for quick diagnostics
- Shows a job status summary
Options
- `--tail N`: Show the last N lines of each log file (default: 20)
- `--full`: Show complete log file contents instead of the tail
- `--verbose`: Show detailed job information
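As a rough illustration of how `--tail N` and `--full` could behave, here is a minimal Python sketch. The function name `read_log` and its parameters are hypothetical and are not taken from the skill itself; only the default of 20 lines comes from the option list above.

```python
from collections import deque
from pathlib import Path


def read_log(path, tail=20, full=False):
    """Return every line if `full` is set, otherwise only the last `tail` lines.

    Hypothetical helper: mirrors --tail N (default 20) and --full.
    """
    with Path(path).open(errors="replace") as handle:
        if full:
            return handle.read().splitlines()
        # A bounded deque streams the file and keeps only the final `tail` lines.
        return [line.rstrip("\n") for line in deque(handle, maxlen=tail)]
```

Streaming through a bounded deque avoids loading a large log into memory just to show its last few lines.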
Examples
bash
# Basic monitoring
/monitor-slurm

# Show more log lines
/monitor-slurm --tail 50

# Show full log contents
/monitor-slurm --full

# Verbose output with all job details
/monitor-slurm --verbose
How it works
- Queries active Slurm jobs using `squeue`
- If running jobs are found:
  - For each job, searches `./<project_name>/logs/` for matching log files
  - Matches files by job ID pattern: `<job_id>_*.out` and `<job_id>_*.err`
  - Displays log contents to help identify issues
  - Provides a job status summary
- If no jobs are running:
  - Takes the k most recently added job IDs from the logs directory
  - Inspects their log files
  - Returns a high-level summary of recent job history
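The two branches above can be sketched in Python as follows. This is an illustration under stated assumptions, not the skill's actual implementation: the function names, the `id|state|name` output format, the default `k=5`, and the use of file modification time as a stand-in for "most recently added" are all assumptions. The `squeue` flags used (`--noheader`, `--states`, `--user`, `--format`) are standard Slurm options.

```python
import getpass
import re
import subprocess
from pathlib import Path


def parse_squeue(output):
    """Turn 'id|state|name' lines into dicts, skipping blank lines."""
    jobs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # The name comes last so a '|' inside a job name cannot shift fields.
        job_id, state, name = line.strip().split("|", 2)
        jobs.append({"id": job_id, "state": state, "name": name})
    return jobs


def running_jobs(user=None):
    """Ask squeue for the user's running jobs (requires Slurm on PATH)."""
    cmd = [
        "squeue", "--noheader", "--states=RUNNING",
        "--user", user or getpass.getuser(),
        "--format=%i|%T|%j",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)


def recent_job_ids(logs_dir, k=5):
    """Fallback: the k job IDs whose log files were modified most recently.

    Assumption: modification time approximates "most recently added".
    """
    newest = {}
    for path in Path(logs_dir).glob("*_*"):
        match = re.match(r"(\d+)_.*\.(out|err)$", path.name)
        if match:
            job_id = match.group(1)
            newest[job_id] = max(newest.get(job_id, 0.0), path.stat().st_mtime)
    return sorted(newest, key=newest.get, reverse=True)[:k]
```

A caller would try `running_jobs()` first and fall back to `recent_job_ids("./<project_name>/logs")` only when the returned list is empty.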
Log File Matching
The skill expects log files named by job ID as:
- `<job_id>_<rest_of_name>.out`
- `<job_id>_<rest_of_name>.err`
For example, if your job ID is 12345, it will find:
- `12345_*.out`
- `12345_*.err`
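The naming convention maps directly onto a glob lookup. The sketch below assumes a hypothetical helper named `find_job_logs`; it only illustrates the pattern and is not the skill's own code.

```python
from pathlib import Path


def find_job_logs(job_id, logs_dir):
    """Return sorted .out and .err paths whose names start with '<job_id>_'."""
    logs_dir = Path(logs_dir)
    return {
        suffix: sorted(logs_dir.glob(f"{job_id}_*.{suffix}"))
        for suffix in ("out", "err")
    }
```

Because the underscore must directly follow the ID, a lookup for job 12345 does not pick up files belonging to job 123456, and a file named `12345.out` with no underscore is not matched either.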