watch

Start EvalView watch mode to automatically re-run regression checks whenever project files change.

SKILL.md
--- frontmatter
name: watch
description: Start EvalView watch mode to automatically re-run regression checks whenever project files change.

Watch Mode

Use this skill when the user wants continuous regression monitoring during development. Watch mode observes file changes and automatically re-runs evalview check, debouncing bursts of changes so that a series of rapid edits triggers a single run.

What this does

EvalView's watch mode uses watchdog to monitor directories for file changes (.py, .yaml, .yml, .json, .md, .txt, .toml, .cfg, .ini). When a change is detected, it runs a regression check via the gate() API and displays a live scorecard with pass/fail status, score deltas, tool changes, and streak tracking.
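
The filtering and debouncing described above can be sketched in a few lines of standard-library Python. This is an illustration only, not EvalView's actual implementation: the names is_watched and Debouncer are hypothetical, and the real watcher relies on watchdog events rather than explicit timestamps.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions listed in the description above.
WATCHED_EXTENSIONS = {
    ".py", ".yaml", ".yml", ".json", ".md", ".txt", ".toml", ".cfg", ".ini",
}


def is_watched(path: str) -> bool:
    """Return True if the file extension is one that should trigger a check."""
    return PurePosixPath(path).suffix in WATCHED_EXTENSIONS


class Debouncer:
    """Hypothetical helper: collapse a burst of changes into one trigger.

    A run becomes due only once `interval` seconds have elapsed since the
    most recently recorded change.
    """

    def __init__(self, interval: float = 2.0) -> None:
        self.interval = interval
        self._last_change: Optional[float] = None

    def record_change(self, now: float) -> None:
        """Remember the time of the latest relevant file change."""
        self._last_change = now

    def should_run(self, now: float) -> bool:
        """Return True once, after the quiet period following a change."""
        if self._last_change is None:
            return False
        if now - self._last_change < self.interval:
            return False
        self._last_change = None  # wait for the next change before firing again
        return True
```

Passing timestamps in explicitly keeps the sketch deterministic; a real loop would supply time.monotonic() and call the regression check whenever should_run returns True.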

How to start watch mode

Watch mode is a CLI command (not an MCP tool). Help the user run it:

code
evalview watch

Common options

  • --quick — Skip LLM judge, deterministic checks only ($0 cost, sub-second)
  • --path src/ --path tests/ — Watch specific directories (default: current directory)
  • --test "my-test" — Only check a specific test by name
  • --test-dir tests/evalview — Path to test cases directory (default: tests)
  • --interval 1 — Debounce interval in seconds (default: 2.0)
  • --fail-on REGRESSION,TOOLS_CHANGED — Comma-separated statuses that count as failure (default: REGRESSION)
  • --sound — Terminal bell on regression
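
To make the --fail-on semantics concrete, here is a minimal sketch of how a comma-separated status list could decide whether a result counts as a failure. The function names parse_fail_on and is_failure are hypothetical and are not part of EvalView's API; only the status names REGRESSION, TOOLS_CHANGED, and OUTPUT_CHANGED come from this document.

```python
def parse_fail_on(value: str = "REGRESSION") -> set:
    """Split a comma-separated --fail-on value into a set of status names."""
    return {part.strip().upper() for part in value.split(",") if part.strip()}


def is_failure(status: str, fail_on: set) -> bool:
    """Return True if the given check status is in the configured failure set."""
    return status.upper() in fail_on


# With the default, only REGRESSION fails the run.
default_set = parse_fail_on()

# A stricter configuration also fails when the set of tool calls changes.
strict_set = parse_fail_on("REGRESSION,TOOLS_CHANGED")
```

Under these assumptions, a TOOLS_CHANGED result passes with the default setting and fails with the stricter one.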

Examples

code
# Basic: watch everything, full checks
evalview watch

# Fast development loop: no LLM judge, 1-second debounce
evalview watch --quick --interval 1

# Watch specific directories and one test
evalview watch --path src/ --path tests/ --test "calculator-division"

# Strict mode: fail on any behavioral change
evalview watch --fail-on REGRESSION,TOOLS_CHANGED,OUTPUT_CHANGED --sound

Prerequisites

Watch mode requires the watchdog package. If not installed:

code
pip install "evalview[watch]"

Notes

  • Watch mode excludes .evalview/, .git/, venv/, node_modules/, __pycache__/, and other common non-source directories automatically.
  • The initial check runs immediately on startup before watching begins.
  • Results include a live scorecard with pass counts, regression counts, health percentage, and streak info.
  • --quick mode is ideal for tight development loops: it skips the LLM judge, so it costs nothing and finishes in under a second.
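
The directory exclusion mentioned in the notes can be pictured as a check on each path component. The sketch below is an assumption about the general approach, not EvalView's code: is_excluded is a hypothetical name, and the set contains only the directories named above, whereas the real list includes other common non-source directories.

```python
from pathlib import PurePosixPath

# Only the directories named in the notes; the actual list is longer.
EXCLUDED_DIRS = {".evalview", ".git", "venv", "node_modules", "__pycache__"}


def is_excluded(path: str) -> bool:
    """Return True if any component of the path is an excluded directory."""
    return any(part in EXCLUDED_DIRS for part in PurePosixPath(path).parts)
```

Matching on whole path components rather than substrings avoids false positives such as a source file named venv_utils.py.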