AgentSkillsCN

omp-operator

Perpetual mining orchestration (24/7), queue governance, and Hall of Fame promotion.

SKILL.md
---
name: omp-operator
description: "Perpetual mining orchestration (24/7), queue governance, Hall of Fame promotion"
triggers:
  - command: "/omp-operator"
    description: "Mining, campaigns, queue, hall of fame, resources, watchdog"
domain_knowledge:
  - campaign queue governance
  - hall of fame promotion and revalidation
  - resource budget and watchdog
  - validation automation
  - incident response for mining
---

OMP Operator

NOTE: This skill operates on a local Ubuntu workstation. VPS deployment is DEFERRED. See docs/ops/local_only_policy.md.

Role

OMP Operator / Perpetual Mining Orchestrator (24/7). Expert in continuous strategy mining, campaign queue management, and Hall of Fame governance.


Expertise Map

Campaign Orchestration

  • Queue Management: Priority ordering, fairness, enabled/disabled states
  • Campaign Lifecycle: queued -> running -> completed/failed
  • Repeat Mode: Campaigns with repeat: true are re-queued automatically after completion
  • Configuration: dashboard/omp_config.toml for defaults
  • Queue File: dashboard/campaign_queue.json for pending campaigns

Hall of Fame Governance

  • Promotion Criteria: OOS Sharpe, PBO, DSR, MaxDD thresholds
  • Variance Sanity Gate: Block promotion if metrics collapsed (sharpeVar < 1e-6)
  • Provenance: genome_hash, run_id, config_hash, git_sha
  • Versioning: Unique genome_hash prevents duplicates
  • Revalidation: Periodic review of promoted strategies

Validation Automation

  • Auto-reject: Candidates failing minimum thresholds
  • Variance Check: /api/omp/promote-check endpoint
  • Promotion Packet: Checklist before HoF entry
  • Gates Integration: Reference risk-analyst thresholds

Resource Monitoring

  • CPU: max_cpu_util_pct = 85% (block new campaigns if exceeded)
  • Memory: min_mem_available_mb = 400 (block if below)
  • Disk: min_disk_free_gb = 1.0 (auto-stop if below)
  • Concurrency: max_concurrent_campaigns = 1

Watchdog Policy

  • Stuck Run Detection: No progress for 10+ minutes
  • Memory Runaway: Process exceeding available RAM
  • Disk Pressure: Warn and trigger cleanup below 2 GB free; auto-stop below 1 GB
  • Error Burst: 3+ failures within 1 hour

Incident Response

  • Stuck Runs: Kill process, log, notify
  • Disk Full: Auto-stop mining, trigger cleanup
  • Memory Exhausted: Kill campaign, restart daemon
  • DB Unreachable: Pause mining, alert devops-infra

Reproducibility Discipline

  • run_id: UUID for each execution
  • Seeds: Documented in config (default: [42, 123, 456])
  • Config Hash: Immutable reference to parameters
  • Git SHA: Code version at execution time
  • Data Snapshot: snapshot_id for data version
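
The following Python sketch shows how the reproducibility fields for one execution could be captured. It makes two assumptions: config_hash is the SHA-256 of the raw config file bytes, and run_id is a random UUID4. `RunRecord`, `hash_config`, and `make_run_record` are hypothetical names, not the OMP or combiner_cli API. git_sha and snapshot_id are simply passed in.

```python
import hashlib
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical provenance record for one campaign execution."""
    run_id: str
    config_path: str
    config_hash: str
    git_sha: str
    snapshot_id: str
    seeds: tuple


def hash_config(path: Path) -> str:
    """Assumed scheme: SHA-256 over the exact bytes of the config file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_run_record(config_path: Path, git_sha: str, snapshot_id: str,
                    seeds: tuple = (42, 123, 456)) -> RunRecord:
    # Hard constraint 1: never run without a config that exists on disk.
    if not config_path.is_file():
        raise FileNotFoundError(f"config not found: {config_path}")
    return RunRecord(
        run_id=str(uuid.uuid4()),          # fresh UUID per execution
        config_path=str(config_path),
        config_hash=hash_config(config_path),
        git_sha=git_sha,                   # e.g. output of `git rev-parse HEAD`
        snapshot_id=snapshot_id,
        seeds=tuple(seeds),                # documented default: [42, 123, 456]
    )
```

Hashing the raw bytes means a whitespace-only edit changes config_hash. Hashing the parsed TOML instead would tolerate formatting changes, at the cost of a more complex scheme.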

Audit Trail

  • Activity Log: ompState.activityLog with timestamps
  • SSE Events: Real-time broadcast of all state changes
  • Queue Changes: Logged with reason and operator

When to Use

INVOKE this skill when:

  • Mining is stuck or campaigns are failing
  • Queue needs reordering or prioritization
  • Promoting strategy to Hall of Fame
  • Resource limits are being exceeded
  • Disk space is running low
  • Need to pause/resume/stop mining
  • Investigating run reproducibility
  • Cleaning up old runs and artifacts

DO NOT use this skill when:

  • Designing strategy logic (use /quant-researcher)
  • Optimizing engine performance (use /quant-engineer)
  • Validating strategy metrics (use /risk-analyst)
  • Fixing data quality issues (use /data-engineer)
  • Managing local infrastructure (use /devops-infra)

Operating Rules

Hard Constraints

  1. Never run campaign without versioned config

    • Config must exist at specified path
    • Config hash recorded with run
  2. Never exceed resource budget

    • Check omp_config.toml limits before starting
    • Block new campaign if CPU > 85%, RAM < 400MB, Disk < 1GB
  3. Never promote without risk-analyst gates

    • OOS Sharpe, PBO, DSR must meet thresholds
    • Variance sanity check must pass
  4. Never mine if data readiness is uncertain (fail closed)

    • Pause mining during data incidents
    • Require data-engineer sign-off after issues
  5. Never accept partial validation as complete

    • Label incomplete runs explicitly
    • Do not promote from partial runs
  6. Never delete artifacts without retention policy

    • Keep N most recent runs (default: 5)
    • Never delete Hall of Fame artifacts
  7. Never reorder queue without logging reason

    • Document priority changes
    • Record operator and timestamp
  8. Never re-run with a different seed without registering the change

    • Seed changes must be intentional and documented
    • Prevent seed fishing for better results
  9. Never run daemon without health check

    • health-check.sh must include OMP status
    • Auto-recovery via cron when unhealthy
  10. Never let disk fill (< 1GB triggers auto-stop)

    • Monitor diskFreeGb continuously
    • Run cleanup_old_runs.sh proactively
  11. Never ignore determinism divergence

    • Same seed + config must produce identical results
    • Investigate any variation immediately
  12. Never apply hotfix without postmortem

    • Document incident, root cause, and fix
    • Update runbooks as needed
  13. Never mix environments without marking

    • Clearly label dev/staging/prod
    • Prevent cross-contamination
  14. Never bypass execution realism (trader-expert)

    • Strategies must have execution assumptions documented
    • Handoff to trader-expert for high-turnover strategies
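
Constraints 8 and 11 can be checked mechanically. The sketch below is illustrative only. It assumes each run can be reduced to a results digest (for example, a hash of its sorted metrics), and it keeps the registry in memory. `SeedRegistry` and both exception types are hypothetical.

```python
from dataclasses import dataclass, field


class UnregisteredSeedError(RuntimeError):
    """Raised when a run uses a seed nobody registered for that config."""


class DeterminismError(RuntimeError):
    """Raised when the same seed + config produced different results."""


@dataclass
class SeedRegistry:
    # config_hash -> {seed: reason} (constraint 8: seed changes are documented)
    registered: dict = field(default_factory=dict)
    # (config_hash, seed) -> first results digest seen (constraint 11)
    digests: dict = field(default_factory=dict)

    def register(self, config_hash: str, seed: int, reason: str) -> None:
        if not reason.strip():
            raise ValueError("a seed change needs a documented reason")
        self.registered.setdefault(config_hash, {})[seed] = reason

    def check_run(self, config_hash: str, seed: int, results_digest: str) -> None:
        if seed not in self.registered.get(config_hash, {}):
            raise UnregisteredSeedError(f"seed {seed} not registered for {config_hash}")
        first = self.digests.setdefault((config_hash, seed), results_digest)
        if first != results_digest:
            raise DeterminismError(
                f"seed {seed}, config {config_hash}: {first} != {results_digest}"
            )
```

A real deployment would persist the registry next to the run artifacts, so that divergence is still caught after a daemon restart.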

Repo Anchors

Configuration

| File | Purpose |
|------|---------|
| dashboard/omp_config.toml | Main OMP config (resource limits, promotion criteria, paths) |
| dashboard/campaign_queue.json | Campaign queue (priority, enabled, repeat) |

API Routes

| File | Purpose |
|------|---------|
| dashboard/server/routes/omp.js | OMP API endpoints (start/stop/pause/resume/queue/status) |
| dashboard/server/state.js | OMP state management (status, resources, activityLog) |

Dashboard

| File | Purpose |
|------|---------|
| dashboard/src/pages/MinerControl.tsx | Mining control interface |
| dashboard/src/pages/HallOfFame.tsx | Hall of Fame display |
| dashboard/src/stores/ompStore.ts | Frontend OMP store |

CLI Commands

| File | Purpose |
|------|---------|
| crates/combiner_cli/src/commands/factory/run_campaign.rs | Execute campaign |
| crates/combiner_cli/src/commands/factory/promote.rs | Promote to HoF |
| crates/combiner_cli/src/commands/factory/audit.rs | Audit runs |
| crates/combiner_cli/src/commands/factory/export_top.rs | Export top candidates |
| crates/combiner_cli/src/commands/factory/validate_config.rs | Validate config |

Services

| File | Purpose |
|------|---------|
| dashboard/server/services/hofSync.js | Sync local HoF to Neon |

Operations Scripts

| File | Purpose |
|------|---------|
| scripts/cleanup_old_runs.sh | Clean up old runs (keep N most recent) |
| scripts/vps/health-check.sh | Health checks including OMP status |

Documentation

| File | Purpose |
|------|---------|
| docs/architecture/omp-specification.md | Complete OMP specification |
| docs/dashboard/miner-control.md | Miner control documentation |
| docs/dashboard/hall-of-fame.md | Hall of Fame documentation |

Output Directories

| Path | Purpose |
|------|---------|
| output/scg/run_{id}/ | Run artifacts |
| output/scg/run_{id}/hall_of_fame/ | Local elite strategies |
| artifacts/hall_of_fame/ | Promoted strategies (permanent) |

OMP Architecture

System Flow

```text
1. Daemon starts (POST /api/omp/start)
   │
2. Main loop (every 30s)
   ├── Check resources (CPU, RAM, Disk)
   ├── Load campaign queue
   └── If resources OK and queue not empty:
       │
3. Start campaign
   ├── Spawn: combiner factory run --campaign {config_path}
   ├── Monitor stdout for progress
   └── Track: generation, best_sharpe, candidates
       │
4. Campaign completes
   ├── Mark completed/failed
   ├── Trigger promotion check
   └── If repeat: re-queue campaign
       │
5. Promotion pipeline
   ├── Variance sanity gate
   ├── Threshold checks (Sharpe, PBO, DSR, DD)
   ├── Copy artifacts to hall_of_fame
   └── Insert into Neon database
```
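
The main loop can be sketched as a function whose side effects are injected. This keeps it testable without spawning `combiner`. Every name here (`run_daemon`, the callables, the boolean success flag) is hypothetical, and pause handling and SSE broadcasting are left out.

```python
import time
from typing import Callable, Optional


def run_daemon(
    resources_ok: Callable[[], bool],
    next_campaign: Callable[[], Optional[dict]],
    run_campaign: Callable[[dict], bool],
    on_finished: Callable[[dict, bool], None],
    is_running: Callable[[], bool],
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical OMP main loop; returns the number of campaigns executed."""
    executed = 0
    while is_running():                      # False once the daemon is stopped
        if resources_ok():                   # step 2: CPU / RAM / disk budget
            campaign = next_campaign()       # step 2: highest-precedence enabled entry
            if campaign is not None:
                # Step 3: blocks until the spawned process exits, so at most
                # one campaign runs at a time (max_concurrent_campaigns = 1).
                succeeded = run_campaign(campaign)
                executed += 1
                # Steps 4-5: mark completed/failed, promotion check, re-queue if repeat.
                on_finished(campaign, succeeded)
        sleep(interval_s)                    # 30 s between passes
    return executed
```

Running the campaign synchronously inside the loop is one simple way to honor max_concurrent_campaigns = 1. A non-blocking spawn would need an explicit running-count check instead.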

OMP States

| State | Description | Transitions |
|-------|-------------|-------------|
| offline | Daemon stopped | -> running (start) |
| running | Active, processing | -> paused, offline |
| paused | Temporarily paused | -> running (resume), offline |

Campaign States

| State | Description | Next States |
|-------|-------------|-------------|
| queued | In queue, awaiting execution | -> running |
| running | Currently executing | -> completed, failed |
| completed | Finished successfully | (terminal) |
| failed | Execution failed | (terminal) |
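
Both state tables can be enforced with a lookup of allowed moves. This is a sketch, not the logic in state.js. The `transition` helper is hypothetical. Treating a repeat campaign as a fresh queued entry, rather than a move out of completed, is an assumption; it is consistent with completed being terminal.

```python
# Allowed transitions copied from the OMP States and Campaign States tables.
DAEMON_TRANSITIONS = {
    "offline": {"running"},
    "running": {"paused", "offline"},
    "paused": {"running", "offline"},
}

CAMPAIGN_TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def transition(table: dict, current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if target not in table.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {target!r}")
    return target
```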

Campaign Queue Governance

Queue Structure

```json
{
  "version": "1.0",
  "updated_at": "ISO8601",
  "campaigns": [
    {
      "id": "camp_{unique_id}",
      "name": "Campaign Name",
      "config_path": "configs/campaigns/{name}.toml",
      "market": "br|us",
      "priority": 1,
      "enabled": true,
      "repeat": false,
      "tags": ["momentum", "intraday"],
      "created_at": "ISO8601"
    }
  ]
}
```
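
This is a hedged validator for the queue file. Several rules are assumptions read off the example, not the checks omp.js actually performs: which keys are required, the br/us market set, and the rule that priority is a positive integer. `validate_queue` and `load_queue` are hypothetical.

```python
import json
from pathlib import Path

# Assumed required keys, read off the example; tags/created_at are optional here.
REQUIRED_KEYS = {"id", "name", "config_path", "market", "priority", "enabled", "repeat"}
MARKETS = {"br", "us"}


def validate_queue(doc: dict) -> list:
    """Return human-readable problems in a campaign_queue.json document."""
    problems = []
    seen_ids = set()
    for i, camp in enumerate(doc.get("campaigns", [])):
        missing = REQUIRED_KEYS - set(camp)
        if missing:
            problems.append(f"campaign[{i}] missing {sorted(missing)}")
            continue
        cid = camp["id"]
        if cid in seen_ids:
            problems.append(f"duplicate id {cid}")
        seen_ids.add(cid)
        if camp["market"] not in MARKETS:
            problems.append(f"{cid}: unknown market {camp['market']!r}")
        prio = camp["priority"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(prio, bool) or not isinstance(prio, int) or prio < 1:
            problems.append(f"{cid}: priority must be a positive integer")
        if not isinstance(camp["enabled"], bool) or not isinstance(camp["repeat"], bool):
            problems.append(f"{cid}: enabled/repeat must be booleans")
    return problems


def load_queue(path: Path) -> dict:
    """Parse the queue file and refuse to return it if validation finds problems."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    problems = validate_queue(doc)
    if problems:
        raise ValueError("; ".join(problems))
    return doc
```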

Priority Rules

  1. Lower priority number = higher precedence
  2. Only enabled campaigns are considered
  3. First enabled campaign in priority order is selected
  4. Repeat campaigns re-queue with same priority after completion
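
The priority rules reduce to a few lines. In this sketch the function names are hypothetical. `min()` returns the first minimal element, so ties keep file order. Two choices are assumptions: dropping finished entries from the queue, and appending a finished repeat campaign to the end so it runs after same-priority peers (a simple fairness rule).

```python
from typing import Optional


def select_next(campaigns: list) -> Optional[dict]:
    """Pick the enabled campaign with the lowest priority number (rules 1-3)."""
    enabled = [c for c in campaigns if c.get("enabled", False)]
    if not enabled:
        return None
    return min(enabled, key=lambda c: c["priority"])  # first minimum wins ties


def after_completion(campaigns: list, finished: dict) -> list:
    """Drop the finished entry; re-append a copy if it is a repeat campaign (rule 4)."""
    remaining = [c for c in campaigns if c["id"] != finished["id"]]
    if finished.get("repeat", False):
        remaining.append(dict(finished))  # same priority, behind equal-priority peers
    return remaining
```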

Fairness Policy

  • Campaigns should not starve (stuck at low priority)
  • Review queue weekly for stale entries
  • Remove or disable campaigns that consistently fail

Queue Operations

| Operation | Endpoint | Description |
|-----------|----------|-------------|
| List | GET /api/omp/queue | View all campaigns |
| Add | POST /api/omp/queue | Add new campaign |
| Update | PATCH /api/omp/queue/:id | Modify campaign |
| Remove | DELETE /api/omp/queue/:id | Remove campaign |

Resource Budget and Watchdog Policy

Resource Limits (from omp_config.toml)

| Resource | Limit | Action When Breached |
|----------|-------|----------------------|
| CPU | max_cpu_util_pct = 85% | Block new campaign |
| Memory | min_mem_available_mb = 400 | Block new campaign |
| Disk | min_disk_free_gb = 1.0 | Auto-stop mining |
| Concurrency | max_concurrent_campaigns = 1 | Queue additional campaigns |
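
This is a sketch of the pre-start budget check using the limits above. How the numbers are sampled is left out, and `Snapshot` and `resource_action` are hypothetical. Treating a value exactly at a limit as acceptable is also an assumption.

```python
from dataclasses import dataclass

# Defaults mirror the omp_config.toml limits in the table above.
MAX_CPU_UTIL_PCT = 85.0
MIN_MEM_AVAILABLE_MB = 400.0
MIN_DISK_FREE_GB = 1.0
MAX_CONCURRENT_CAMPAIGNS = 1


@dataclass
class Snapshot:
    cpu_util_pct: float
    mem_available_mb: float
    disk_free_gb: float
    running_campaigns: int


def resource_action(s: Snapshot) -> str:
    """Return 'auto-stop', 'block', 'queue', or 'start' for a new campaign."""
    if s.disk_free_gb < MIN_DISK_FREE_GB:
        return "auto-stop"  # disk is the only limit that stops mining outright
    if s.cpu_util_pct > MAX_CPU_UTIL_PCT or s.mem_available_mb < MIN_MEM_AVAILABLE_MB:
        return "block"
    if s.running_campaigns >= MAX_CONCURRENT_CAMPAIGNS:
        return "queue"
    return "start"
```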

Watchdog Conditions

| Condition | Detection | Action |
|-----------|-----------|--------|
| Stuck Run | No progress for 10+ minutes | Kill process, log incident |
| Memory Runaway | Process exceeding available RAM | Kill, restart daemon |
| Disk Pressure | Free < 2GB | Warn, trigger cleanup |
| Disk Critical | Free < 1GB | Auto-stop mining |
| Error Burst | 3+ failures in 1 hour | Pause, investigate |
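
The watchdog conditions can be evaluated on each loop pass. In this sketch, `WatchdogInput` and `watchdog_actions` are hypothetical. So is the `mem_budget_mb` field, which is one reading of "exceeding available RAM". The 10-minute, 2 GB, 1 GB, and 3-in-1-hour thresholds come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WatchdogInput:
    now: datetime
    last_progress_at: datetime   # last time the run reported a new generation
    disk_free_gb: float
    process_rss_mb: float
    mem_budget_mb: float         # hypothetical: RAM the campaign may use
    failure_times: list          # datetimes of recent campaign failures


def watchdog_actions(w: WatchdogInput) -> list:
    """Return (condition, action) pairs for every watchdog condition that fires."""
    actions = []
    if w.now - w.last_progress_at >= timedelta(minutes=10):
        actions.append(("stuck_run", "kill process, log incident"))
    if w.process_rss_mb > w.mem_budget_mb:
        actions.append(("memory_runaway", "kill, restart daemon"))
    if w.disk_free_gb < 1.0:
        actions.append(("disk_critical", "auto-stop mining"))
    elif w.disk_free_gb < 2.0:
        actions.append(("disk_pressure", "warn, trigger cleanup"))
    recent = [t for t in w.failure_times if w.now - t <= timedelta(hours=1)]
    if len(recent) >= 3:
        actions.append(("error_burst", "pause, investigate"))
    return actions
```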

Watchdog Actions

| Action | Implementation | Recovery |
|--------|----------------|----------|
| Throttle | Increase loop interval | Auto after 5 min |
| Pause | Set status = 'paused' | Manual resume |
| Kill | SIGTERM to process | Auto restart next loop |
| Quarantine | Disable campaign | Manual review |
| Auto-stop | Stop daemon | Manual restart |

Monitoring Commands

```bash
# Check OMP status
curl -s http://localhost:3001/api/omp/status | jq .

# Check resources
curl -s http://localhost:3001/api/omp/status | jq '.resources'

# Health check (includes OMP)
./scripts/vps/health-check.sh --json
```

Validation Automation Protocol

Pre-Promotion Checks

  1. Variance Sanity Gate (SEV-0)

    • Endpoint: GET /api/omp/promote-check
    • Block if: sharpeVar < 1e-6 (metrics collapsed)
    • Indicates: bug or corrupted data
  2. Threshold Checks (from omp_config.toml)

    • min_oos_sharpe_net: >= 0.5
    • max_pbo: <= 0.20
    • min_dsr: >= 0.4
    • max_drawdown_net: <= 0.30
  3. Completeness Check

    • All required fields present
    • run_id, genome_hash, config_hash available
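
The three pre-promotion checks can be combined in one function. It assumes sharpeVar is the population variance of the Sharpe values of all candidates in the run, and treats fewer than two values as collapsed. It also assumes candidate metrics use the key names shown. `promotion_failures` is hypothetical and is not the /api/omp/promote-check implementation.

```python
from statistics import pvariance

# Thresholds copied from the checks above (omp_config.toml defaults).
THRESHOLDS = {
    "oos_sharpe_net": (">=", 0.5),
    "pbo": ("<=", 0.20),
    "dsr": (">=", 0.4),
    "max_drawdown_net": ("<=", 0.30),
}
REQUIRED_FIELDS = ("run_id", "genome_hash", "config_hash")
SHARPE_VAR_FLOOR = 1e-6


def promotion_failures(candidate: dict, run_sharpes: list) -> list:
    """Return the reasons a candidate cannot be promoted (empty list = eligible)."""
    failures = []
    # 1. Variance sanity gate: near-identical Sharpes across the run = collapsed metrics.
    if len(run_sharpes) < 2 or pvariance(run_sharpes) < SHARPE_VAR_FLOOR:
        failures.append("variance sanity gate (SEV-0)")
    # 2. Threshold checks; a missing metric fails (fail closed).
    for metric, (op, limit) in THRESHOLDS.items():
        value = candidate.get(metric)
        if value is None:
            failures.append(f"{metric} missing")
        elif (op == ">=" and value < limit) or (op == "<=" and value > limit):
            failures.append(f"{metric}={value} fails {op} {limit}")
    # 3. Completeness check on provenance identifiers.
    failures += [f"{f} missing" for f in REQUIRED_FIELDS if not candidate.get(f)]
    return failures
```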

Auto-Reject Criteria

| Criterion | Threshold | Action |
|-----------|-----------|--------|
| OOS Sharpe | < 0.2 | Auto-reject |
| PBO | > 0.40 | Auto-reject |
| Max Drawdown | > 50% | Auto-reject |
| Trades OOS | < 10 | Auto-reject |
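
Auto-reject is a looser filter than the promotion thresholds. In this sketch the metric keys are hypothetical and drawdown is expressed as a fraction. A missing metric counts as a failure (fail closed).

```python
# Auto-reject floors from the table above. Drawdown as a fraction (0.50 = 50%)
# is an assumption about units.
AUTO_REJECT = (
    ("oos_sharpe", lambda v: v < 0.2),
    ("pbo", lambda v: v > 0.40),
    ("max_drawdown", lambda v: v > 0.50),
    ("trades_oos", lambda v: v < 10),
)


def auto_reject_reasons(candidate: dict) -> list:
    """Criteria the candidate trips; a missing metric counts as tripped (fail closed)."""
    return [
        name
        for name, trips in AUTO_REJECT
        if name not in candidate or trips(candidate[name])
    ]
```

The two filters overlap. A candidate with OOS Sharpe 0.3 passes auto-reject here but still fails the 0.5 promotion threshold.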

Candidate Review Pipeline

```text
1. Export top candidates
   combiner factory export-top --run {run_id} --top 100

2. Apply variance sanity gate
   GET /api/omp/promote-check?runId={run_id}

3. Filter by thresholds
   (automated by OMP daemon)

4. Generate promotion packet
   (for manual review if needed)

5. Promote to Hall of Fame
   POST /api/omp/hof-sync
```

Hall of Fame Governance

Promotion Criteria (from omp_config.toml)

| Criterion | Threshold | Source |
|-----------|-----------|--------|
| OOS Sharpe Net | >= 0.5 | Stage B validation |
| PBO | <= 0.20 | Walk-forward analysis |
| DSR | >= 0.4 | Deflated Sharpe |
| Max Drawdown | <= 30% | Risk constraint |
| Stress Passed | Configurable | Stress suite |

Provenance Requirements

Every promoted strategy must have:

| Field | Description | Required |
|-------|-------------|----------|
| candidate_id | Unique identifier | Yes |
| genome_hash | Hash of strategy genome | Yes |
| run_id | Source run identifier | Yes |
| campaign_id | Source campaign | Yes |
| config_hash | Configuration hash | Yes |
| git_sha | Code version | Yes |
| promoted_at | Promotion timestamp | Yes |
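
This sketch models the provenance record as a frozen dataclass that refuses empty fields, and de-duplicates on genome_hash. `Provenance` and `admit` are hypothetical. The field list is taken from the table above.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class Provenance:
    candidate_id: str
    genome_hash: str
    run_id: str
    campaign_id: str
    config_hash: str
    git_sha: str
    promoted_at: datetime

    def __post_init__(self):
        # Every field in the table is required; empty strings are rejected.
        empty = [f.name for f in fields(self) if not getattr(self, f.name)]
        if empty:
            raise ValueError(f"missing provenance fields: {empty}")


def admit(hof: dict, entry: Provenance) -> bool:
    """Add entry keyed by genome_hash; return False if that genome is already present."""
    if entry.genome_hash in hof:
        return False
    hof[entry.genome_hash] = entry
    return True
```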

Revalidation Policy

| Trigger | Action | Frequency |
|---------|--------|-----------|
| Data Update | Re-run on new data | Monthly |
| Code Change | Re-validate affected | On release |
| Performance Decay | Review and demote | Quarterly |
| Threshold Change | Re-apply gates | Immediate |

Demotion Criteria

| Condition | Action |
|-----------|--------|
| OOS Sharpe drops below 0.3 on new data | Flag for review |
| PBO exceeds 0.30 on re-validation | Demote to research |
| Strategy logic bug discovered | Remove from HoF |
| Data quality issue affects results | Quarantine |
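
The demotion table can be written as a single decision function. `demotion_action` is hypothetical. When several conditions hold at once, it applies an assumed severity order: remove, then quarantine, then demote, then flag.

```python
from typing import Optional


def demotion_action(
    new_oos_sharpe: Optional[float] = None,
    revalidated_pbo: Optional[float] = None,
    logic_bug: bool = False,
    data_issue: bool = False,
) -> str:
    """Return the most severe action from the demotion table, or 'keep'."""
    if logic_bug:
        return "remove from HoF"
    if data_issue:
        return "quarantine"
    if revalidated_pbo is not None and revalidated_pbo > 0.30:
        return "demote to research"
    if new_oos_sharpe is not None and new_oos_sharpe < 0.3:
        return "flag for review"
    return "keep"
```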

Retention and Artifact Management

Retention Policy

| Artifact Type | Retention | Location |
|---------------|-----------|----------|
| Run outputs | 5 most recent | output/scg/run_*/ |
| Hall of Fame | Permanent | artifacts/hall_of_fame/ |
| Logs | 30 days | PM2 managed |
| Database records | Permanent | Neon PostgreSQL |
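
This dry-run sketch applies the retention rule: it lists run directories beyond the five most recent, ordered by modification time. It does not reproduce cleanup_old_runs.sh, whose selection logic is not shown here. `runs_to_delete` is hypothetical. Skipping runs that still hold a non-empty local hall_of_fame/ is a conservative assumption.

```python
from pathlib import Path


def has_local_hof(run: Path) -> bool:
    """True if the run still holds a non-empty local hall_of_fame/ directory."""
    hof = run / "hall_of_fame"
    return hof.is_dir() and any(hof.iterdir())


def runs_to_delete(scg_root: Path, keep: int = 5) -> list:
    """Dry run: run_* directories beyond the `keep` most recent (by mtime).

    Runs with a non-empty local hall_of_fame/ are skipped, and nothing outside
    scg_root (such as artifacts/hall_of_fame/) is ever considered.
    """
    runs = sorted(
        (p for p in scg_root.glob("run_*") if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [p for p in runs[keep:] if not has_local_hof(p)]
```

Returning a list instead of deleting keeps the destructive step separate. The output can then be logged and reviewed first.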

Cleanup Script

```bash
# Cleanup old runs (keep 5, trigger if < 2GB free)
./scripts/cleanup_old_runs.sh /path/to/output/scg 5 2
```

Never Delete

  • Hall of Fame artifacts (artifacts/hall_of_fame/)
  • Promoted strategy records in database
  • Config files used for promoted strategies
  • Audit logs for compliance

Compression Policy

| Age | Action |
|-----|--------|
| < 7 days | Keep uncompressed |
| 7-30 days | Compress with zstd |
| > 30 days | Archive to cold storage (if configured) |

Deliverables

Campaign Spec Card

```markdown
## Campaign Spec Card

**Campaign ID:** {camp_id}
**Name:** {name}
**Created:** YYYY-MM-DD
**Owner:** {researcher}

### Mandate
- Market: {BR/US}
- Universe: {ibov/sp500/custom}
- Timeframe: {1min/5min/1h/daily}
- Objective: {description}

### Configuration
- Config Path: {path}
- Population: {size}
- Generations: {max}
- Seeds: [{list}]
- Workers: {count}

### Constraints
- Max Runtime: {seconds}
- Max Drawdown: {percent}
- Min Sharpe: {threshold}

### Expected Outcomes
- Candidates Generated: {estimate}
- HoF Promotions: {target}

### Approval
- [ ] Quant-researcher reviewed
- [ ] Config validated
- [ ] Data readiness confirmed
```

Queue Change Log Entry

```markdown
## Queue Change: {action}

**Date:** YYYY-MM-DD HH:MM
**Operator:** {name}
**Action:** add | remove | reorder | enable | disable

### Details
- Campaign ID: {id}
- Previous State: {state}
- New State: {state}

### Reason
{justification}

### Impact
{expected effect}
```

Mining Daily Ops Log

```markdown
## Mining Daily Ops Log

**Date:** YYYY-MM-DD
**Operator:** {name}

### Status Summary
- OMP Status: running | paused | offline
- Campaigns Completed: {count}
- Campaigns Failed: {count}
- Promotions: {count}

### Resource Usage
- CPU Avg: {percent}%
- Memory Avg: {percent}%
- Disk Free: {GB} GB

### Incidents
- {incident description or "None"}

### Queue Changes
- {changes or "None"}

### Notes
{observations}
```

Incident Report (OMP-specific)

```markdown
## OMP Incident Report

**Incident ID:** INC-{id}
**Severity:** SEV-0 | SEV-1 | SEV-2
**Detected:** YYYY-MM-DD HH:MM
**Resolved:** YYYY-MM-DD HH:MM (or OPEN)

### Summary
{1-2 sentence description}

### Symptoms
- {what was observed}

### Affected
- Campaign(s): {list}
- Run(s): {list}
- Duration: {minutes}

### Root Cause
{technical explanation}

### Timeline
| Time | Event |
|------|-------|
| HH:MM | {event} |

### Resolution
{what fixed it}

### Prevention
- [ ] {action item}

### Handoffs
- devops-infra: {if applicable}
- data-engineer: {if applicable}
```

Promotion Packet Checklist

```markdown
## Promotion Packet: {candidate_id}

**Run ID:** {run_id}
**Campaign:** {campaign_name}
**Date:** YYYY-MM-DD

### Validation Gates
- [ ] OOS Sharpe >= 0.5: {actual value}
- [ ] PBO <= 0.20: {actual value}
- [ ] DSR >= 0.4: {actual value}
- [ ] Max DD <= 30%: {actual value}
- [ ] Variance sanity passed
- [ ] Stress tests passed: {X/Y}

### Provenance
- [ ] genome_hash: {hash}
- [ ] config_hash: {hash}
- [ ] git_sha: {sha}
- [ ] run_id: {id}

### Artifacts
- [ ] strategy.toml present
- [ ] metrics.obfs present
- [ ] trades.csv present (if applicable)

### Reviews
- [ ] Risk-analyst gate passed
- [ ] Trader-expert execution reviewed (if high turnover)
- [ ] Data snapshot documented

### Approval
- [ ] Ready for Hall of Fame promotion
```

Hall of Fame Integrity Checklist

```markdown
## Hall of Fame Integrity Check

**Date:** YYYY-MM-DD
**Operator:** {name}

### Count Verification
- [ ] DB count matches expected: {count}
- [ ] Local artifacts match DB: {yes/no}

### Sample Validation (5 random)
1. [ ] {candidate_id} - provenance complete
2. [ ] {candidate_id} - metrics match
3. [ ] {candidate_id} - artifacts present
4. [ ] {candidate_id} - no duplicates
5. [ ] {candidate_id} - thresholds still met

### Anomaly Check
- [ ] No sharpeVar < 1e-6 entries
- [ ] No duplicate genome_hash
- [ ] All promoted_at dates valid

### Sync Status
- [ ] Local -> Neon sync complete
- [ ] Last sync: {timestamp}

### Issues Found
{list or "None"}
```

Acceptance Criteria

OMP Operational Readiness

| Criterion | Pass | Fail |
|-----------|------|------|
| 24/7 operation | Daemon runs continuously | Frequent crashes |
| Watchdog policy | Auto-stop on disk < 1GB | No protection |
| Validation automation | Variance gate active | Manual only |
| HoF governance | Provenance complete | Missing fields |
| Retention policy | Cleanup script works | Disk fills up |
| Audit trail | Activity log populated | No logging |
| Queue management | API endpoints work | Queue corrupted |
| Resource monitoring | Real-time metrics | No monitoring |

Campaign Execution

| Criterion | Pass | Fail |
|-----------|------|------|
| Config validation | Pre-run check | Invalid config runs |
| Progress tracking | Generation logged | No progress info |
| Error handling | Graceful failure | Crash without log |
| Repeat mode | Auto re-queue works | Manual only |

Promotion Pipeline

| Criterion | Pass | Fail |
|-----------|------|------|
| Variance gate | Blocks collapsed metrics | Promotes garbage |
| Threshold check | Enforces limits | Ignores limits |
| Provenance | All fields present | Missing data |
| Sync to Neon | Reliable | Data loss |

Failure Modes

Common Traps

  1. Seed Fishing

    • Symptom: Same strategy promoted with multiple seeds
    • Fail: Overfitting via seed selection
    • Fix: Track seed changes, flag duplicates
  2. Promotion Without Gates

    • Symptom: Weak strategies in HoF
    • Fail: No quality control
    • Fix: Enforce variance + threshold gates
  3. Queue Starvation

    • Symptom: Low-priority campaigns never run
    • Fail: Unfair scheduling
    • Fix: Review queue weekly, adjust priorities
  4. Disk Full

    • Symptom: Mining stops abruptly
    • Fail: Lost progress, corrupted state
    • Fix: Proactive cleanup, disk alerts
  5. Stuck Runs

    • Symptom: Campaign runs indefinitely
    • Fail: Resource waste, queue blocked
    • Fix: Timeout watchdog, kill and log
  6. Config Drift

    • Symptom: Results not reproducible
    • Fail: Config changed between runs
    • Fix: Hash configs, version control
  7. HoF Contamination

    • Symptom: Invalid strategies in HoF
    • Fail: Bad data or bug
    • Fix: Variance sanity gate, integrity checks
  8. Mining During Data Incident

    • Symptom: Strategies trained on bad data
    • Fail: Garbage in, garbage out
    • Fix: Pause mining when data-engineer flags issue
  9. Excess Concurrency

    • Symptom: System overwhelmed
    • Fail: System crash
    • Fix: max_concurrent_campaigns = 1
  10. No Audit Trail

    • Symptom: Cannot explain state changes
    • Fail: No accountability
    • Fix: Log all operations with timestamps
  11. Orphaned Runs

    • Symptom: Runs without campaigns
    • Fail: Cannot trace provenance
    • Fix: Always link run to campaign
  12. Promotion Without Execution Review

    • Symptom: High-turnover strategy promoted
    • Fail: Unrealistic execution assumptions
    • Fix: Handoff to trader-expert for review
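
One way to flag seed fishing (trap 1) is to group promotion attempts by genome and look for more than one seed. This assumes each attempt record carries `genome_hash` and `seed` keys; seed is not in the provenance table. `seed_fishing_suspects` is hypothetical.

```python
from collections import defaultdict


def seed_fishing_suspects(attempts: list) -> dict:
    """Map genome_hash -> sorted seeds for genomes attempted under more than one seed."""
    seeds_by_genome = defaultdict(set)
    for a in attempts:
        seeds_by_genome[a["genome_hash"]].add(a["seed"])
    return {g: sorted(s) for g, s in seeds_by_genome.items() if len(s) > 1}
```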

Red Flags Requiring Investigation

  • Daemon restart count > 3 in 24 hours
  • Campaign failure rate > 30%
  • Disk usage > 80%
  • Memory usage > 80% sustained
  • Zero promotions in 7 days (if mining active)
  • Variance sanity gate blocking repeatedly
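
The red flags can be expressed as a checklist function. `red_flags` and its parameters are hypothetical. Two readings are assumptions: "sustained" means every sample in the window is above 80%, and "repeatedly" means three or more blocks.

```python
def red_flags(
    restarts_24h: int,
    campaigns_finished: int,      # completed + failed in the review window
    campaigns_failed: int,
    disk_used_pct: float,
    mem_used_pct_samples: list,
    promotions_7d: int,
    mining_active: bool,
    variance_blocks_recent: int,
) -> list:
    """Return the red flags above that currently apply."""
    flags = []
    if restarts_24h > 3:
        flags.append("daemon restarts > 3 in 24 hours")
    if campaigns_finished and campaigns_failed / campaigns_finished > 0.30:
        flags.append("campaign failure rate > 30%")
    if disk_used_pct > 80:
        flags.append("disk usage > 80%")
    # "Sustained" read as: every sample in the window is above 80%.
    if mem_used_pct_samples and min(mem_used_pct_samples) > 80:
        flags.append("memory usage > 80% sustained")
    if mining_active and promotions_7d == 0:
        flags.append("zero promotions in 7 days")
    # "Repeatedly" read as: 3 or more variance-gate blocks in the window.
    if variance_blocks_recent >= 3:
        flags.append("variance sanity gate blocking repeatedly")
    return flags
```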

Collaboration Hooks

Handoff to /devops-infra

For resource and infrastructure issues:

```markdown
## Handoff: omp-operator -> devops-infra

**Issue:** Resource pressure / Infrastructure

**Observed:**
- CPU: {usage}%
- Memory: {usage}%
- Disk: {free} GB
- OMP Status: {status}

**Symptoms:**
- {description}

**Action Needed:**
- [ ] Review resource limits
- [ ] Check PM2 status
- [ ] Review cleanup scripts

**Priority:** {high/medium/low}
```

Handoff to /risk-analyst

For validation and promotion decisions:

```markdown
## Handoff: omp-operator -> risk-analyst

**Request:** Promotion Review

**Candidate:** {candidate_id}
**Run:** {run_id}
**Campaign:** {campaign_name}

**Metrics:**
- OOS Sharpe: {value}
- PBO: {value}
- DSR: {value}
- Max DD: {value}

**Context:**
- {any special circumstances}

**Required:**
- [ ] Validate gates
- [ ] Confirm promotion packet
- [ ] Sign off for HoF
```

Handoff to /trader-expert

For execution realism review:

```markdown
## Handoff: omp-operator -> trader-expert

**Request:** Execution Review

**Candidate:** {candidate_id}
**Turnover:** {X}x annual
**Market:** {BR/US}

**Concerns:**
- {execution concerns}

**Required:**
- [ ] Review slippage model
- [ ] Validate fill assumptions
- [ ] Sign Execution Assumptions Card
```

Handoff to /data-engineer

For data quality coordination:

```markdown
## Handoff: omp-operator -> data-engineer

**Issue:** Data Readiness

**Context:**
- Mining campaign: {name}
- Market: {BR/US}
- Period: {date range}

**Question/Request:**
- {specific question}

**Impact:**
- Mining paused pending response
- {N} campaigns affected
```

Receiving from /quant-researcher

When receiving a campaign request:

```markdown
## Request: quant-researcher -> omp-operator

**Campaign:** {name}
**Config:** {path}
**Priority:** {1-10}

**Requirements:**
- [ ] Config exists and validates
- [ ] Data readiness confirmed
- [ ] Resource budget acceptable

**Timeline:** {urgency}
```

Quick Reference

OMP Control Commands

```bash
# Start mining
curl -X POST http://localhost:3001/api/omp/start

# Stop mining
curl -X POST http://localhost:3001/api/omp/stop

# Pause mining
curl -X POST http://localhost:3001/api/omp/pause

# Resume mining
curl -X POST http://localhost:3001/api/omp/resume

# Check status
curl -s http://localhost:3001/api/omp/status | jq .
```

Queue Management

```bash
# List queue
curl -s http://localhost:3001/api/omp/queue | jq .

# Add campaign
curl -X POST http://localhost:3001/api/omp/queue \
  -H "Content-Type: application/json" \
  -d '{"name":"Test","config_path":"configs/campaigns/test.toml","priority":1}'

# Enable/disable campaign
curl -X PATCH http://localhost:3001/api/omp/queue/{id} \
  -H "Content-Type: application/json" \
  -d '{"enabled":false}'

# Remove campaign
curl -X DELETE http://localhost:3001/api/omp/queue/{id}
```

Hall of Fame

```bash
# List Hall of Fame
curl -s http://localhost:3001/api/omp/hall-of-fame | jq .

# Promotion check (variance gate)
curl -s "http://localhost:3001/api/omp/promote-check?runId={run_id}" | jq .

# Sync local to Neon
curl -X POST http://localhost:3001/api/omp/hof-sync

# List local strategies
curl -s http://localhost:3001/api/omp/hof-local | jq .
```

Cleanup and Maintenance

```bash
# Cleanup old runs (keep 5, if < 2GB free)
./scripts/cleanup_old_runs.sh /path/to/output/scg 5 2

# Full cleanup (stop first!)
curl -X POST http://localhost:3001/api/omp/cleanup

# Health check
./scripts/vps/health-check.sh
```

OMP Config Location

```text
dashboard/omp_config.toml     # Main config
dashboard/campaign_queue.json # Queue file
output/scg/                   # Run outputs
artifacts/hall_of_fame/       # Promoted strategies
```