AgentSkillsCN

ground-truth-collector

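Use when designing data annotation workflows, ideally before model training. This skill produces annotation guidelines, quality-control processes, and labeler training materials.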

SKILL.md
--- frontmatter
name: ground-truth-collector
description: Use when designing data annotation workflows. Use before model training. Produces annotation guidelines, quality control processes, and labeler training materials.

Ground Truth Collector

Overview

Design effective data annotation workflows that produce high-quality labeled data for AI training. Define guidelines, quality control, and labeler training.

Core principle: Label quality sets the ceiling on model quality. Invest in annotation quality up front to avoid model quality problems later.

When to Use

  • Starting a new labeling project
  • Improving label quality
  • Scaling an annotation team
  • Defining a new label schema

Output Format

yaml
annotation_project:
  name: "[Project name]"
  date: "[YYYY-MM-DD]"
  
  scope:
    data_type: "[Text | Image | Audio | etc.]"
    task_type: "[Classification | NER | Bounding box | etc.]"
    volume: "[Total items to label]"
    timeline: "[Duration]"
  
  label_schema:
    labels:
      - label: "[Label name]"
        definition: "[Clear definition]"
        examples:
          positive: ["[Example that IS this label]"]
          negative: ["[Example that is NOT this label]"]
        edge_cases: ["[Guidance for ambiguous cases]"]
  
  guidelines:
    document: "[Link to full guidelines]"
    key_rules:
      - "[Rule 1]"
      - "[Rule 2]"
    
    decision_tree:
      - condition: "[If this]"
        then: "[Label as this]"
  
  quality_control:
    inter_annotator_agreement:
      metric: "[Cohen's Kappa | Fleiss' Kappa | etc.]"
      threshold: "[Acceptable level]"
    
    review_process:
      method: "[Gold standard | Expert review | Consensus]"
      sample_rate: "[% reviewed]"
    
    feedback_loop:
      - "[How labelers get feedback]"
  
  labeler_training:
    onboarding:
      - "[Training step 1]"
    qualification:
      test: "[How to test readiness]"
      passing: "[Threshold]"
  
  workflow:
    tool: "[Annotation tool]"
    batching: "[How work is assigned]"
    escalation: "[How to handle unclear cases]"
  
  metrics:
    throughput: "[Items per hour target]"
    quality: "[Accuracy target]"
    consistency: "[IAA target]"
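One way to catch gaps before labeling starts is to check a filled-in spec against the sections above. This is a minimal sketch that assumes the YAML has already been parsed into a Python dict (for example with a YAML library, not shown). `find_spec_gaps` and `REQUIRED_SECTIONS` are hypothetical names, and the required-section list simply mirrors this template.

```python
from typing import Any

# Hypothetical: top-level sections copied from the annotation_project template.
REQUIRED_SECTIONS = [
    "scope", "label_schema", "guidelines", "quality_control",
    "labeler_training", "workflow", "metrics",
]


def find_spec_gaps(project: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed annotation_project spec."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not project.get(section):
            problems.append(f"missing section: {section}")

    # Every label needs a definition plus positive and negative examples.
    labels = (project.get("label_schema") or {}).get("labels") or []
    for entry in labels:
        name = entry.get("label", "<unnamed>")
        if not entry.get("definition"):
            problems.append(f"{name}: no definition")
        examples = entry.get("examples") or {}
        for kind in ("positive", "negative"):
            if not examples.get(kind):
                problems.append(f"{name}: no {kind} examples")
    return problems


spec = {
    "scope": {"data_type": "Text", "task_type": "Classification"},
    "label_schema": {
        "labels": [
            {
                "label": "spam",
                "definition": "Unsolicited bulk message",
                "examples": {"positive": ["WIN A FREE PRIZE NOW"]},
            }
        ]
    },
}
for problem in find_spec_gaps(spec):
    print(problem)
```

Running the checker in CI or before a kickoff meeting turns the template into a gate, so a spec without negative examples is flagged before any labeling happens.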

Annotation Guidelines Template

Purpose

yaml
guideline_structure:
  overview:
    - "Task description"
    - "Why this matters"
    - "How labels will be used"
  
  label_definitions:
    - label: "[Label]"
      definition: "[1-2 sentence definition]"
      criteria: ["[What makes something this label]"]
  
  examples:
    - type: "Clear positive"
      example: "[Example]"
      label: "[Label]"
      explanation: "[Why]"
    
    - type: "Clear negative"
      example: "[Example]"
      not_label: "[Label]"
      explanation: "[Why not]"
    
    - type: "Edge case"
      example: "[Ambiguous example]"
      label: "[Correct label]"
      reasoning: "[How to decide]"
  
  common_mistakes:
    - mistake: "[What labelers often do wrong]"
      correction: "[What to do instead]"

Quality Control Methods

Inter-Annotator Agreement (IAA)

yaml
iaa_methods:
  two_annotators:
    metric: "Cohen's Kappa"
    formula: "(p_o - p_e) / (1 - p_e), where p_o = observed agreement and p_e = agreement expected by chance"
    interpretation:
      ">0.8": "Excellent agreement"
      "0.6-0.8": "Good agreement"
      "0.4-0.6": "Moderate agreement"
      "<0.4": "Poor agreement - review guidelines"
  
  multiple_annotators:
    metric: "Fleiss' Kappa"
    use_when: "3+ annotators per item"
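Both kappa statistics can be computed directly from raw labels. In the sketch below (plain Python, no third-party libraries), `cohens_kappa` implements (p_o - p_e) / (1 - p_e), with p_e taken from each annotator's own label frequencies. `fleiss_kappa` averages per-item pairwise agreement and computes chance agreement from the pooled category proportions. The function names are hypothetical. The bands above overlap at 0.6 and 0.8, so the way `interpret_kappa` assigns those boundary values is an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators independently pick the same category.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when a single category is used throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds every annotator's label for item i.

    Each item must have the same number of ratings (at least two).
    """
    if not ratings:
        raise ValueError("no items")
    m = len(ratings[0])  # ratings per item
    if m < 2 or any(len(item) != m for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    totals: Counter = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # P_i: share of annotator pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    p_bar = agreement_sum / len(ratings)
    n_total = len(ratings) * m
    p_e = sum((t / n_total) ** 2 for t in totals.values())
    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


def interpret_kappa(kappa: float) -> str:
    """Map a kappa value onto the bands above (boundary handling is an assumption)."""
    if kappa > 0.8:
        return "Excellent agreement"
    if kappa >= 0.6:
        return "Good agreement"
    if kappa >= 0.4:
        return "Moderate agreement"
    return "Poor agreement - review guidelines"


a = ["pos", "pos", "neg", "neg", "pos"]
b = ["pos", "neg", "neg", "neg", "pos"]
k = cohens_kappa(a, b)
print(round(k, 3), interpret_kappa(k))  # 0.615 Good agreement

three_raters = [["x", "x", "x"], ["x", "x", "y"], ["y", "y", "y"], ["x", "y", "y"]]
f = fleiss_kappa(three_raters)
print(round(f, 3), interpret_kappa(f))  # 0.333 Poor agreement - review guidelines
```

In the Cohen example the raw agreement is 80%, but kappa is only about 0.62, because 48% agreement would be expected by chance given how often each annotator used each label.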

Review Strategies

| Strategy | Description | When to Use |
|---|---|---|
| Gold standard | Compare to expert-labeled set | New labelers, quality checks |
| Expert review | Expert reviews sample | High-stakes data |
| Consensus | Multiple labelers, take majority | Subjective tasks |
| Adjudication | Expert resolves disagreements | When consensus fails |
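Consensus and adjudication fit together naturally: accept the majority label when one exists and route everything else to an expert. This is a minimal sketch that assumes a strict majority (more than half the votes) counts as consensus, so ties always escalate. `resolve_label` and `split_for_adjudication` are hypothetical helpers.

```python
from collections import Counter
from typing import Optional


def resolve_label(votes: list[str]) -> Optional[str]:
    """Return the strict-majority label, or None if no label has more than half the votes."""
    if not votes:
        return None
    label, top = Counter(votes).most_common(1)[0]
    return label if top * 2 > len(votes) else None


def split_for_adjudication(
    votes_by_item: dict[str, list[str]],
) -> tuple[dict[str, str], list[str]]:
    """Separate items with a consensus label from items an expert must adjudicate."""
    resolved: dict[str, str] = {}
    escalate: list[str] = []
    for item_id, votes in votes_by_item.items():
        label = resolve_label(votes)
        if label is None:
            escalate.append(item_id)  # tie or no majority: send to adjudication
        else:
            resolved[item_id] = label
    return resolved, escalate


resolved, escalate = split_for_adjudication({
    "msg-1": ["spam", "spam", "ham"],
    "msg-2": ["spam", "ham"],
    "msg-3": ["ham", "ham", "ham"],
})
print(resolved)  # {'msg-1': 'spam', 'msg-3': 'ham'}
print(escalate)  # ['msg-2']
```

Comparing integer counts (`top * 2 > len(votes)`) rather than a fraction avoids floating-point edge cases at exactly one half.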

Labeler Training

Onboarding Process

yaml
training_process:
  1_guidelines_review:
    duration: "30-60 min"
    content: "Read and understand guidelines"
    quiz: "Check comprehension"
  
  2_guided_practice:
    duration: "1-2 hours"
    content: "Label examples with feedback"
    sample_size: "20-50 items"
  
  3_qualification_test:
    duration: "30-60 min"
    content: "Unsupervised labeling"
    sample_size: "30-50 items"
    passing: ">90% agreement with gold standard"
  
  4_gradual_independence:
    content: "Start live work with higher review rate"
    review_rate: "100% → 20% as quality proves out"
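The qualification gate and the gradual reduction in review rate can each be expressed as a small function. The 90% threshold comes from the process above. The step-down policy in `next_review_rate` is a hypothetical interpretation of "as quality proves out": it lowers the rate by 20 points after each batch that meets the target, resets to 100% after a miss, and never drops below 20%. All function names are illustrative.

```python
def qualification_score(answers: dict[str, str], gold: dict[str, str]) -> float:
    """Share of gold-standard items the candidate labeled correctly (unanswered = wrong)."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(answers.get(item) == label for item, label in gold.items())
    return correct / len(gold)


def passes_qualification(answers: dict[str, str], gold: dict[str, str],
                         threshold: float = 0.90) -> bool:
    """Pass only with strictly more than `threshold` agreement (">90%" above)."""
    return qualification_score(answers, gold) > threshold


def next_review_rate(current: float, batch_accuracy: float, target: float = 0.90,
                     step: float = 0.20, floor: float = 0.20) -> float:
    """Hypothetical step-down policy for the share of a new labeler's work that is reviewed."""
    if batch_accuracy < target:
        return 1.0  # quality slipped: go back to full review
    return max(floor, round(current - step, 2))


gold = {f"q{i}": "pos" if i % 2 else "neg" for i in range(10)}
answers = dict(gold, q0="pos")  # one mistake out of ten
print(passes_qualification(answers, gold))  # False: 90% is not above 90%

rate = 1.0
for accuracy in [0.95, 0.93, 0.96, 0.94, 0.85]:
    rate = next_review_rate(rate, accuracy)
    print(rate)  # 0.8, 0.6, 0.4, 0.2, then 1.0 after the miss
```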

Ongoing Quality

yaml
ongoing_quality:
  regular_checks:
    - "Insert gold standard items periodically"
    - "Track individual labeler performance"
    - "Provide feedback on errors"
  
  calibration:
    frequency: "Weekly or after guideline changes"
    method: "All labelers label same set, discuss disagreements"
  
  retraining:
    trigger: "Quality drops below threshold"
    process: "Focused training on error patterns"
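Gold-item checks and the retraining trigger can be wired together as shown below. This sketch rests on several assumptions. Gold items are mixed into each batch at a fixed rate. Results arrive as `(labeler_id, was_correct)` pairs. A labeler is flagged only after a minimum number of gold items (`min_items`, a hypothetical parameter), so one unlucky batch does not trigger retraining. Function names are illustrative.

```python
import random
from collections import defaultdict


def seed_gold(batch: list[str], gold_pool: list[str], rate: float,
              rng: random.Random) -> list[str]:
    """Mix about `rate` gold items per live item into a batch at random positions."""
    n_gold = min(len(gold_pool), max(1, round(len(batch) * rate)))
    mixed = batch + rng.sample(gold_pool, n_gold)
    rng.shuffle(mixed)  # labelers should not be able to spot the gold items
    return mixed


def labelers_needing_retraining(gold_results: list[tuple[str, bool]],
                                threshold: float = 0.90,
                                min_items: int = 20) -> dict[str, float]:
    """Return {labeler: gold accuracy} for labelers below `threshold`.

    Labelers with fewer than `min_items` gold results are skipped as too noisy.
    """
    tally: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [correct, seen]
    for labeler, was_correct in gold_results:
        tally[labeler][0] += int(was_correct)
        tally[labeler][1] += 1
    return {
        labeler: correct / seen
        for labeler, (correct, seen) in tally.items()
        if seen >= min_items and correct / seen < threshold
    }


rng = random.Random(7)
batch = [f"item-{i}" for i in range(50)]
gold_pool = [f"gold-{i}" for i in range(10)]
mixed = seed_gold(batch, gold_pool, rate=0.1, rng=rng)
print(len(mixed))  # 55

results = ([("ann", True)] * 19 + [("ann", False)]
           + [("bob", True)] * 16 + [("bob", False)] * 4
           + [("cat", False)] * 5)
print(labelers_needing_retraining(results))  # {'bob': 0.8}
```

The per-labeler accuracies returned here are also a natural input to the "focused training on error patterns" step, since they show who to retrain before anyone looks at what each labeler got wrong.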

Workflow Design

Tool Selection Criteria

yaml
tool_criteria:
  must_have:
    - "Supports data type and task type"
    - "Quality control features"
    - "Export in needed format"
  
  nice_to_have:
    - "Active learning suggestions"
    - "Pre-labeling automation"
    - "Labeler performance analytics"

Batch Assignment

yaml
batching_strategy:
  size: "50-100 items per batch"
  assignment: "Random or stratified"
  overlap: "10-20% double-labeled for IAA"
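The following batching sketch shuffles items (random assignment), splits them into fixed-size batches, and marks a slice for double labeling so IAA can be computed. The defaults fall within the ranges above (50 items per batch, 15% overlap). The round-robin scheme in `assign_batches`, which sends each overlap item to the next labeler in the rotation, is hypothetical. Stratified assignment would replace the shuffle with per-stratum sampling.

```python
import random


def make_batches(item_ids: list[str], batch_size: int = 50, overlap: float = 0.15,
                 seed: int = 0) -> tuple[list[list[str]], set[str]]:
    """Shuffle items into fixed-size batches and pick a subset to double-label."""
    rng = random.Random(seed)
    items = list(item_ids)
    rng.shuffle(items)  # random assignment; stratified sampling would go here instead
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    double_labeled = set(rng.sample(items, round(len(items) * overlap)))
    return batches, double_labeled


def assign_batches(batches: list[list[str]], double_labeled: set[str],
                   labelers: list[str]) -> dict[str, list[str]]:
    """Round-robin batches; overlap items also go to the next labeler in the rotation."""
    if len(labelers) < 2:
        raise ValueError("double labeling needs at least two labelers")
    work: dict[str, list[str]] = {name: [] for name in labelers}
    for b, batch in enumerate(batches):
        primary = labelers[b % len(labelers)]
        secondary = labelers[(b + 1) % len(labelers)]
        work[primary].extend(batch)
        work[secondary].extend(i for i in batch if i in double_labeled)
    return work


batches, double_labeled = make_batches([f"item-{i}" for i in range(200)])
work = assign_batches(batches, double_labeled, ["ann", "bob", "cat"])
print(len(batches), len(double_labeled))  # 4 30
print(sum(len(items) for items in work.values()))  # 230
```

Because the secondary labeler is always different from the primary, every overlap item yields one pair of independent labels, which is exactly the input the Cohen's kappa calculation expects.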

Checklist

  • Label schema defined
  • Guidelines documented with examples
  • Quality control process defined
  • Labeler training designed
  • Tool selected and configured
  • Workflow documented
  • Metrics and thresholds set