Content Filter Skill

Filter and classify incoming content for relevance to AI research intelligence. This skill is optimized for high-throughput bulk processing.

Purpose

The content filter is the first stage of the extraction pipeline. It quickly assesses content to:

•Determine relevance to AI research discourse
•Classify by topic and content type
•Identify author category
•Filter out noise before expensive extraction

Assessment Schema

For each piece of content, produce:

1. relevance (0.0-1.0)

How relevant is this to AI research intelligence?

Score	Meaning
0.9-1.0	Highly relevant - substantial claims, predictions, or hints
0.7-0.9	Clearly relevant - discusses AI capabilities, progress, or debate
0.5-0.7	Moderately relevant - tangentially about AI or tech industry
0.3-0.5	Low relevance - may contain signal but mostly noise
0.0-0.3	Not relevant - personal, off-topic, or pure promotion
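
The bands above share their endpoints (0.7 appears in two rows, for instance), so a consumer has to pick a tie-breaking rule. The following is a minimal sketch, assuming a score on a shared boundary belongs to the higher band; the name `relevance_band` and the label strings are illustrative and not part of the skill.

```python
# Bands from the table above as (lower bound, label), highest first.
# Assumption: a score on a shared boundary belongs to the higher band.
BANDS = [
    (0.9, "highly relevant"),
    (0.7, "clearly relevant"),
    (0.5, "moderately relevant"),
    (0.3, "low relevance"),
    (0.0, "not relevant"),
]


def relevance_band(score: float) -> str:
    """Return the band label for a relevance score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance must be within 0.0-1.0, got {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: 0.0 is the lowest bound")
```

If the opposite convention is preferred (boundary scores fall into the lower band), only the comparison inside the loop needs to change.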

2. topic

Primary topic category:

•scaling: Scaling laws, compute, training efficiency
•reasoning: LLM reasoning, chain-of-thought, planning
•agents: AI agents, tool use, autonomy
•safety: AI safety, alignment, control
•interpretability: Mechanistic interpretability
•multimodal: Vision, audio, video models
•rlhf: RLHF, preference learning, Constitutional AI
•benchmarks: Evals, benchmarks, capability measurement
•infrastructure: Training infra, chips, hardware
•policy: AI policy, regulation, governance
•general: General AI commentary
•other: Doesn't fit categories

3. contentType

What kind of content is this?

•prediction: Forward-looking claims about AI
•research-hint: Suggests unreleased work or capabilities
•opinion: Takes a position on AI progress or limitations
•factual: Reports on current state or recent events
•critique: Challenges claims or work by others
•meta: About the AI discourse itself
•noise: Not substantive (personal, promotion, etc.)

4. authorCategory

Who is the author?

•lab-researcher: Works at a major AI lab (Anthropic, OpenAI, DeepMind, Meta, xAI, etc.)
•critic: Known skeptic with credentials (Marcus, Chollet, Mitchell, Bender, etc.)
•academic: Academic researcher not at a major lab
•independent: Independent practitioner or commentator
•journalist: Tech journalist or media outlet
•unknown: Cannot determine

5. isSubstantive (boolean)

Does this contain actual claims worth extracting?

•true: Contains specific assertions, predictions, or valuable signal
•false: Too general, vague, or promotional to extract claims from

6. brief

One-sentence summary of the content (max 100 characters).
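
The six fields above can be checked mechanically before an assessment is passed downstream. This is a rough sketch under stated assumptions: the allowed values are copied from the lists in this section, while the function name `validate_assessment` and the wording of the error messages are hypothetical.

```python
# Allowed values copied from the schema above.
TOPICS = {
    "scaling", "reasoning", "agents", "safety", "interpretability",
    "multimodal", "rlhf", "benchmarks", "infrastructure", "policy",
    "general", "other",
}
CONTENT_TYPES = {
    "prediction", "research-hint", "opinion", "factual",
    "critique", "meta", "noise",
}
AUTHOR_CATEGORIES = {
    "lab-researcher", "critic", "academic",
    "independent", "journalist", "unknown",
}
MAX_BRIEF_CHARS = 100


def validate_assessment(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item is valid."""
    problems = []

    relevance = item.get("relevance")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(relevance, bool) or not isinstance(relevance, (int, float)):
        problems.append("relevance must be a number")
    elif not 0.0 <= relevance <= 1.0:
        problems.append("relevance must be within 0.0-1.0")

    enum_fields = (
        ("topic", TOPICS),
        ("contentType", CONTENT_TYPES),
        ("authorCategory", AUTHOR_CATEGORIES),
    )
    for field, allowed in enum_fields:
        value = item.get(field)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field} has unknown value {value!r}")

    if not isinstance(item.get("isSubstantive"), bool):
        problems.append("isSubstantive must be true or false")

    brief = item.get("brief")
    if not isinstance(brief, str) or not brief:
        problems.append("brief must be a non-empty string")
    elif len(brief) > MAX_BRIEF_CHARS:
        problems.append(f"brief exceeds {MAX_BRIEF_CHARS} characters")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue in a large batch in a single pass.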

Output Format

Return JSON:

{
  "assessments": [
    {
      "itemIndex": 0,
      "relevance": 0.85,
      "topic": "reasoning",
      "contentType": "opinion",
      "authorCategory": "lab-researcher",
      "isSubstantive": true,
      "brief": "Claims chain-of-thought has hit diminishing returns"
    }
  ],
  "processingNotes": "Optional batch-level observations"
}

Quick Classification Heuristics

High Relevance (0.7-1.0)

•Contains specific claims about AI capabilities
•Predictions with timeframes
•Technical discussion of methods/results
•Critique with reasoning
•Hints about unreleased work
•Debates between researchers

Medium Relevance (0.4-0.7)

•General commentary on AI field
•Sharing papers/articles with brief comment
•Reactions to announcements
•Meta-discussion about discourse
•Industry news without analysis

Low Relevance (0.0-0.4)

•Personal updates unrelated to AI
•Off-topic content
•Pure promotion without substance
•Scheduling/logistics
•Simple retweets without commentary
•"Interesting paper" without substantive comment

Author Detection Tips

Lab Researchers

Look for:

•Bio mentions: Anthropic, OpenAI, DeepMind, Google Brain, Meta AI, xAI, Mistral
•Known handles: @daborenstein, @sama, @kaborl, etc.
•Technical depth suggesting insider knowledge

Critics

Known handles and patterns:

•@garymarcus, @fchollet, @mmitchell_ai, @emilymbender
•Pattern of challenging mainstream AI claims
•Academic credentials combined with public skepticism

Independent

•No lab affiliation
•Often practitioners or commentators
•Examples: @simonw, @drjimfan, @nathanlambert
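
The handle and bio checks above amount to a cheap lookup. The sketch below assumes the author's handle and bio text are available as plain strings; the handles and lab names are copied from the lists in this section, while the function name `detect_author_category`, the order of the checks, and the whole-word bio matching are illustrative assumptions.

```python
import re

# Handles and lab names copied from the lists above. A real deployment
# would need a maintained roster; this one is only as good as those lists.
LAB_BIO_TERMS = (
    "anthropic", "openai", "deepmind", "google brain",
    "meta ai", "xai", "mistral",
)
LAB_HANDLES = {"daborenstein", "sama", "kaborl"}
CRITIC_HANDLES = {"garymarcus", "fchollet", "mmitchell_ai", "emilymbender"}
INDEPENDENT_HANDLES = {"simonw", "drjimfan", "nathanlambert"}


def detect_author_category(handle: str, bio: str = "") -> str:
    """Guess authorCategory from a handle and an optional bio string."""
    name = handle.lstrip("@").lower()
    # Assumption: an explicit handle match outranks any bio keyword.
    if name in CRITIC_HANDLES:
        return "critic"
    if name in LAB_HANDLES:
        return "lab-researcher"
    if name in INDEPENDENT_HANDLES:
        return "independent"

    bio_lower = bio.lower()
    for term in LAB_BIO_TERMS:
        # Whole-word match so "metadata" does not trigger "meta ai".
        if re.search(rf"\b{re.escape(term)}\b", bio_lower):
            return "lab-researcher"
    return "unknown"
```

This covers only the three categories with tips above; `academic` and `journalist` would need other signals, and anything unmatched falls back to `unknown`.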

Processing Guidelines

Speed Over Depth

This skill is for throughput. Make quick assessments based on:

•Keywords and phrases
•Author identity (if known)
•Content structure
•Obvious signals

Conservative Filtering

When in doubt about relevance:

•Score 0.3-0.5 to keep for human review
•Don't filter out potentially valuable content
•False positives are okay; false negatives lose signal
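
The conservative rule above can be read as a three-way routing decision. The following is a minimal sketch, assuming that scores below 0.3 are dropped and that scores of 0.5 or higher go straight to extraction; only the 0.3-0.5 review band comes from this section, and the function name `route` and its return labels are hypothetical.

```python
DROP_BELOW = 0.3    # assumption: the "Not relevant" band is discarded
REVIEW_BELOW = 0.5  # from the guideline: 0.3-0.5 is kept for human review


def route(relevance: float) -> str:
    """Decide what happens to an item after filtering."""
    if relevance < DROP_BELOW:
        return "drop"
    if relevance < REVIEW_BELOW:
        return "human-review"
    # Assumption: anything at or above 0.5 proceeds to extraction.
    return "extract"
```

Because false negatives lose signal, the drop threshold is the one to lower first if reviewers find useful items being discarded.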

Batch Efficiency

When processing batches:

•Process items in order
•Output assessments matching input order
•Note any batch-level patterns in processingNotes
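
The ordering requirement can be checked once the output is parsed. This sketch assumes `itemIndex` is the zero-based position of the item in the input batch, as the example output suggests; the function name `check_batch_order` is illustrative.

```python
import json


def check_batch_order(raw_output: str, input_count: int) -> list[dict]:
    """Parse filter output and confirm one assessment per input, in order."""
    payload = json.loads(raw_output)
    assessments = payload["assessments"]
    indices = [a["itemIndex"] for a in assessments]
    # Assumption: itemIndex is the zero-based position in the input batch.
    expected = list(range(input_count))
    if indices != expected:
        raise ValueError(
            f"expected itemIndex sequence {expected}, got {indices}"
        )
    return assessments
```

A mismatch here usually means an item was skipped or duplicated, so rejecting the whole batch is safer than silently pairing assessments with the wrong inputs.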