Content Filter Skill
Filter and classify incoming content for relevance to AI research intelligence. This skill is optimized for high-throughput bulk processing.
Purpose
The content filter is the first stage of the extraction pipeline. It quickly assesses content to:
- Determine relevance to AI research discourse
- Classify by topic and content type
- Identify author category
- Filter out noise before expensive extraction
Assessment Schema
For each piece of content, produce:
1. relevance (0.0-1.0)
How relevant is this to AI research intelligence?
| Score | Meaning |
|---|---|
| 0.9-1.0 | Highly relevant - substantial claims, predictions, or hints |
| 0.7-0.9 | Clearly relevant - discusses AI capabilities, progress, or debate |
| 0.5-0.7 | Moderately relevant - tangentially about AI or tech industry |
| 0.3-0.5 | Low relevance - may contain signal but mostly noise |
| 0.0-0.3 | Not relevant - personal, off-topic, or pure promotion |
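As a rough illustration, the table above can be read as a lookup from score to band. The function name `relevance_band` and the band labels are hypothetical, and treating each lower edge as inclusive is an assumption, since adjacent rows of the table share their endpoints (0.9 appears in two rows, for example).

```python
# Band edges copied from the relevance table, highest first.
# Assumption: a score sitting exactly on a shared edge belongs to the higher band.
BANDS = [
    (0.9, "highly relevant"),
    (0.7, "clearly relevant"),
    (0.5, "moderately relevant"),
    (0.3, "low relevance"),
    (0.0, "not relevant"),
]


def relevance_band(score: float) -> str:
    """Map a relevance score in [0.0, 1.0] to the label of its band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance must be within [0.0, 1.0], got {score!r}")
    for lower_edge, label in BANDS:
        if score >= lower_edge:
            return label
    raise AssertionError("unreachable: 0.0 is the lowest edge")
```

If the pipeline resolves shared edges the other way, only the comparison in the loop needs to change.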
2. topic
Primary topic category:
- scaling: Scaling laws, compute, training efficiency
- reasoning: LLM reasoning, chain-of-thought, planning
- agents: AI agents, tool use, autonomy
- safety: AI safety, alignment, control
- interpretability: Mechanistic interpretability
- multimodal: Vision, audio, video models
- rlhf: RLHF, preference learning, Constitutional AI
- benchmarks: Evals, benchmarks, capability measurement
- infrastructure: Training infra, chips, hardware
- policy: AI policy, regulation, governance
- general: General AI commentary
- other: Doesn't fit categories
3. contentType
What kind of content is this?
- prediction: Forward-looking claims about AI
- research-hint: Suggests unreleased work or capabilities
- opinion: Positioned takes on AI progress/limitations
- factual: Reports on current state or recent events
- critique: Challenges claims or work by others
- meta: About the AI discourse itself
- noise: Not substantive (personal, promotion, etc.)
4. authorCategory
Who is the author?
- lab-researcher: Works at major AI lab (Anthropic, OpenAI, DeepMind, Meta, xAI, etc.)
- critic: Known skeptic with credentials (Marcus, Chollet, Mitchell, Bender, etc.)
- academic: Academic researcher not at major lab
- independent: Independent practitioner or commentator
- journalist: Tech journalist or media
- unknown: Cannot determine
5. isSubstantive (boolean)
Does this contain actual claims worth extracting?
- true: Contains specific assertions, predictions, or valuable signal
- false: Too general, vague, or promotional to extract claims from
6. brief
One-sentence summary of the content (max 100 characters).
Output Format
Return JSON:
```json
{
  "assessments": [
    {
      "itemIndex": 0,
      "relevance": 0.85,
      "topic": "reasoning",
      "contentType": "opinion",
      "authorCategory": "lab-researcher",
      "isSubstantive": true,
      "brief": "Claims chain-of-thought has hit diminishing returns"
    }
  ],
  "processingNotes": "Optional batch-level observations"
}
```
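A consumer of this output may want to check it against the schema before passing items downstream. The following is a minimal sketch of such a check, not part of the skill itself. The helper names `validate_assessment` and `validate_response` are hypothetical, the allowed values are copied from the lists above, and requiring `itemIndex` to be a non-negative integer is an assumption based on the example.

```python
import json

# Allowed values copied from the Assessment Schema section.
TOPICS = {"scaling", "reasoning", "agents", "safety", "interpretability",
          "multimodal", "rlhf", "benchmarks", "infrastructure", "policy",
          "general", "other"}
CONTENT_TYPES = {"prediction", "research-hint", "opinion", "factual",
                 "critique", "meta", "noise"}
AUTHOR_CATEGORIES = {"lab-researcher", "critic", "academic", "independent",
                     "journalist", "unknown"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_assessment(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item passes."""
    problems = []
    index = item.get("itemIndex")
    if not isinstance(index, int) or isinstance(index, bool) or index < 0:
        problems.append("itemIndex must be a non-negative integer")
    relevance = item.get("relevance")
    if not _is_number(relevance) or not 0.0 <= relevance <= 1.0:
        problems.append("relevance must be a number in [0.0, 1.0]")
    for field, allowed in (("topic", TOPICS),
                           ("contentType", CONTENT_TYPES),
                           ("authorCategory", AUTHOR_CATEGORIES)):
        value = item.get(field)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field} has unexpected value: {value!r}")
    if not isinstance(item.get("isSubstantive"), bool):
        problems.append("isSubstantive must be a boolean")
    brief = item.get("brief")
    if not isinstance(brief, str) or len(brief) > 100:
        problems.append("brief must be a string of at most 100 characters")
    return problems


def validate_response(raw: str) -> dict[int, list[str]]:
    """Parse the JSON response and map list position -> problems found."""
    data = json.loads(raw)
    report = {}
    for position, item in enumerate(data.get("assessments", [])):
        problems = validate_assessment(item)
        if problems:
            report[position] = problems
    return report
```

Collecting problems rather than raising on the first one keeps a single malformed item from discarding the rest of a batch.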
Quick Classification Heuristics
High Relevance (0.7-1.0)
- Contains specific claims about AI capabilities
- Predictions with timeframes
- Technical discussion of methods/results
- Critique with reasoning
- Hints about unreleased work
- Debates between researchers
Medium Relevance (0.4-0.7)
- General commentary on AI field
- Sharing papers/articles with brief comment
- Reactions to announcements
- Meta-discussion about discourse
- Industry news without analysis
Low Relevance (0.0-0.4)
- Personal updates unrelated to AI
- Off-topic content
- Pure promotion without substance
- Scheduling/logistics
- Simple retweets without commentary
- "Interesting paper" without substantive comment
Author Detection Tips
Lab Researchers
Look for:
- Bio mentions: Anthropic, OpenAI, DeepMind, Google Brain, Meta AI, xAI, Mistral
- Known handles: @daborenstein, @sama, @kaborl, etc.
- Technical depth suggesting insider knowledge
Critics
Known handles and patterns:
- @garymarcus, @fchollet, @mmitchell_ai, @emilymbender
- Pattern of challenging mainstream AI claims
- Academic credentials combined with public skepticism
Independent
- No lab affiliation
- Often practitioners or commentators
- Examples: @simonw, @drjimfan, @nathanlambert
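The tips above could be approximated by a simple lookup, sketched here under several assumptions. The function `guess_author_category` is hypothetical. The handle sets are copied from the Critics and Independent lists, and the bio terms from the Lab Researchers list. Checking handles before bios is an arbitrary ordering choice. Substring matching on bios is crude: a short term such as "xai" can match unrelated text, so a real filter would likely need something stricter.

```python
# Terms and handles copied from the Author Detection Tips section.
LAB_BIO_TERMS = ("anthropic", "openai", "deepmind", "google brain",
                 "meta ai", "xai", "mistral")
CRITIC_HANDLES = {"@garymarcus", "@fchollet", "@mmitchell_ai", "@emilymbender"}
INDEPENDENT_HANDLES = {"@simonw", "@drjimfan", "@nathanlambert"}


def guess_author_category(handle: str, bio: str = "") -> str:
    """Return a best-effort authorCategory, or "unknown" when nothing matches."""
    normalized = handle.strip().lower()
    if not normalized.startswith("@"):
        normalized = "@" + normalized
    # Assumption: an explicit handle match outranks anything found in the bio.
    if normalized in CRITIC_HANDLES:
        return "critic"
    if normalized in INDEPENDENT_HANDLES:
        return "independent"
    bio_lower = bio.lower()
    if any(term in bio_lower for term in LAB_BIO_TERMS):
        return "lab-researcher"
    return "unknown"
```

The sketch never returns "academic" or "journalist" because the tips give no signals for those categories.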
Processing Guidelines
Speed Over Depth
This skill is for throughput. Make quick assessments based on:
- Keywords and phrases
- Author identity (if known)
- Content structure
- Obvious signals
Conservative Filtering
When in doubt about relevance:
- Score 0.3-0.5 to keep for human review
- Don't filter out potentially valuable content
- False positives are okay; false negatives lose signal
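One way a downstream stage might act on these scores is a three-way routing decision, sketched below. The function `route` and the labels "drop", "review", and "extract" are hypothetical. The 0.3 and 0.5 cut-offs come from the review range above and the relevance table. Dropping everything below 0.3 is an assumption the skill does not state outright.

```python
def route(relevance: float,
          drop_below: float = 0.3,
          review_below: float = 0.5) -> str:
    """Decide what happens to an item after filtering.

    Items in [drop_below, review_below) are kept for human review rather
    than discarded, which favors false positives over false negatives.
    """
    if relevance < drop_below:
        return "drop"
    if relevance < review_below:
        return "review"
    return "extract"
```

For example, `route(0.45)` returns `"review"`, so a doubtful item still reaches a person instead of being lost.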
Batch Efficiency
When processing batches:
- Process items in order
- Output assessments matching input order
- Note any batch-level patterns in processingNotes
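The ordering requirement can be checked mechanically once a batch comes back. The sketch below assumes that `itemIndex` is the zero-based position of each item in the input batch, which the example output suggests but the skill does not state. The function `check_batch_order` is hypothetical.

```python
def check_batch_order(assessments: list[dict], batch_size: int) -> list[str]:
    """Return problems with batch coverage or ordering; empty means OK."""
    indices = [a.get("itemIndex") for a in assessments]
    expected = list(range(batch_size))
    if indices == expected:
        return []
    problems = []
    missing = [i for i in expected if i not in indices]
    if missing:
        problems.append(f"missing itemIndex values: {missing}")
    extra = [i for i in indices if i not in expected]
    if extra:
        problems.append(f"unexpected itemIndex values: {extra}")
    if not missing and not extra:
        # Every expected index is present, so the list is shuffled or repeats one.
        problems.append("assessments are out of order or duplicated")
    return problems
```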