Data Extraction

Extract specific information from unstructured or semi-structured data, completely and accurately.

Common Patterns

| Type | Pattern | Validation |
|---|---|---|
| Email | user@domain.ext | Has @ and . after @ |
| URL | http(s)://domain... | Valid protocol and domain |
| Date | ISO, US, EU, timestamp | Valid ranges (month 1-12) |
| Phone | Various formats | 7-15 digits |
| IP | IPv4: x.x.x.x, IPv6 | Octets 0-255 |
| Key-Value | key=value, key: value | Handle quoted/nested |
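As a rough illustration of the pattern-plus-validation idea, the sketch below finds candidates with deliberately loose regular expressions and then applies the validation rules separately. The regexes and the helper names (`EMAIL_RE`, `IPV4_RE`, `is_valid_ipv4`, `is_valid_phone`) are assumptions made for this example, not part of the skill, and they are far from a complete grammar for either format.

```python
import re

# Loose candidate patterns (illustrative assumptions, not a full grammar):
# find anything that looks plausible first, then validate it separately.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def is_valid_ipv4(candidate: str) -> bool:
    """Apply the IPv4 rule: exactly four octets, each in 0-255."""
    parts = candidate.split(".")
    return len(parts) == 4 and all(
        p.isdigit() and 0 <= int(p) <= 255 for p in parts
    )


def is_valid_phone(candidate: str) -> bool:
    """Apply the phone rule: 7 to 15 digits once punctuation is ignored."""
    digits = re.sub(r"\D", "", candidate)
    return 7 <= len(digits) <= 15


sample = "Contact ops@example.com from 10.0.0.1, not 300.1.1.1."
emails = EMAIL_RE.findall(sample)
ips = [(ip, is_valid_ipv4(ip)) for ip in IPV4_RE.findall(sample)]
```

Keeping matching and validation apart means a malformed candidate such as `300.1.1.1` is still captured and can be flagged, rather than silently skipped.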

Process

  1. Analyze: Format, delimiters, variations, headers to skip
  2. Extract: Match all instances, capture context, handle partial matches
  3. Clean: Trim, normalize (dates to ISO, phones to digits), validate
  4. Format: Consistent fields, proper escaping, sort/dedupe if needed
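The four steps can be sketched for a single field type. The example below handles dates only and assumes that slash-separated dates are US-style (month first) unless `day_first` is set; the function names, the `DATE_RE` pattern, and the record layout are hypothetical choices for illustration.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Candidate pattern (an assumption): ISO dates or slash-separated dates.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")


def normalize_date(raw: str, day_first: bool = False) -> Optional[str]:
    """Return an ISO 8601 date, or None if the value does not parse."""
    slash_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in ("%Y-%m-%d", slash_format):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def extract_dates(text: str, day_first: bool = False,
                  dedupe: bool = True) -> List[Dict]:
    records, seen = [], set()
    # 1. Analyze: work line by line so each record keeps its context.
    for line_no, line in enumerate(text.splitlines(), start=1):
        # 2. Extract: match every instance, not just the first.
        for match in DATE_RE.finditer(line):
            raw = match.group()
            # 3. Clean: normalize to ISO; out-of-range values become None.
            iso = normalize_date(raw, day_first)
            # 4. Format: optionally drop repeats of an already-seen value.
            if dedupe and iso is not None and iso in seen:
                continue
            if iso is not None:
                seen.add(iso)
            records.append({"line": line_no, "raw": raw, "value": iso})
    return records
```

Each record keeps the raw text and the line number next to the normalized value, so an unparseable match can be reported instead of dropped.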

Output Formats

JSON: {"results": [...], "summary": {"total": N, "unique": N}}

CSV: Headers + rows

Markdown: Table with headers

Plain: Bullet list
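One way to produce these four formats from the same list of records is sketched below using only the Python standard library. The record shape (dicts with a `value` key) and the function names are assumptions for this example.

```python
import csv
import io
import json


def to_json(rows):
    """Wrap rows in the {"results": ..., "summary": ...} envelope."""
    values = [r["value"] for r in rows]
    summary = {"total": len(values), "unique": len(set(values))}
    return json.dumps({"results": rows, "summary": summary})


def to_csv(rows, fields):
    """Header line plus one row per record; the csv module handles escaping."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(rows, fields):
    """Pipe table with a header row; literal pipes in cells are escaped."""
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    body = [
        "| " + " | ".join(str(r[f]).replace("|", "\\|") for f in fields) + " |"
        for r in rows
    ]
    return "\n".join([header, divider, *body])


def to_plain(rows):
    """One bullet per extracted value."""
    return "\n".join(f"- {r['value']}" for r in rows)
```

Letting `csv.DictWriter` do the quoting avoids hand-rolled escaping bugs when a value contains a comma or a quote.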

Principles

  • Complete: Extract ALL matches, don't stop early
  • Accurate: Preserve exact values, maintain case
  • Handle edge cases: Missing → null, malformed → flag, duplicates → note
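The edge-case rules above might look like this in practice. This is a minimal sketch for email values; the `classify` function, the status labels, and the simple `EMAIL_RE` check are assumptions introduced for illustration. Values are trimmed but otherwise kept exactly as found, including case.

```python
import re

# Simple shape check (an assumption): something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify(values):
    """Return (results, notes): missing -> None, malformed -> flagged,
    duplicates -> kept but noted."""
    results, notes, first_seen = [], [], {}
    for idx, raw in enumerate(values, start=1):
        if raw is None or raw.strip() == "":
            # Missing: record an explicit null instead of skipping the item.
            results.append({"value": None, "status": "missing"})
            continue
        value = raw.strip()
        if not EMAIL_RE.match(value):
            # Malformed: keep the original text and flag it.
            results.append({"value": value, "status": "malformed"})
            notes.append(f'Item {idx}: malformed "{value}"')
            continue
        if value in first_seen:
            # Duplicate: comparison is case-sensitive to preserve exact values.
            notes.append(f"Item {idx}: duplicate of item {first_seen[value]}")
        else:
            first_seen[value] = idx
        results.append({"value": value, "status": "ok"})
    return results, notes
```

Nothing is discarded here: every input item produces exactly one result, so the total in a summary always matches the number of items examined.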

Output Structure

[Extracted data]

## Summary
- Total: X
- Unique: Y
- Issues: Z

## Notes
- Line 42: Partial match "user@" (missing domain)
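
A report in this shape could be assembled with a small helper such as the one below. `render_report` is a hypothetical name, and the sketch assumes the extracted data is a list of strings and that each note counts as one issue.

```python
def render_report(extracted, notes):
    """Build the report: data first, then a Summary and optional Notes."""
    lines = list(extracted)
    lines += [
        "",
        "## Summary",
        f"- Total: {len(extracted)}",
        f"- Unique: {len(set(extracted))}",
        f"- Issues: {len(notes)}",
    ]
    if notes:
        # The Notes section is emitted only when there is something to report.
        lines += ["", "## Notes"] + [f"- {note}" for note in notes]
    return "\n".join(lines)


report = render_report(
    ["a@example.com", "a@example.com"],
    ['Line 42: Partial match "user@" (missing domain)'],
)
```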