Skill: lead-harvest
Goal
Harvest raw lead candidates from competitor reference pages and industry directories without violating the sources' terms of service (ToS).
Inputs
- config/competitors.yaml
- config/sources.yaml
- policies.yaml
Outputs
- staging/leads_raw.parquet
- logs/harvest.log
Procedure
- For each competitor domain:
  - discover pages matching the keywords: kunden|referenzen|case study|success story
  - crawl to depth=2 (configurable)
- Extract candidate entities:
  - company name, location, website (if present), evidence_url, snippet
- For each directory or trade-fair source:
  - run the corresponding spider / extractor
- Store each lead with:
  - evidence_url, fetched_at, content_hash
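
The discovery and storage steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names `is_reference_page` and `make_lead_record` are hypothetical, and so are the choices of case-insensitive matching, tolerating `-` or `_` in place of the space in "case study" / "success story", SHA-256 for `content_hash`, and an ISO 8601 UTC timestamp for `fetched_at`. Only the keywords and field names come from the procedure.

```python
import hashlib
import re
from datetime import datetime, timezone

# Keywords from the procedure. Case-insensitive matching and the optional
# "-" / "_" separator are assumptions (URLs rarely contain spaces).
REFERENCE_PAGE_RE = re.compile(
    r"kunden|referenzen|case[ _-]?study|success[ _-]?story",
    re.IGNORECASE,
)


def is_reference_page(url, link_text=""):
    """Return True if the URL or its link text matches a reference keyword."""
    return bool(
        REFERENCE_PAGE_RE.search(url) or REFERENCE_PAGE_RE.search(link_text)
    )


def make_lead_record(company_name, evidence_url, snippet, page_content,
                     location=None, website=None, fetched_at=None):
    """Build one raw lead row with the provenance fields from the procedure.

    page_content is the raw response body as bytes; hashing it with SHA-256
    is an assumed choice for content_hash.
    """
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    return {
        "company_name": company_name,
        "location": location,
        "website": website,  # optional, only if present on the page
        "evidence_url": evidence_url,
        "snippet": snippet,
        "fetched_at": fetched_at.isoformat(),
        "content_hash": hashlib.sha256(page_content).hexdigest(),
    }
```

A record built this way keeps every lead traceable to the page it came from, and the hash makes it possible to tell whether the evidence page changed between two harvest runs.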