Lead Enrichment — Multi-Source Data Completion
Enrich CRM contact records by filling missing fields from multiple sources. Works with DuckDB workspace entries or standalone JSON data.
Sources (Priority Order)
- •LinkedIn (via linkedin-scraper skill) — name, title, company, education, connections
- •Web Search (via web_search tool) — email patterns, company info, social profiles
- •Company Website (via web_fetch) — team pages, about pages, contact info
- •Email Pattern Discovery — derive email from name + company domain
Enrichment Pipeline
Step 1: Assess What's Missing
sql
-- Query the target object to find gaps
SELECT "Name", "Email", "LinkedIn URL", "Company", "Title", "Location"
FROM v_leads
WHERE "Email" IS NULL
   OR "LinkedIn URL" IS NULL
   OR "Title" IS NULL;
Step 2: Prioritize by Value
- •High priority: Missing email (needed for outreach)
- •Medium priority: Missing title/company (needed for personalization)
- •Low priority: Missing education, connections count, about text
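The three tiers above can be sketched as a small classifier. This is a minimal illustration, not part of the skill itself: the function names are hypothetical, records are assumed to be plain dicts keyed by the column names used in the Step 1 query, and only "Education" is checked for the low tier because the column names for connections count and about text are not given here.

```python
# Hypothetical helper: records are dicts keyed by the v_leads column names.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2, "complete": 3}

# Assumption: extend this tuple if the workspace has columns for
# connections count or about text.
LOW_PRIORITY_FIELDS = ("Education",)


def is_missing(record: dict, field: str) -> bool:
    """Treat None, absent keys, and blank strings as missing."""
    value = record.get(field)
    return value is None or str(value).strip() == ""


def enrichment_priority(record: dict) -> str:
    """Classify one lead by its most valuable missing field."""
    if is_missing(record, "Email"):
        return "high"      # needed for outreach
    if is_missing(record, "Title") or is_missing(record, "Company"):
        return "medium"    # needed for personalization
    if any(is_missing(record, f) for f in LOW_PRIORITY_FIELDS):
        return "low"
    return "complete"


def sort_by_priority(records: list) -> list:
    """Return records ordered so high-priority gaps are enriched first."""
    return sorted(records, key=lambda r: PRIORITY_ORDER[enrichment_priority(r)])
```

Sorting the Step 1 query result with `sort_by_priority` puts leads without an email at the front of the work queue.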
Step 3: Enrich Per Record
For each record with gaps:
If LinkedIn URL is known but other fields missing:
- •Use linkedin-scraper to visit profile
- •Extract: title, company, location, education, about
- •Update DuckDB record
If LinkedIn URL is missing:
- •Search LinkedIn: {name} {company} or {name} {title}
- •Verify match (name + company alignment)
- •Store LinkedIn URL, then scrape full profile
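One way to approach the "verify match" step is a conservative token comparison. The sketch below is an assumption about what name + company alignment could mean in practice. The function names and the suffix list are hypothetical, and a real workflow may need fuzzier matching for nicknames or rebranded companies.

```python
import re

# Assumption: legal suffixes carry no signal when comparing company names.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def _tokens(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def _aligned(a: set, b: set) -> bool:
    """True when both sets are non-empty and one contains the other."""
    return bool(a) and bool(b) and (a <= b or b <= a)


def is_probable_match(lead_name: str, lead_company: str,
                      profile_name: str, profile_company: str) -> bool:
    """Accept a LinkedIn profile only if both name and company line up."""
    name_ok = _aligned(_tokens(lead_name), _tokens(profile_name))
    company_ok = _aligned(
        _tokens(lead_company) - COMPANY_SUFFIXES,
        _tokens(profile_company) - COMPANY_SUFFIXES,
    )
    return name_ok and company_ok
```

Requiring both checks to pass keeps common names from being matched to the wrong profile. Candidates that fail can be left unmatched or flagged for review.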
If Email is missing:
- •Find company domain (web search or LinkedIn company page)
- •Try common patterns:
  - •first@domain.com
  - •first.last@domain.com
  - •flast@domain.com
  - •firstl@domain.com
- •Optionally verify with web search: "email" "{name}" site:{domain}
- •Check company team/about page for email format clues
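The four patterns listed above can be generated mechanically once the name and domain are known. This is a rough sketch. `candidate_emails` is a hypothetical helper, it assumes the first and last whitespace-separated tokens are the given and family names, and it strips accents and punctuation so the result is plain ASCII. The candidates are guesses and still need verification.

```python
import re
import unicodedata


def _ascii_slug(part: str) -> str:
    """Lowercase a name part and drop accents and non-letters."""
    decomposed = unicodedata.normalize("NFKD", part)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def candidate_emails(full_name: str, domain: str) -> list:
    """Build the common-pattern guesses for one person at one domain."""
    parts = full_name.split()
    if len(parts) < 2:
        return []  # need at least a first and a last name
    first, last = _ascii_slug(parts[0]), _ascii_slug(parts[-1])
    if not first or not last:
        return []
    domain = domain.strip().lower()
    return [
        f"{first}@{domain}",             # first@domain.com
        f"{first}.{last}@{domain}",      # first.last@domain.com
        f"{first[0]}{last}@{domain}",    # flast@domain.com
        f"{first}{last[0]}@{domain}",    # firstl@domain.com
    ]
```

Because these addresses are inferred rather than observed, they would carry medium confidence at best under the quality rules in this document.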
If Company info is missing:
- •Web search: "{name}" "{title}" or check LinkedIn
- •Fetch company website for: industry, size, description, funding
Step 4: Update Records
sql
-- Update via DuckDB pivot view
UPDATE v_leads
SET "Email" = ?,
    "LinkedIn URL" = ?,
    "Title" = ?,
    "Company" = ?,
    "Location" = ?
WHERE id = ?;
Bulk Enrichment Mode
For enriching many records at once:
- •Query all incomplete records from DuckDB
- •Group by company (scrape company once, apply to all employees)
- •Process in batches of 10-20 records
- •Report progress after each batch:
code
Enrichment Progress: 45/120 leads (38%)
├── Emails found: 32/45 (71%)
├── LinkedIn matched: 41/45 (91%)
├── Titles updated: 38/45 (84%)
└── ETA: ~15 min remaining
- •Save checkpoint after each batch (in case of interruption)
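The bulk loop above might look roughly like the following. Everything here is illustrative: `run_bulk`, `fetch_company`, and `enrich_one` are hypothetical names, leads are assumed to be dicts with an integer `id` and a `Company` key, and the checkpoint is assumed to be a JSON file listing finished ids. The real scraping and DuckDB calls would live inside the two callables.

```python
import json
from itertools import islice
from pathlib import Path


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def run_bulk(leads, fetch_company, enrich_one, checkpoint_path, batch_size=10):
    """Enrich leads in batches, resuming from a checkpoint if one exists."""
    checkpoint = Path(checkpoint_path)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()

    # Sort by company so colleagues are processed together.
    pending = [lead for lead in leads if lead["id"] not in done]
    pending.sort(key=lambda lead: lead.get("Company") or "")

    company_cache = {}  # scrape each company once, reuse for all employees
    for batch in batches(pending, batch_size):
        for lead in batch:
            company = lead.get("Company") or ""
            if company not in company_cache:
                company_cache[company] = fetch_company(company)
            enrich_one(lead, company_cache[company])
            done.add(lead["id"])
        # Save the checkpoint and report progress after each batch.
        checkpoint.write_text(json.dumps(sorted(done)))
        print(f"Enrichment Progress: {len(done)}/{len(leads)} leads "
              f"({len(done) / len(leads):.0%})")
    return done
```

If the run is interrupted, calling `run_bulk` again with the same checkpoint path skips the ids already recorded and continues with the rest.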
Enrichment Quality Rules
- •Confidence scoring: Mark each enriched field with confidence (high/medium/low)
  - •High: Direct match from LinkedIn profile or company website
  - •Medium: Inferred from patterns (email format) or partial match
  - •Low: Best guess from web search results
- •Never overwrite existing data unless explicitly asked
- •Flag conflicts: If enriched data contradicts existing data, flag for review
- •Dedup check: Before inserting LinkedIn URL, check it's not already assigned to another contact
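The no-overwrite and conflict-flagging rules can be expressed as a merge step that runs before any UPDATE. This is a minimal sketch under assumed data shapes. `merge_enrichment` is a hypothetical helper, the existing record is a dict of field values, and each enriched field is a `(value, confidence)` pair. The dedup check is not covered here.

```python
def merge_enrichment(existing: dict, enriched: dict, overwrite: bool = False):
    """Split enriched fields into safe updates and conflicts to review.

    `enriched` maps field name -> (value, confidence), where confidence
    is one of "high", "medium", "low".
    """
    updates, conflicts = {}, []
    for field, (value, confidence) in enriched.items():
        current = existing.get(field)
        if current is None or current == "":
            # Gap in the record: safe to fill.
            updates[field] = (value, confidence)
        elif current != value:
            # Existing data disagrees: flag it rather than silently replace.
            conflicts.append({
                "field": field,
                "existing": current,
                "enriched": value,
                "confidence": confidence,
            })
            if overwrite:  # only when explicitly asked
                updates[field] = (value, confidence)
    return updates, conflicts
```

With the default `overwrite=False`, a record whose "Title" is already set keeps its value, and the disagreement appears in `conflicts` for review.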
Email Pattern Discovery
Common corporate email formats by frequency:
- •first.last@domain.com (most common, ~45%)
- •first@domain.com (~20%)
- •flast@domain.com (~15%)
- •firstl@domain.com (~10%)
- •first_last@domain.com (~5%)
- •last.first@domain.com (~3%)
- •first.l@domain.com (~2%)
Strategy:
- •If you know one person's email at the company, derive the pattern
- •Search web for "@{domain}" email format
- •Check company team page source code for mailto: links
- •Use the most common pattern as fallback
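Deriving the pattern from one known address, then applying it to a colleague, could be sketched as follows. The pattern table mirrors the formats listed above in the same order. The function names are hypothetical, and the code assumes non-empty, already-ASCII first and last names.

```python
# Builders for the local part, in the frequency order listed above.
PATTERNS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "first":      lambda f, l: f,
    "flast":      lambda f, l: f"{f[0]}{l}",
    "firstl":     lambda f, l: f"{f}{l[0]}",
    "first_last": lambda f, l: f"{f}_{l}",
    "last.first": lambda f, l: f"{l}.{f}",
    "first.l":    lambda f, l: f"{f}.{l[0]}",
}

FALLBACK_PATTERN = "first.last"  # most common format, used when nothing matches


def infer_pattern(known_email: str, first: str, last: str):
    """Return the pattern name that reproduces a known address, or None."""
    local = known_email.split("@", 1)[0].lower()
    f, l = first.lower(), last.lower()
    for name, build in PATTERNS.items():
        if build(f, l) == local:
            return name
    return None


def apply_pattern(pattern: str, first: str, last: str, domain: str) -> str:
    """Build an address for another person using a named pattern."""
    local = PATTERNS[pattern](first.lower(), last.lower())
    return f"{local}@{domain.lower()}"


# Example: one known address at the company fixes the format for a colleague.
pattern = infer_pattern("jdoe@acme.com", "Jane", "Doe") or FALLBACK_PATTERN
guess = apply_pattern(pattern, "John", "Smith", "acme.com")
```

An address built this way is an inference, so it fits the medium confidence tier until it is confirmed by another source.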
Output
After enrichment, provide a summary:
code
Enrichment Complete: 120 leads processed
├── Emails: 94 found (78%), 26 still missing
├── LinkedIn: 108 matched (90%), 12 not found
├── Titles: 115 updated (96%)
├── Companies: 118 confirmed (98%)
├── Locations: 89 found (74%)
└── Avg confidence: High (82%), Medium (14%), Low (4%)

Top gaps remaining:
- 26 leads missing email (mostly small/stealth companies)
- 12 leads missing LinkedIn (common names, ambiguous matches)
DuckDB Field Mapping
Standard field names for Ironclaw CRM objects:
| Enrichment Data | DuckDB Field | Type |
|---|---|---|
| Full name | Name | text |
| Email address | Email | email |
| LinkedIn URL | LinkedIn URL | url |
| Job title | Title | text |
| Company name | Company | text / relation |
| Location | Location | text |
| Education | Education | text |
| Phone | Phone | phone |
| Company size | Company Size | text |
| Industry | Industry | text |
| Enrichment date | Enriched At | date |
| Confidence | Enrichment Confidence | enum (high/medium/low) |
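The mapping can be kept in code so that enrichment results are translated to DuckDB field names in one place. This is a sketch under assumptions: the snake_case keys on the left are hypothetical names for enrichment output, and `build_update` is an illustrative helper that produces the same parameterized UPDATE shape shown in Step 4.

```python
# Assumed enrichment keys (left) mapped to the DuckDB field names (right).
FIELD_MAP = {
    "full_name": "Name",
    "email": "Email",
    "linkedin_url": "LinkedIn URL",
    "job_title": "Title",
    "company": "Company",
    "location": "Location",
    "education": "Education",
    "phone": "Phone",
    "company_size": "Company Size",
    "industry": "Industry",
    "enriched_at": "Enriched At",
    "confidence": "Enrichment Confidence",
}


def build_update(enriched: dict, lead_id, view: str = "v_leads"):
    """Return (sql, params) updating only the fields present in `enriched`."""
    columns = [(FIELD_MAP[key], value)
               for key, value in enriched.items() if key in FIELD_MAP]
    if not columns:
        raise ValueError("no mappable fields to update")
    # Field names contain spaces, so they are double-quoted identifiers.
    assignments = ", ".join(f'"{name}" = ?' for name, _ in columns)
    sql = f"UPDATE {view} SET {assignments} WHERE id = ?;"
    params = [value for _, value in columns] + [lead_id]
    return sql, params
```

Unknown keys are ignored rather than interpolated into the statement, and all values travel as bound parameters.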