Company Curator
Maintain and expand the company universe tracked by the job analytics pipeline. Ensure career page URLs are valid, companies are properly configured, and coverage gaps are identified.
When to Use This Skill
Trigger when user asks to:
- Add a new company to tracking
- Check if companies are still valid
- Find companies returning 0 jobs
- Expand coverage to new companies
- Validate career page URLs
- Update company configurations
- Identify broken or changed career pages
Company Inventory
Current Coverage
| Source | Companies | Config File |
|---|---|---|
| Greenhouse | 302 | config/greenhouse/company_ats_mapping.json |
| Lever | 61 | config/lever/company_mapping.json |
| Adzuna | N/A (API) | Cities configured in workflows |
Config File Formats
Greenhouse (company_ats_mapping.json):
{
  "Company Name": {
    "slug": "company-slug",
    "url": "https://job-boards.greenhouse.io/company-slug"
  }
}
Lever (company_mapping.json):
{
  "Company Name": {
    "slug": "company-slug",
    "url": "https://jobs.lever.co/company-slug"
  }
}
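Both files share the same shape, so a quick structural check can catch typos before a pipeline run. The following is a minimal sketch, not part of the pipeline: `find_malformed_entries` is a hypothetical helper, and the rule that the URL's last path segment equals the slug is an assumption drawn only from the examples above.

```python
import json

def find_malformed_entries(mapping: dict) -> list[str]:
    """Return human-readable problems found in a company mapping."""
    problems = []
    for name, entry in mapping.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        for field in ("slug", "url"):
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: missing or empty '{field}'")
        slug, url = entry.get("slug"), entry.get("url")
        # Assumption: the URL ends with the slug, as in the documented examples.
        if isinstance(slug, str) and isinstance(url, str) and slug and url:
            if url.rstrip("/").rsplit("/", 1)[-1] != slug:
                problems.append(f"{name}: url does not end with slug '{slug}'")
    return problems

# Made-up entries for illustration only.
sample = json.loads("""
{
  "Good Co": {"slug": "goodco", "url": "https://jobs.lever.co/goodco"},
  "Bad Co": {"slug": "badco", "url": "https://jobs.lever.co/other"},
  "Empty Co": {"slug": "", "url": "https://jobs.lever.co/emptyco"}
}
""")
for problem in find_malformed_entries(sample):
    print(problem)
```

To check a real file, load it with `json.load` and pass the resulting dict to the helper.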
Adding New Companies
Step 1: Identify the ATS
Greenhouse indicators:
- URL contains greenhouse.io or boards.greenhouse.io
- Career page redirects to Greenhouse-hosted page
- Job listings have Greenhouse application forms
Lever indicators:
- URL contains jobs.lever.co or lever.co
- Career page has Lever branding
- Application flow through Lever
Other ATS (not currently supported):
- Workday, Ashby, BambooHR, etc.
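The URL indicator is the only one that can be checked mechanically; redirects, branding, and application forms still need a manual look. A minimal sketch of that URL check, assuming a hypothetical helper named `identify_ats` and matching on the hostname rather than the whole URL string:

```python
from typing import Optional
from urllib.parse import urlparse

def identify_ats(url: str) -> Optional[str]:
    """Guess the ATS from a job board URL; None means unrecognized."""
    if "//" not in url:
        # urlparse only fills in the hostname when a scheme is present.
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower()
    if host == "greenhouse.io" or host.endswith(".greenhouse.io"):
        return "greenhouse"
    if host == "lever.co" or host.endswith(".lever.co"):
        return "lever"
    return None

print(identify_ats("https://boards.greenhouse.io/anthropic"))
print(identify_ats("https://jobs.eu.lever.co/company"))
print(identify_ats("https://example.com/careers"))
```

Matching on the hostname suffix avoids false positives from look-alike domains or from the ATS name appearing elsewhere in the URL.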
Step 2: Find the Slug
For Greenhouse:
# Visit the careers page and find the Greenhouse URL
# Examples:
# https://boards.greenhouse.io/anthropic -> slug = "anthropic"
# https://job-boards.greenhouse.io/stripe -> slug = "stripe"
# https://boards.eu.greenhouse.io/company -> slug = "company" (EU)
For Lever:
# Visit careers page and find Lever URL
# Examples:
# https://jobs.lever.co/figma -> slug = "figma"
# https://jobs.eu.lever.co/company -> slug = "company" (EU)
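In every example above the slug is the first path segment of the board URL, so it can be pulled out programmatically. A minimal sketch under that assumption; `extract_slug` is a hypothetical helper, and URLs of other shapes (for instance embedded boards that carry the company in a query parameter) are not handled:

```python
from typing import Optional
from urllib.parse import urlparse

def extract_slug(board_url: str) -> Optional[str]:
    """Return the first path segment of a job board URL, or None if absent."""
    segments = [part for part in urlparse(board_url).path.split("/") if part]
    return segments[0] if segments else None

for url in (
    "https://boards.greenhouse.io/anthropic",
    "https://job-boards.greenhouse.io/stripe",
    "https://jobs.eu.lever.co/company",
):
    print(url, "->", extract_slug(url))
```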
Step 3: Validate the Slug
Unified validation (all ATS platforms):
# Validate specific companies
python pipeline/utilities/validate_ats_slugs.py greenhouse --companies company-slug
python pipeline/utilities/validate_ats_slugs.py lever --companies company-slug
python pipeline/utilities/validate_ats_slugs.py ashby --companies company-slug
Or manually check:
# Greenhouse
curl -s "https://boards.greenhouse.io/company-slug" | head -20
# Lever
curl -s "https://api.lever.co/v0/postings/company-slug" | head -20
# Ashby
curl -s "https://api.ashbyhq.com/posting-api/job-board/company-slug" | head -20
Step 4: Add to Config
For Greenhouse:
# In config/greenhouse/company_ats_mapping.json
{
  "New Company": {
    "slug": "newcompany",
    "url": "https://boards.greenhouse.io/newcompany"
  }
}
For Lever:
# In config/lever/company_mapping.json
{
  "New Company": {
    "slug": "newcompany",
    "url": "https://jobs.lever.co/newcompany"
  }
}
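Hand-editing a large JSON file makes duplicate names and slugs easy to miss. The sketch below shows one way to script the addition; `add_company` is a hypothetical helper, and it assumes the file is a single flat JSON object as shown above. The two-space indentation it writes may not match the formatting already used in the repository.

```python
import json
import tempfile
from pathlib import Path

def add_company(config_path, name, slug, url):
    """Add one entry to a mapping file, refusing duplicate names or slugs."""
    path = Path(config_path)
    mapping = json.loads(path.read_text(encoding="utf-8"))
    if name in mapping:
        raise ValueError(f"{name!r} is already in {path.name}")
    used_slugs = {e.get("slug") for e in mapping.values() if isinstance(e, dict)}
    if slug in used_slugs:
        raise ValueError(f"slug {slug!r} is already used in {path.name}")
    mapping[name] = {"slug": slug, "url": url}
    path.write_text(
        json.dumps(mapping, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return mapping

# Demo against a throwaway file rather than the real config.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "company_mapping.json"
    demo.write_text(
        '{"Existing Co": {"slug": "existingco", "url": "https://jobs.lever.co/existingco"}}',
        encoding="utf-8",
    )
    updated = add_company(
        demo, "New Company", "newcompany", "https://jobs.lever.co/newcompany"
    )
    print(sorted(updated))
```

Raising on duplicates rather than overwriting keeps an accidental re-add from silently changing an existing slug.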
Step 5: Test the Addition
# Test Greenhouse company
python wrappers/fetch_jobs.py --sources greenhouse --companies "New Company" --dry-run
# Test Lever company
python wrappers/fetch_jobs.py --sources lever --companies "New Company" --dry-run
Company Health Checks
Finding Companies with 0 Jobs
-- Companies in config but returning 0 jobs recently
-- (Run after a full pipeline execution)
-- Check GHA logs for:
--   "Jobs found: 0" patterns
--   Timeout errors
--   404 responses
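If the GHA logs are downloaded as plain text, the three signals above can be pulled out with a small scan. This is only a sketch: `flag_log_lines` is a hypothetical helper, the exact wording of timeout and 404 messages in the real logs is an assumption, and the sample lines below are made up for illustration rather than copied from pipeline output.

```python
import re

# Assumed patterns; adjust them to the wording the real logs use.
PATTERNS = {
    "zero_jobs": re.compile(r"Jobs found:\s*0\b"),
    "timeout": re.compile(r"timeout|timed out", re.IGNORECASE),
    "not_found": re.compile(r"\b404\b"),
}

def flag_log_lines(log_text: str) -> dict[str, list[str]]:
    """Group log lines by which health signal they match."""
    flagged = {label: [] for label in PATTERNS}
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                flagged[label].append(line.strip())
    return flagged

# Invented sample log, for illustration only.
sample_log = """\
[greenhouse] Acme: Jobs found: 0
[greenhouse] Globex: Jobs found: 12
[lever] Initech: HTTP 404 for https://api.lever.co/v0/postings/initech
[lever] Umbrella: request timed out after 30s
"""
for label, lines in flag_log_lines(sample_log).items():
    print(label, len(lines))
```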
Common reasons for 0 jobs:
| Reason | Diagnosis | Action |
|---|---|---|
| Company stopped hiring | Check careers page manually | Keep in config (temporary) |
| Slug changed | 404 in logs | Update slug |
| ATS migration | Different URL structure | Update URL or remove |
| Geographic filter | Jobs exist but not in target cities | Expected behavior |
| Title filter | Jobs exist but not target roles | Expected behavior |
Validating Existing Companies
# Validate all slugs for any ATS platform
python pipeline/utilities/validate_ats_slugs.py greenhouse
python pipeline/utilities/validate_ats_slugs.py lever
python pipeline/utilities/validate_ats_slugs.py ashby
Validation checks:
- URL returns 200 status
- Page contains job listings (not empty)
- No redirect to different domain
- Response time < 10 seconds
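The four checks can be expressed as a pure function over an already-fetched response, which keeps the rules testable without network access. This is a sketch of the rules as listed, not the logic of `validate_ats_slugs.py`: the `FetchResult` record and `validation_failures` function are hypothetical, and comparing hostnames is a simplified stand-in for "different domain".

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class FetchResult:
    """Hypothetical summary of one career page fetch."""
    requested_url: str
    final_url: str          # URL after following redirects
    status_code: int
    elapsed_seconds: float
    job_count: int

def validation_failures(result: FetchResult, max_seconds: float = 10.0) -> list[str]:
    """Return the list of failed checks; an empty list means the page passed."""
    failures = []
    if result.status_code != 200:
        failures.append("non-200 status")
    if result.job_count == 0:
        failures.append("no job listings")
    # Simplification: any hostname change counts as a different domain.
    if urlparse(result.requested_url).hostname != urlparse(result.final_url).hostname:
        failures.append("redirected to a different domain")
    if result.elapsed_seconds >= max_seconds:
        failures.append("slow response")
    return failures

healthy = FetchResult(
    "https://jobs.lever.co/figma", "https://jobs.lever.co/figma", 200, 1.2, 14
)
moved = FetchResult(
    "https://jobs.lever.co/oldco", "https://www.example.com/careers", 200, 12.0, 0
)
print(validation_failures(healthy))
print(validation_failures(moved))
```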
Identifying Stale Companies
-- Companies not seen in last 30 days
SELECT employer_name, MAX(scraped_at) AS last_seen
FROM enriched_jobs
GROUP BY employer_name
HAVING MAX(scraped_at) < NOW() - INTERVAL '30 days'
ORDER BY last_seen;
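When the rows are already exported (for example to a CSV) and a database session is not at hand, the same grouping can be reproduced in memory. A minimal sketch, assuming rows arrive as `(employer_name, scraped_at)` pairs; `stale_companies` is a hypothetical helper and the sample data is invented:

```python
from datetime import datetime, timedelta

def stale_companies(rows, now, max_age_days=30):
    """Return (employer, last_seen) pairs older than the cutoff, oldest first."""
    last_seen = {}
    for employer, scraped_at in rows:
        # Keep the most recent timestamp per employer (the MAX in the query).
        if employer not in last_seen or scraped_at > last_seen[employer]:
            last_seen[employer] = scraped_at
    cutoff = now - timedelta(days=max_age_days)
    stale = [(name, seen) for name, seen in last_seen.items() if seen < cutoff]
    return sorted(stale, key=lambda pair: pair[1])

rows = [
    ("Acme", datetime(2024, 12, 1)),
    ("Acme", datetime(2025, 1, 20)),
    ("Globex", datetime(2024, 11, 15)),
    ("Initech", datetime(2024, 12, 20)),
]
for name, seen in stale_companies(rows, now=datetime(2025, 1, 31)):
    print(name, seen.date())
```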
Expanding Coverage
Discovery Methods
1. Competitor analysis:
# Find companies similar to existing ones
# Check "Similar companies" on LinkedIn
# Review industry reports
2. Job board mining:
# Search Adzuna results for companies using Greenhouse/Lever
# Look for patterns in application URLs
3. Greenhouse/Lever directories:
# Greenhouse customer list (limited public info)
# Lever customer case studies
4. Tech company lists:
# YC company directory
# Crunchbase filters
# Built In city lists
Discovery Script
# Find potential Greenhouse companies from Adzuna data
python pipeline/utilities/discover_greenhouse_slugs.py
# Find potential Lever companies
python scrapers/lever/discover_lever_companies.py
Company Categories
By Industry (for targeting)
| Category | Examples | Priority |
|---|---|---|
| Big Tech | Google, Meta, Amazon | High |
| AI/ML | Anthropic, OpenAI, Cohere | High |
| Fintech | Stripe, Plaid, Affirm | High |
| SaaS | Salesforce, Datadog, Snowflake | Medium |
| Startups | YC companies, Series A-C | Medium |
| Enterprise | Traditional tech companies | Low |
By ATS Platform
Track which companies use which ATS for coverage planning:
| ATS | Coverage | Notes |
|---|---|---|
| Greenhouse | 302 companies | Primary source |
| Lever | 61 companies | Secondary source |
| Workday | 0 | Not supported (complex) |
| Ashby | 0 | Potential future addition |
| Custom | 0 | Per-company scraping needed |
Maintenance Tasks
Weekly Checks
- Review GHA logs for companies with 0 jobs
- Check for new 404 errors
- Verify no company has been returning 0 for 4+ weeks
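The 4+ week rule amounts to counting consecutive empty weeks at the end of each company's history. A minimal sketch, assuming weekly job counts per company are available as lists ordered oldest to newest; both function names and the sample history are hypothetical:

```python
def trailing_empty_weeks(weekly_job_counts):
    """Count consecutive zero-job weeks at the end of the history."""
    empty = 0
    for count in reversed(weekly_job_counts):
        if count > 0:
            break
        empty += 1
    return empty

def companies_to_review(history, threshold_weeks=4):
    """Return {company: weeks_empty} for companies at or past the threshold."""
    streaks = {name: trailing_empty_weeks(counts) for name, counts in history.items()}
    return {name: weeks for name, weeks in streaks.items() if weeks >= threshold_weeks}

# Invented weekly counts, oldest week first.
history = {
    "Acme": [12, 9, 0, 0, 0, 0],
    "Globex": [0, 0, 3, 0],
    "Initech": [5, 4, 6, 7],
}
print(companies_to_review(history))
```

Only the trailing streak counts, so a company that returned zero earlier but has recovered is not flagged.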
Monthly Checks
- Run full slug validation
- Review companies for ATS migrations
- Add 5-10 new companies from discovery
Quarterly Checks
- Audit company list against industry changes
- Remove defunct companies
- Rebalance coverage across industries
Output Format
When curating companies, produce:
## Company Curation Report
**Date:** [Date]
**Scope:** [What was checked]
### Current Inventory
| Source | Total | Active | Inactive |
|--------|-------|--------|----------|
| Greenhouse | 302 | X | Y |
| Lever | 61 | X | Y |
### Health Check Results
#### Companies Returning 0 Jobs (4+ weeks)
| Company | Last Jobs | Weeks Empty | Action |
|---------|-----------|-------------|--------|
| [Name] | [Date] | X | [Check/Remove] |
#### Validation Failures
| Company | Issue | Resolution |
|---------|-------|------------|
| [Name] | 404 | Update slug to X |
| [Name] | Timeout | Retry / investigate |
### New Companies to Add
| Company | ATS | Slug | Validated |
|---------|-----|------|-----------|
| [Name] | Greenhouse | [slug] | Yes/No |
### Companies to Remove
| Company | Reason |
|---------|--------|
| [Name] | [Defunct/migrated ATS/etc] |
### Config Changes
```json
// Add to company_ats_mapping.json:
{
  "New Company": {"slug": "newco", "url": "..."}
}
// Remove from company_ats_mapping.json:
// "Old Company": {...}
```
Key Files to Reference
- `config/greenhouse/company_ats_mapping.json` - Greenhouse companies
- `config/lever/company_mapping.json` - Lever companies
- `config/ashby/company_mapping.json` - Ashby companies
- `pipeline/utilities/validate_ats_slugs.py` - Unified validation script (all ATS)
- `pipeline/utilities/discover_ats_companies.py` - Unified discovery script (Google CSE)
- `pipeline/utilities/discover_greenhouse_slugs.py` - Legacy Greenhouse discovery
- `scrapers/lever/discover_lever_companies.py` - Legacy Lever discovery