Batch JD Processor
Process job batches through parallel manual review by subagents. Each subagent reads jobs like a human and applies user's filter criteria.
Core Rules
⛔ Prohibitions
- •NO automated scripts (no
filter_all_jobs.py, no regex scripts) - •NO keyword extraction scripts
✅ Required Approach
- •Split jobs into groups (5 per group)
- •Launch ALL subagents in SINGLE message
- •Each subagent manually reads every job (5 jobs per agent)
- •Provide detailed reasoning for every decision
Three-Tier Classification
PASS - Meets all criteria, recommend applying REJECT - Fails filter criteria (location/experience/cert/position type) ERROR - Cannot evaluate (missing JD, corrupted data, missing fields)
Critical: Use ERROR for data problems, REJECT for criteria failures.
Examples:
{"filter_result": "ERROR", "reason": "Missing JD content, cannot evaluate position requirements"}
{"filter_result": "REJECT", "reason": "Requires 3 years experience, user needs 0-year entry-level"}
{"filter_result": "PASS", "reason": "Entry-level Data Analyst, matches ideal position, located in Toronto"}
Workflow
Step 0: Display Filter Criteria
Read and display user_filter.json showing location, experience, certifications, ideal positions, avoid categories.
Step 1: Load Data
Parse Excel/CSV into JSON for subagents:
import pandas as pd, json
df = pd.read_csv('input.csv') if file.endswith('.csv') else pd.read_excel('input.xlsx')
jobs = [{
'job_id': str(row.get('Job ID', f'job_{idx}')),
'title': row.get('Title', ''),
'location': row.get('Location', ''),
'url': row.get('URL', ''),
'full_jd': row.get('Job Description', ''),
'posted_date': row.get('Posted Date', '')
} for idx, row in df.iterrows()]
with open('all_jobs.json', 'w', encoding='utf-8') as f:
json.dump(jobs, f, indent=2, ensure_ascii=False)
Step 2: Split into Groups
Calculate subagents: num_agents = Total JDs / 5 (each agent processes 5 JDs)
import math
num_agents = max(1, math.ceil(len(jobs) / 5)) # Allocate 1 agent per 5 JDs
jobs_per_agent = math.ceil(len(jobs) / num_agents)
for i in range(num_agents):
group = jobs[i*jobs_per_agent : (i+1)*jobs_per_agent]
with open(f'jobs_group_{i+1}.json', 'w', encoding='utf-8') as f:
json.dump(group, f, indent=2, ensure_ascii=False)
Step 3: Launch Subagents (In SINGLE Message)
Prompt template for each subagent:
Review Group {N}: /path/to/jobs_group_{N}.json
Filter: /path/to/user_filter.json
For EACH job:
1. Check data quality first - Missing JD/fields? → ERROR
2. Read title and full JD carefully
3. Check ALL criteria: location, experience, certs, position type, avoid categories
4. Classify as PASS/REJECT/ERROR with specific reason
Output JSON: [{job_id, title, url, location, filter_result, reason}]
Save to: /path/to/group_{N}_results.json
Rules:
- NO scripts, manual reading only
- ERROR for data issues, REJECT for criteria failures
- Specific reasons (e.g. "Requires 2 years experience" not "Does not meet requirements")
Step 4: Merge & Generate Excel
import json, pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font, Alignment
# Merge results
all_results = []
for i in range(1, num_agents + 1):
with open(f'group_{i}_results.json', 'r', encoding='utf-8') as f:
all_results.extend(json.load(f))
# Statistics
pass_count = sum(1 for r in all_results if r['filter_result'] == 'PASS')
reject_count = sum(1 for r in all_results if r['filter_result'] == 'REJECT')
error_count = sum(1 for r in all_results if r['filter_result'] == 'ERROR')
# Create Excel
df = pd.DataFrame(all_results)
df['sort_key'] = df['filter_result'].map({'PASS': 0, 'REJECT': 1, 'ERROR': 2})
df = df.sort_values('sort_key').drop('sort_key', axis=1)
df.to_excel('output.xlsx', index=False)
# Color code
wb = load_workbook('output.xlsx')
ws = wb.active
green = PatternFill(start_color='C6EFCE', end_color='C6EFCE', fill_type='solid')
red = PatternFill(start_color='FFC7CE', end_color='FFC7CE', fill_type='solid')
yellow = PatternFill(start_color='FFEB9C', end_color='FFEB9C', fill_type='solid')
for row in ws.iter_rows(min_row=2):
result = row[4].value
row[4].fill = green if result == 'PASS' else red if result == 'REJECT' else yellow
wb.save('output.xlsx')
Step 5: Present Results
📊 Batch Filtering Complete!
Total: {total} | ✅ PASS: {pass} | ❌ REJECT: {reject} | ⚠️ ERROR: {error}
🎯 Recommended Positions (Top 5):
[List PASS jobs with title, location, reason, URL]
📋 Rejection Reason Statistics:
[Group REJECT by reason with counts]
⚠️ Error Statistics:
[Group ERROR by reason with counts]
[Excel download link]
Subagent Quality Checklist
✅ Read data quality first (missing JD → ERROR, not REJECT) ✅ Read full job description, don't skim ✅ Check ALL criteria (location, experience, certs, position type, avoid) ✅ Specific reasoning ("Requires 3 years experience" not "Does not meet requirements") ✅ Handle edge cases (bilingual, "Associate" titles, "preferred" vs "required")
Common Mistakes
❌ Using scripts instead of manual review ❌ Launching subagents sequentially (launch ALL in one message) ❌ Vague reasoning ❌ ERROR for criteria failures (use ERROR only for data issues)
Performance
| Batch Size | Agents | Jobs/Agent | Estimated Time |
|---|---|---|---|
| 25 | 5 | 5 | 2-3min |
| 50 | 10 | 5 | 3-5min |
| 100 | 20 | 5 | 5-8min |
| 200 | 40 | 5 | 10-15min |
Manual review is slower than scripts but catches nuances and context-dependent requirements. Each agent processes exactly 5 JDs for consistent workload distribution.