ESI Extraction Skill
Overview
Extracts structured clinical facts from patient records using an LLM. This skill performs Phase 1 of the hybrid ESI classification pipeline: transforming unstructured text into validated, machine-readable facts.
When to Use
- •Extracting vital signs from clinical notes
- •Converting patient vignettes to structured data
- •Pre-processing for deterministic ESI logic (see the esi-composition skill)
- •Building datasets for rule-based decision trees
Input/Output Schema
Input:
- •patient_record (string): Clinical note, ED triage form, or patient vignette
Output:
- •extracted_facts (object): Validated facts matching schemas/extraction_schema.json
- •confidence (float): Extraction confidence from 0.0 to 1.0, calculated as (extracted fields count) / (total possible fields)
- •status (string): "success" or "warning" (warning if confidence < 0.5 or more than 3 validation errors)
- •validation_errors (array): Schema violations and sanity check failures
- •error_count (int): Number of validation errors
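The confidence and status rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `EXPECTED_FIELDS`, `score_extraction`, and the flat facts dictionary are hypothetical, and the authoritative field set is whatever schemas/extraction_schema.json defines.

```python
from typing import Any

# Hypothetical flat field list for illustration only; the authoritative set
# lives in schemas/extraction_schema.json.
EXPECTED_FIELDS = [
    "systolic_bp", "diastolic_bp", "heart_rate", "respiratory_rate",
    "oxygen_saturation", "temperature", "chief_complaint", "pain_level",
]


def score_extraction(facts: dict[str, Any], validation_errors: list[str]) -> dict[str, Any]:
    """Compute confidence, status, and error_count from flat facts (assumed shape)."""
    # A field counts as extracted when it is present and not None.
    extracted = sum(1 for name in EXPECTED_FIELDS if facts.get(name) is not None)
    confidence = extracted / len(EXPECTED_FIELDS)
    error_count = len(validation_errors)
    # Warning when confidence is below 0.5 or there are more than 3 errors.
    status = "warning" if confidence < 0.5 or error_count > 3 else "success"
    return {"confidence": confidence, "status": status, "error_count": error_count}


# Two of eight fields extracted, no validation errors.
sparse = score_extraction({"chief_complaint": "fever", "heart_rate": 120}, [])
print(sparse["confidence"], sparse["status"])  # 0.25 warning
```

Because the ratio only counts populated fields, a record can score high while still containing wrong values, which is why validation_errors feeds into status separately.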
Quick Start
python
# Use the esi-extraction skill to extract facts from a patient record
from src.skills import SkillRegistry, SkillExecutor

registry = SkillRegistry()
executor = SkillExecutor(registry)

result = executor.execute(
    skill_name="esi-extraction",
    inputs={
        "patient_record": "42-year-old male with chest pain and shortness of breath. BP 180/110, HR 105, RR 22"
    }
)

if result.is_success():
    facts = result.output["extracted_facts"]
    print(f"Vital Signs: {facts['vital_signs']}")
    print(f"Risk Factors: {facts['risk_factors']}")
Extracted Facts Structure
The skill extracts and validates the following categories:
Vital Signs
- •systolic_bp (int): Systolic blood pressure mmHg
- •diastolic_bp (int): Diastolic blood pressure mmHg
- •heart_rate (int): Beats per minute
- •respiratory_rate (int): Breaths per minute
- •oxygen_saturation (float): Percentage (0-100)
- •temperature (float): Celsius or Fahrenheit (unit inferred)
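How the temperature unit is inferred is not specified here. One plausible heuristic, shown purely as an assumption, relies on the fact that human body temperatures in Celsius and Fahrenheit occupy non-overlapping numeric ranges. The names `FAHRENHEIT_THRESHOLD`, `infer_temperature_unit`, and `to_celsius` are hypothetical and not part of the skill.

```python
# Assumed cutoff: plausible human temperatures in Celsius stay well below 50,
# while Fahrenheit readings sit well above it. The value is illustrative.
FAHRENHEIT_THRESHOLD = 50.0


def infer_temperature_unit(value: float) -> str:
    """Guess the unit of a body temperature reading: 'C' or 'F'."""
    return "F" if value >= FAHRENHEIT_THRESHOLD else "C"


def to_celsius(value: float) -> float:
    """Normalize a reading to Celsius using the inferred unit."""
    if infer_temperature_unit(value) == "F":
        return (value - 32.0) * 5.0 / 9.0
    return value


for reading in (37.2, 98.6):
    print(reading, infer_temperature_unit(reading), round(to_celsius(reading), 1))
```

A value-based guess like this fails for non-physiological readings, so an explicit unit in the record (for example "37.2C") should take precedence when present.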
Symptoms
- •chief_complaint (string): Primary reason for visit
- •pain_level (int): 0-10 scale (if explicitly mentioned as numeric)
Risk Factors
- •high_risk_keywords (array): ["chest pain", "difficulty breathing", "confusion", ...]
- •trauma_indicators (bool): Recent injury or accident
- •infectious_signs (bool): Fever, infection markers
- •allergies (array): Known allergies
- •medications (array): Current medications
Resource Requirements
- •requires_imaging (bool): Likely needs CT, X-ray, ultrasound
- •requires_lab (bool): Needs blood work, urinalysis
- •requires_monitoring (bool): Continuous vital monitoring
Data Quality
- •extraction_confidence (float): 0.0-1.0 score
- •missing_fields (array): Fields not found in record
- •ambiguous_fields (array): Fields requiring clarification
Best Practices
Input Formatting
- •Ensure patient records are reasonably clean (medical notes, not handwritten scans)
- •Include vital signs if available; skill will infer normal ranges if missing
- •Include chief complaint in first sentence for best extraction
Output Validation
- •Always check the confidence score; <0.7 indicates uncertain extraction
- •Review missing_fields to understand data gaps
- •Compare extracted vital_signs to input for sanity-checking
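The checks above could be wrapped in a small review helper. This is a sketch under assumptions: `review_output` is a hypothetical function, and the output layout (confidence at the top level, missing_fields inside extracted_facts) follows the schema description in this document rather than a verified payload.

```python
from typing import Any

UNCERTAIN_BELOW = 0.7  # threshold named in the guidance above


def review_output(output: dict[str, Any]) -> list[str]:
    """Return human-readable notes for results that deserve a closer look."""
    notes = []
    confidence = output.get("confidence", 0.0)
    if confidence < UNCERTAIN_BELOW:
        notes.append(f"low confidence ({confidence:.2f}); consider manual review")
    # Assumed location of missing_fields; adjust to the real schema.
    missing = output.get("extracted_facts", {}).get("missing_fields", [])
    if missing:
        notes.append("missing fields: " + ", ".join(missing))
    return notes


example_output = {
    "confidence": 0.65,
    "extracted_facts": {"missing_fields": ["systolic_bp", "heart_rate"]},
}
notes = review_output(example_output)
for note in notes:
    print(note)
```

An empty list means neither check fired; it does not replace comparing the extracted vital signs against the original text.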
Error Handling
- •Skill retries up to 2 times with exponential backoff
- •If JSON parsing fails, check the raw_extraction field
- •Timeout is 30 seconds; large documents may need splitting
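For readers unfamiliar with the retry policy described above, here is a generic sketch of "up to 2 retries with exponential backoff". It is not the skill's internal code: `call_with_retries`, the 1-second base delay, and the doubling factor are assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 2,
    base_delay: float = 1.0,  # assumed base delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with delays of base_delay * 2**attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays: list[float] = []
outcome = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(outcome, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the demo instantaneous; production code would leave the default `time.sleep` in place.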
Caching
- •Results cached for 1 hour by default; disable in config if needed
- •Cache key based on patient_record content, not metadata
- •Clear cache between runs for same patient with updated records
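A content-keyed cache with a one-hour lifetime might look like the sketch below. The class `ExtractionCache`, the SHA-256 key, and the in-memory dictionary are all assumptions for illustration; the skill's actual cache backend and key derivation are not described here.

```python
import hashlib
import time
from typing import Any, Callable, Optional


class ExtractionCache:
    """In-memory cache keyed by a hash of the patient_record text (assumed design)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def key_for(patient_record: str) -> str:
        # Only the record content contributes to the key, not metadata.
        return hashlib.sha256(patient_record.encode("utf-8")).hexdigest()

    def get(self, patient_record: str) -> Optional[Any]:
        key = self.key_for(patient_record)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired
            return None
        return result

    def put(self, patient_record: str, result: Any) -> None:
        self._entries[self.key_for(patient_record)] = (self._clock(), result)

    def clear(self) -> None:
        self._entries.clear()


# Demo with a controllable clock so expiry can be shown without waiting.
now = [0.0]
cache = ExtractionCache(clock=lambda: now[0])
cache.put("HR 88, RR 16", {"heart_rate": 88})
hit = cache.get("HR 88, RR 16")           # same content: cached result
miss_updated = cache.get("HR 92, RR 16")  # changed content: different key
now[0] = 3601.0
expired = cache.get("HR 88, RR 16")       # older than one hour: dropped
```

In this sketch an edited record hashes to a new key, so it never receives the old result; `clear()` simply discards leftover entries for superseded text.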
Examples
Example 1: Clear vital signs
code
Input: "74-year-old female, alert, brings in husband. VS: 168/92, HR 88, RR 16, O2 98%, Temp 37.2C. Complains of chest pain x 2 hours."
Output:
{
  "vital_signs": {
    "systolic_bp": 168,
    "diastolic_bp": 92,
    "heart_rate": 88,
    "respiratory_rate": 16,
    "oxygen_saturation": 98.0,
    "temperature": 37.2
  },
  "symptoms": {
    "chief_complaint": "chest pain",
    "pain_level": null,
    "pain_location": "chest",
    "symptom_onset": "2 hours ago",
    "symptom_duration": "2 hours"
  },
  "confidence": 0.95
}
Example 2: Incomplete vitals
code
Input: "8-year-old boy brought by mother. Very pale and lethargic. Rapid breathing. No vitals available. Mother reports fever starting yesterday evening."
Output:
{
  "vital_signs": {
    "systolic_bp": null,
    "respiratory_rate": null,
    "temperature": null
  },
  "risk_factors": {
    "infectious_signs": true
  },
  "extracted_facts": {
    "missing_fields": ["systolic_bp", "diastolic_bp", "heart_rate", "oxygen_saturation"],
    "ambiguous_fields": ["temperature"]
  },
  "confidence": 0.65
}
Troubleshooting
Issue: Confidence score too low (< 0.7)
- •Cause: Incomplete or unclear patient record
- •Solution: Request more detailed clinical notes; consider manual review
Issue: JSON parsing error
- •Cause: LLM returned non-JSON output
- •Solution: Review the raw_extraction field; retry with a different LLM model or temperature
Issue: Timeout (>30 seconds)
- •Cause: Very long patient record or slow LLM
- •Solution: Split record into sections; increase timeout in config
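Splitting a long record into sections can be done in many ways. The sketch below packs whole sentences into chunks under a character budget; `split_record` and `max_chars` are hypothetical, and how per-section results are merged afterwards is left open.

```python
import re


def split_record(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after ., !, or ? followed by whitespace; "37.2C" is not split.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sections = split_record(
    "Chest pain x 2 hours. BP 168/92, HR 88. No known allergies.",
    max_chars=40,
)
for section in sections:
    print(section)
```

Sentence-boundary splitting is naive for clinical text: abbreviations such as "Dr." trigger a split, and vitals may land in a different section from the chief complaint, so section-aware splitting may work better in practice.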
See Also
- •REFERENCE.md: Detailed parameter descriptions and edge cases
- •EXAMPLES.md: Complete examples with input/output
- •schemas/extraction_schema.json: Full JSON schema for validation
- •esi-composition: Deterministic ESI logic that consumes extracted facts