Protocol Parsing Skill
Purpose
Extract structured metadata from clinical trial protocol documents (PDF format).
Input
- Protocol PDF document
- Study identifier (if known)
Output Schema
    {
      "study_id": "string",
      "study_title": "string",
      "sponsor": "string",
      "phase": "string (I, II, III, IV)",
      "indication": "string",
      "objectives": {
        "primary": "string",
        "secondary": ["string"]
      },
      "treatment_arms": [
        {
          "name": "string",
          "description": "string",
          "dose": "string (optional)"
        }
      ],
      "key_inclusion_criteria": ["string"],
      "key_exclusion_criteria": ["string"],
      "planned_enrollment": "number",
      "study_duration": "string"
    }
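As a rough illustration, the schema above could be mirrored with Python dataclasses so that extracted values are checked for shape before being serialized. The class names (`ProtocolMetadata`, `Objectives`, `TreatmentArm`) are hypothetical, the example values are placeholders, and treating `planned_enrollment` as an integer is an assumption; the schema only says "number".

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Objectives:
    primary: str
    secondary: List[str] = field(default_factory=list)


@dataclass
class TreatmentArm:
    name: str
    description: str
    dose: Optional[str] = None  # marked optional in the schema


@dataclass
class ProtocolMetadata:
    study_id: str
    study_title: str
    sponsor: str
    phase: str  # one of "I", "II", "III", "IV"
    indication: str
    objectives: Objectives
    treatment_arms: List[TreatmentArm]
    key_inclusion_criteria: List[str]
    key_exclusion_criteria: List[str]
    planned_enrollment: int  # assumption: whole number of participants
    study_duration: str


# Placeholder values for illustration only; not taken from a real protocol.
example = ProtocolMetadata(
    study_id="ABC-1234",
    study_title="Example study title",
    sponsor="Example Sponsor",
    phase="III",
    indication="Example indication",
    objectives=Objectives(primary="Example primary objective"),
    treatment_arms=[
        TreatmentArm(name="Arm A", description="Drug X", dose="10 mg"),
        TreatmentArm(name="Arm B", description="Placebo"),
    ],
    key_inclusion_criteria=["Example inclusion criterion"],
    key_exclusion_criteria=["Example exclusion criterion"],
    planned_enrollment=200,
    study_duration="24 months",
)

# asdict() converts nested dataclasses into plain dicts and lists,
# which can then be passed to json.dumps().
record = asdict(example)
```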
Extraction Instructions
Study Identification
- Look for "Protocol Number" or "Study ID" in the page header or on the title page
- Extract the sponsor name from the cover page
- Identify the phase from the title or the objectives section
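If the cover page text has already been extracted from the PDF, the label lookup described above might be sketched with regular expressions. The function names and patterns here are assumptions for illustration; real protocols vary in how they label the identifier, and combined phases such as "I/II" are not handled.

```python
import re
from typing import Optional

# Matches the two labels named above, followed by an identifier token.
ID_LABEL = re.compile(
    r"(?:Protocol\s+Number|Study\s+ID)\b\s*[:#]?\s*([A-Z0-9][A-Z0-9_-]*)",
    re.IGNORECASE,
)
# Longer numerals are listed first so "III" is not read as "I".
PHASE = re.compile(r"\bPhase\s+(IV|III|II|I|[1-4])\b", re.IGNORECASE)
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def find_study_id(text: str) -> Optional[str]:
    """Return the token following a 'Protocol Number' or 'Study ID' label."""
    match = ID_LABEL.search(text)
    return match.group(1) if match else None


def find_phase(text: str) -> Optional[str]:
    """Return the phase as a Roman numeral, or None if no phase is found."""
    match = PHASE.search(text)
    if match is None:
        return None
    token = match.group(1).upper()
    return ARABIC_TO_ROMAN.get(token, token)


# Placeholder cover page text, not from a real protocol.
cover = "Protocol Number: ABC-1234\nA Phase III, Randomized Study of Drug X"
study_id = find_study_id(cover)
phase = find_phase(cover)
```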
Objectives
- Locate the "Objectives" or "Study Objectives" section
- The primary objective is typically listed first and labeled explicitly
- Secondary objectives follow and may be numbered
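A minimal sketch of splitting an objectives section into the `primary` and `secondary` fields, assuming the section text uses "Primary Objective" and "Secondary Objectives" label lines and puts one secondary objective per line. The function name `split_objectives` is hypothetical, and objectives that wrap across several lines would need extra handling.

```python
import re
from typing import Dict, List, Union

PRIMARY_LABEL = re.compile(
    r"^[ \t]*Primary[ \t]+Objectives?[ \t]*:?[ \t]*$", re.I | re.M
)
SECONDARY_LABEL = re.compile(
    r"^[ \t]*Secondary[ \t]+Objectives?[ \t]*:?[ \t]*$", re.I | re.M
)
# Leading "1.", "2)", "-", "*", or bullet characters on list items.
ITEM_PREFIX = re.compile(r"^(?:\d+[.)]|[-*•])\s*")


def split_objectives(section_text: str) -> Dict[str, Union[str, List[str]]]:
    # Everything before the secondary label is treated as the primary block.
    parts = SECONDARY_LABEL.split(section_text, maxsplit=1)
    primary_block = PRIMARY_LABEL.sub("", parts[0])
    secondary_block = parts[1] if len(parts) > 1 else ""

    primary = " ".join(
        line.strip() for line in primary_block.splitlines() if line.strip()
    )
    secondary = [
        ITEM_PREFIX.sub("", line.strip())
        for line in secondary_block.splitlines()
        if line.strip()
    ]
    return {"primary": primary, "secondary": secondary}


# Placeholder section text, not from a real protocol.
section = (
    "Primary Objective\n"
    "To evaluate the efficacy of Drug X versus placebo.\n"
    "\n"
    "Secondary Objectives\n"
    "1. To assess safety and tolerability.\n"
    "2. To characterize pharmacokinetics.\n"
)
objectives = split_objectives(section)
```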
Treatment Arms
- Find the "Study Design" or "Treatment" section
- Look for tables that describe each arm
- Extract dose, route, and frequency information
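Once an arm table has been read into rows, mapping it onto the `treatment_arms` entries could look roughly like this. The column keys (`arm`, `intervention`, `dose`, `route`, `frequency`) are hypothetical, and folding route and frequency into the single `dose` string is one possible choice, since the output schema has no separate fields for them.

```python
from typing import Dict, List, Optional


def rows_to_arms(rows: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Convert table rows into treatment_arms entries."""
    arms = []
    for row in rows:
        # Join whichever of dose, route, and frequency are present.
        parts = [row.get(key, "").strip() for key in ("dose", "route", "frequency")]
        dose = " ".join(part for part in parts if part) or None
        arms.append(
            {
                "name": row.get("arm", "").strip(),
                "description": row.get("intervention", "").strip(),
                "dose": dose,  # None when the table gives no dosing details
            }
        )
    return arms


# Placeholder rows, not from a real protocol.
rows = [
    {"arm": "Arm A", "intervention": "Drug X", "dose": "10 mg",
     "route": "oral", "frequency": "once daily"},
    {"arm": "Arm B", "intervention": "Placebo"},
]
arms = rows_to_arms(rows)
```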
Inclusion/Exclusion Criteria
- Locate the "Eligibility Criteria" section
- Key criteria are often marked with asterisks or bold text
- Prioritize age, diagnosis, and disease stage criteria
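One way to approximate this prioritization on plain text is a simple keyword score. This is a heuristic sketch under stated assumptions: the patterns and the `select_key_criteria` name are illustrative, bold formatting is not detectable in plain text, and only a leading asterisk is treated as an emphasis mark.

```python
import re
from typing import List

PRIORITY_PATTERNS = [
    re.compile(r"\bage[ds]?\b", re.I),                  # age
    re.compile(r"\bdiagnos(?:is|ed|es)\b", re.I),       # diagnosis
    re.compile(r"(?i:\bstage)\s+(?:[IV]+|[0-4])"),      # disease stage
]


def select_key_criteria(criteria: List[str], limit: int = 5) -> List[str]:
    """Keep criteria that are asterisk-marked or mention a priority topic."""
    scored = []
    for text in criteria:
        marked = text.lstrip().startswith("*")
        clean = text.strip().strip("*").strip()
        score = int(marked) + sum(1 for p in PRIORITY_PATTERNS if p.search(clean))
        if score > 0:
            scored.append((score, clean))
    # The sort is stable, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [clean for _, clean in scored[:limit]]


# Placeholder criteria, not from a real protocol.
criteria = [
    "Willing to provide informed consent",
    "*Age 18 to 75 years",
    "Histologically confirmed diagnosis of NSCLC, stage IIIB or IV",
    "Adequate organ function",
]
key_criteria = select_key_criteria(criteria)
```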
Schedule of Activities
- Identify the table layout in the "Study Procedures" section
- Extract visit names, timing, and procedures
- Note visit windows if specified
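Visit timing and windows are often written in the column header, for example "Week 4 (±3 days)". Assuming that style (the exact notation differs between protocols), a hedged sketch of pulling out the timing and window might look like the following; `parse_visit_timing` is a hypothetical helper.

```python
import re
from typing import Dict, Optional, Union

VISIT = re.compile(
    r"(?P<unit>Day|Week)\s+(?P<number>-?\d+)"
    r"(?:\s*\(?\s*(?:±|\+/-)\s*(?P<window>\d+)\s*days?\s*\)?)?",
    re.IGNORECASE,
)


def parse_visit_timing(label: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Parse labels such as 'Week 4 (±3 days)'; return None if no timing is found."""
    match = VISIT.search(label)
    if match is None:
        return None
    window = match.group("window")
    return {
        "unit": match.group("unit").capitalize(),
        "number": int(match.group("number")),
        "window_days": int(window) if window else None,
    }


week4 = parse_visit_timing("Week 4 (±3 days)")
day15 = parse_visit_timing("Visit 2, Day 15 +/- 2 days")
```

Labels without a day or week number, such as "Screening", return `None` and would need to be carried through as visit names only.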
Example Prompts
For Cover Page Analysis
    Extract the study ID, sponsor name, phase, and study title from this protocol cover page.
For Objectives Section
    Identify the primary and secondary objectives from this section. The primary objective typically mentions the main efficacy endpoint.
For Treatment Arms
    List all treatment arms including placebo. For each arm, extract:
    - Arm name
    - Drug/intervention
    - Dose and frequency
    - Route of administration
Validation Rules
- Study ID should match the expected format (sponsor prefix + number)
- Phase should be I, II, III, or IV
- At least one treatment arm is required
- A primary objective must be present
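These rules can be sketched as a small Python check over the extracted record. The study ID pattern (two or more uppercase letters, an optional hyphen, then digits) is an assumption chosen for illustration, since only "sponsor prefix + number" is specified; `validate_metadata` and `STUDY_ID_PATTERN` are hypothetical names.

```python
import re
from typing import Any, Dict, List, Pattern

VALID_PHASES = {"I", "II", "III", "IV"}
# Assumed format: sponsor prefix letters, optional hyphen, digits (e.g. "ABC-1234").
STUDY_ID_PATTERN = re.compile(r"^[A-Z]{2,}-?\d+$")


def validate_metadata(
    record: Dict[str, Any],
    study_id_pattern: Pattern[str] = STUDY_ID_PATTERN,
) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []

    study_id = record.get("study_id") or ""
    if not study_id_pattern.match(study_id):
        errors.append(f"study_id {study_id!r} does not match the expected format")

    if record.get("phase") not in VALID_PHASES:
        errors.append("phase must be one of I, II, III, IV")

    if not record.get("treatment_arms"):
        errors.append("at least one treatment arm is required")

    objectives = record.get("objectives") or {}
    if not (objectives.get("primary") or "").strip():
        errors.append("primary objective must be present")

    return errors


# Placeholder records, not from a real protocol.
good = {
    "study_id": "ABC-1234",
    "phase": "III",
    "treatment_arms": [{"name": "Arm A", "description": "Drug X"}],
    "objectives": {"primary": "Example primary objective", "secondary": []},
}
bad = {"study_id": "1234", "phase": "V", "treatment_arms": [], "objectives": {"primary": " "}}
good_errors = validate_metadata(good)
bad_errors = validate_metadata(bad)
```

Passing a sponsor-specific pattern as `study_id_pattern` keeps the format rule configurable instead of hard-coding one convention.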