Extraction Form (systematic review)
Goal: create a consistent, analysis-ready extraction table that is directly grounded in the protocol.
Inputs
Required:
- papers/screening_log.csv
- output/PROTOCOL.md
Optional:
- papers/paper_notes.jsonl (if you already have structured notes)
Outputs
- papers/extraction_table.csv
Workflow
- Determine the included set
  - From papers/screening_log.csv, collect all rows with decision=include.
- Build/confirm the schema
  - Use the extraction schema defined in output/PROTOCOL.md.
  - If the protocol does not define fields yet, stop and update output/PROTOCOL.md first.
- Populate papers/extraction_table.csv
  - One row per included paper.
  - If papers/paper_notes.jsonl exists, use it as a structured source for values/provenance (but keep the table schema governed by output/PROTOCOL.md).
  - Always include provenance columns:
    - paper_id, title, year, url
  - For each protocol-defined field:
    - fill concrete values (units explicit)
    - use an explicit sentinel for unknowns (recommended: empty cell + notes)
- Keep it auditable
  - If a value is inferred (not directly stated), mark it in a notes column.
  - Do not write synthesis; only extraction.
- Quick QA
  - Ensure 1:1 coverage: included papers == extraction rows.
  - Spot-check a few rows against the paper text/notes.
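
The workflow above can be sketched in Python roughly as follows. This is a minimal illustration, not a prescribed implementation. It assumes that papers/screening_log.csv carries the columns paper_id, title, year, url, and decision, and that each line of papers/paper_notes.jsonl is a JSON object with a paper_id key whose other keys match the protocol field names. The protocol fields are passed in as a plain list because the layout of output/PROTOCOL.md is not specified here. The function names and the "not reported:" note wording are hypothetical.

```python
import csv
import json
from pathlib import Path

# Provenance columns named in the workflow.
PROVENANCE = ["paper_id", "title", "year", "url"]


def load_included(screening_log):
    """Return screening-log rows whose decision is 'include'."""
    with open(screening_log, newline="", encoding="utf-8") as f:
        return [
            row for row in csv.DictReader(f)
            if (row.get("decision") or "").strip().lower() == "include"
        ]


def load_notes(notes_path):
    """Map paper_id -> notes record; empty dict if the optional file is absent."""
    path = Path(notes_path)
    if not path.exists():
        return {}
    notes = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                notes[str(record["paper_id"])] = record
    return notes


def build_table(screening_log, notes_path, out_path, protocol_fields):
    """Write one extraction row per included paper; return the row count."""
    if not protocol_fields:
        # Mirrors the "stop and update the protocol first" rule.
        raise ValueError("No extraction fields: update output/PROTOCOL.md first.")
    included = load_included(screening_log)
    notes = load_notes(notes_path)
    columns = PROVENANCE + list(protocol_fields) + ["notes"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for paper in included:
            record = notes.get(paper["paper_id"], {})
            row = {col: paper.get(col, "") for col in PROVENANCE}
            missing = []
            for field in protocol_fields:
                value = record.get(field)
                if value is None or value == "":
                    row[field] = ""  # sentinel for unknowns: empty cell
                    missing.append(field)
                else:
                    row[field] = value
            # Hypothetical note wording; the sentinel itself is the empty cell.
            row["notes"] = "not reported: " + ", ".join(missing) if missing else ""
            writer.writerow(row)
    return len(included)
```

A call such as build_table("papers/screening_log.csv", "papers/paper_notes.jsonl", "papers/extraction_table.csv", fields) would then produce the table, with fields copied by hand from the protocol's extraction schema.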
Definition of Done
- papers/extraction_table.csv exists.
- Every included paper from papers/screening_log.csv has exactly one extraction row.
- Column meanings match output/PROTOCOL.md (no ad-hoc columns without updating the protocol).
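
The second and third criteria lend themselves to a mechanical check. The sketch below is one possible way to do it, assuming paper IDs and column names have already been read into plain lists. The helper names coverage_report and adhoc_columns are hypothetical, and treating notes as an always-allowed column is an assumption.

```python
from collections import Counter


def coverage_report(included_ids, extracted_ids):
    """Compare included paper IDs with the IDs found in extraction rows."""
    counts = Counter(extracted_ids)
    included = set(included_ids)
    return {
        "missing": sorted(included - set(counts)),      # included, no row
        "unexpected": sorted(set(counts) - included),   # row, not included
        "duplicated": sorted(pid for pid, n in counts.items() if n > 1),
    }


def adhoc_columns(table_columns, protocol_fields,
                  allowed=("paper_id", "title", "year", "url", "notes")):
    """Return table columns that neither the protocol nor `allowed` covers."""
    sanctioned = set(protocol_fields) | set(allowed)
    return [col for col in table_columns if col not in sanctioned]


report = coverage_report(["P1", "P2", "P3"], ["P1", "P3", "P3", "P9"])
print(report)
# {'missing': ['P2'], 'unexpected': ['P9'], 'duplicated': ['P3']}
```

All three lists in the report being empty, together with an empty result from adhoc_columns, corresponds to the criteria above being met.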
Troubleshooting
Issue: the protocol does not specify extraction fields
Fix:
- Update output/PROTOCOL.md (extraction schema section) and re-run extraction.
Issue: extraction table mixes narrative text with fields
Fix:
- Move narrative into a notes column and keep the rest as atomic values (numbers/enums/short strings).
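
Cells that have drifted into narrative can be flagged with a rough heuristic before moving them by hand. The sketch below is only illustrative: the function name narrative_cells, the eight-word threshold, and the set of skipped columns are all assumptions to tune for a given protocol, and a flagged cell still needs human review.

```python
import re


def narrative_cells(rows, skip=("notes", "title", "url"), max_words=8):
    """Return (row_index, column) pairs whose value looks like prose."""
    flagged = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if col in skip or not value:
                continue
            text = str(value)
            too_long = len(text.split()) > max_words
            # A sentence end followed by a capitalized word suggests prose.
            multi_sentence = re.search(r"[.!?]\s+[A-Z]", text) is not None
            if too_long or multi_sentence:
                flagged.append((i, col))
    return flagged
```

The rows argument is expected to be a list of dicts, for example the output of csv.DictReader over papers/extraction_table.csv.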