CSV Annotation with OpenAI
Adds AI-generated annotations to CSV files by processing each row with a prompt.
Quick Start
bash
uv run annotate-csv.py "Classify sentiment as positive/negative/neutral" reviews.csv
Output: reviews_annotated.csv with new annotation column.
Dependencies install automatically on first run.
Core Usage
bash
uv run annotate-csv.py <prompt> <csv_file> [options]
| Option | Description | Default |
|---|---|---|
-o, --output | Output file path | <input>_annotated.csv |
-c, --column | Annotation column name | annotation |
--context | Columns to use (all or comma-separated) | all |
--model | OpenAI model | gpt-5-mini |
--parallelism | Parallel workers | 10 |
--id-column | Column for progress display | - |
Prompt Sources
Prompts can be inline or from a file:
bash
# Inline prompt python annotate-csv.py "Extract the main topic" data.csv # From file (for complex prompts) python annotate-csv.py prompt.txt data.csv
Common Patterns
Classification:
bash
python annotate-csv.py "Classify sentiment: positive, negative, or neutral" feedback.csv -c sentiment
Extraction:
bash
python annotate-csv.py "Extract the product name mentioned" reviews.csv -c product --context "review_text"
Summarization:
bash
python annotate-csv.py "Summarize in one sentence" articles.csv -c summary
See EXAMPLES.md for more patterns and prompt templates.
Environment Setup
Requires OPENAI_API_KEY in .env file:
code
OPENAI_API_KEY=sk-...
Run with uv (recommended):
bash
uv run annotate-csv.py "Your prompt" data.csv
Or install dependencies first:
bash
uv pip install -r requirements.txt python annotate-csv.py "Your prompt" data.csv
Workflow
- •Preview data: Check CSV structure and columns
- •Choose context: Decide which columns the AI needs to see
- •Write prompt: Be specific about expected output format
- •Test on subset: Try on a few rows first if large dataset
- •Run full annotation: Process the complete file
- •Validate results: Spot-check annotation quality
Writing Effective Prompts
- •Be explicit about output format (single word, phrase, sentence)
- •List valid categories for classification tasks
- •Specify what to do with ambiguous cases
- •Keep prompts focused on one task
See PROMPTS.md for prompt-writing guidance.
Troubleshooting
| Issue | Solution |
|---|---|
| API key error | Check .env file has OPENAI_API_KEY |
| Column not found | Verify column names match CSV exactly |
| Rate limits | Reduce --parallelism to 5 or lower |
| Empty annotations | Check context columns have data |