Golden Dataset Management
Protect and maintain high-quality test datasets for AI/ML systems
Overview
A golden dataset is a curated collection of high-quality examples used for:
- Regression testing: Ensure new code doesn't break existing functionality
- Retrieval evaluation: Measure search quality (precision, recall, MRR)
- Model benchmarking: Compare different models/approaches
- Reproducibility: Consistent results across environments
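The retrieval metrics listed above are computed per test query and then averaged. Below is a minimal sketch. It assumes each query returns a ranked list of chunk IDs and that the golden dataset stores a set of expected (relevant) IDs for that query. The function names are illustrative and are not OrchestKit's API.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (denominator is k)."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (ranks start at 1), or 0 if none."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one pair per test query."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)
```

Consider a query that returns c3, c7, c1 when the expected set is {c7, c9}. That query scores precision@3 = 1/3, recall@3 = 0.5, and reciprocal rank = 0.5.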
When to use this skill:
- Building test datasets for RAG systems
- Implementing backup/restore for critical data
- Validating data integrity (URL contracts, embeddings)
- Migrating data between environments
OrchestKit's Golden Dataset
Stats (Production):
- 98 analyses (completed content analyses)
- 415 chunks (embedded text segments)
- 203 test queries (with expected results)
- 91.6% pass rate (retrieval quality metric)
Purpose:
- Test hybrid search (vector + BM25 + RRF)
- Validate metadata boosting strategies
- Detect regressions in retrieval quality
- Benchmark new embedding models
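Hybrid search merges the vector and BM25 result lists with Reciprocal Rank Fusion (RRF). The golden dataset is what tells you whether that fusion improves retrieval. The sketch below shows standard RRF and assumes each retriever returns a ranked list of chunk IDs. OrchestKit's actual fusion code, its constant, and any metadata boosting applied on top may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is the commonly used default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["c1", "c2", "c3"]
bm25_hits = ["c2", "c3", "c4"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Chunks that rank well in both lists (c2, c3) move ahead of chunks that appear in only one list. RRF uses only ranks and ignores raw scores, which is why it can combine cosine similarities and BM25 scores even though they sit on different scales.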
Core Concepts
Data Integrity Contracts
The URL Contract: Golden dataset analyses MUST store real canonical URLs, not placeholders.
```python
# WRONG - Placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - Real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```
Why this matters:
- Enables re-fetching content if embeddings need regeneration
- Allows validation that source content hasn't changed
- Provides an audit trail for data provenance
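The URL contract can be enforced with a small check applied to each analysis before it is backed up. The sketch below reuses the placeholder patterns flagged by this document's validation checks (orchestkit.dev, "placeholder"). Its other requirement, an http(s) scheme plus a host, is an extra assumption. The function name and constants are hypothetical.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical deny-lists mirroring the "Placeholder URLs" validation check.
PLACEHOLDER_HOSTS = {"orchestkit.dev"}
PLACEHOLDER_MARKERS = ("placeholder",)


def check_url_contract(url: str) -> List[str]:
    """Return the contract violations for one analysis URL (an empty list means OK)."""
    problems: List[str] = []
    parsed = urlparse(url)
    # Assumption: a canonical URL must be fetchable over http(s).
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("missing host")
    host = (parsed.hostname or "").lower()
    # Reject the placeholder domain itself and any of its subdomains.
    if host in PLACEHOLDER_HOSTS or any(host.endswith("." + h) for h in PLACEHOLDER_HOSTS):
        problems.append(f"placeholder host: {host}")
    if any(marker in url.lower() for marker in PLACEHOLDER_MARKERS):
        problems.append("placeholder marker in URL")
    return problems
```

The check returns a list rather than a boolean. That way a single pass over all analyses can report every offending URL, not just the first one.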
Backup Strategy Comparison
| Strategy | Version Control | Restore Speed | Portability | Inspection |
|---|---|---|---|---|
| JSON (recommended) | Yes | Slower (regen embeddings) | High | Easy |
| SQL Dump | No (binary) | Fast | DB-version dependent | Hard |
OrchestKit uses JSON backup for version control and portability.
Quick Reference
Backup Format
```jsonc
{
  "version": "1.0",
  "created_at": "2025-12-19T10:30:00Z",
  "metadata": {
    "total_analyses": 98,
    "total_chunks": 415,
    "total_artifacts": 98
  },
  "analyses": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "url": "https://docs.python.org/3/library/asyncio.html",
      "content_type": "documentation",
      "status": "completed",
      "created_at": "2025-11-15T08:20:00Z",
      "chunks": [
        {
          "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
          "content": "asyncio is a library...",
          "section_title": "Introduction to asyncio"
          // embedding NOT included (regenerated on restore)
        }
      ]
    }
  ]
}
```
Key Design Decisions:
- Embeddings excluded (regenerated on restore with the current model)
- Nested structure (analyses -> chunks -> artifacts)
- Metadata counts for validation
- ISO 8601 timestamps for reproducibility
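Producing this format amounts to walking analyses and their chunks and leaving out the embedding field. The sketch below assumes rows have already been loaded into the hypothetical Analysis and Chunk dataclasses. Artifacts are omitted for brevity, so their count is passed in as a hypothetical parameter. This is not the actual scripts/backup_golden_dataset.py.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    id: str
    content: str
    section_title: str
    embedding: Optional[List[float]] = None  # present in the DB, never written to the backup


@dataclass
class Analysis:
    id: str
    url: str
    content_type: str
    status: str
    created_at: str
    chunks: List[Chunk] = field(default_factory=list)


def build_backup(analyses: List[Analysis], total_artifacts: int = 0) -> Dict[str, Any]:
    """Serialize analyses into the nested backup format, dropping embeddings."""
    return {
        "version": "1.0",
        # ISO 8601 UTC timestamp, e.g. 2025-12-19T10:30:00Z
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "metadata": {
            "total_analyses": len(analyses),
            "total_chunks": sum(len(a.chunks) for a in analyses),
            "total_artifacts": total_artifacts,
        },
        "analyses": [
            {
                "id": a.id,
                "url": a.url,
                "content_type": a.content_type,
                "status": a.status,
                "created_at": a.created_at,
                # Only text fields are kept; embeddings are regenerated on restore.
                "chunks": [
                    {"id": c.id, "content": c.content, "section_title": c.section_title}
                    for c in a.chunks
                ],
            }
            for a in analyses
        ],
    }


def to_json(backup: Dict[str, Any]) -> str:
    """Pretty-print with sorted keys so git diffs stay small and stable between runs."""
    return json.dumps(backup, indent=2, sort_keys=True) + "\n"
```

Computing the metadata counts from the data itself gives the count-mismatch check something independent to compare against after restore.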
CLI Commands
```bash
cd backend

# Backup golden dataset
poetry run python scripts/backup_golden_dataset.py backup

# Verify backup integrity
poetry run python scripts/backup_golden_dataset.py verify

# Restore from backup (WARNING: deletes existing data)
poetry run python scripts/backup_golden_dataset.py restore --replace

# Restore without deleting (adds to existing data)
poetry run python scripts/backup_golden_dataset.py restore
```
Validation Checks
| Check | Error/Warning | Description |
|---|---|---|
| Count mismatch | Error | Analysis/chunk count differs from metadata |
| Placeholder URLs | Error | URLs containing orchestkit.dev or placeholder |
| Missing embeddings | Error | Chunks without embeddings after restore |
| Orphaned chunks | Warning | Chunks with no parent analysis |
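The four checks in the table can be run in a single pass. The sketch below assumes the restored data is available as flat rows: analyses carry "id" and "url", and chunks carry "id", "analysis_id" and "embedding", with embeddings stored as lists of floats. The function name and row shapes are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Substrings that mark a placeholder URL, per the "Placeholder URLs" check.
PLACEHOLDER_PATTERNS = ("orchestkit.dev", "placeholder")


def validate_dataset(
    metadata: Dict[str, int],
    analyses: List[Dict[str, Any]],
    chunks: List[Dict[str, Any]],
    expect_embeddings: bool = True,
) -> Tuple[List[str], List[str]]:
    """Run the count, URL, embedding and orphan checks; return (errors, warnings)."""
    errors: List[str] = []
    warnings: List[str] = []

    # Count mismatch (error): actual rows vs. counts recorded in the backup metadata.
    if len(analyses) != metadata.get("total_analyses"):
        errors.append(f"analyses: {len(analyses)} != {metadata.get('total_analyses')}")
    if len(chunks) != metadata.get("total_chunks"):
        errors.append(f"chunks: {len(chunks)} != {metadata.get('total_chunks')}")

    # Placeholder URLs (error).
    for a in analyses:
        url = (a.get("url") or "").lower()
        if any(p in url for p in PLACEHOLDER_PATTERNS):
            errors.append(f"placeholder URL in analysis {a['id']}: {a.get('url')}")

    # Missing embeddings (error); only meaningful once embeddings have been regenerated.
    if expect_embeddings:
        missing = [c["id"] for c in chunks if not c.get("embedding")]
        if missing:
            errors.append(f"{len(missing)} chunk(s) missing embeddings: {missing}")

    # Orphaned chunks (warning): chunks whose parent analysis does not exist.
    analysis_ids = {a["id"] for a in analyses}
    orphans = [c["id"] for c in chunks if c.get("analysis_id") not in analysis_ids]
    if orphans:
        warnings.append(f"{len(orphans)} orphaned chunk(s): {orphans}")

    return errors, warnings
```

Orphaned chunks are only a warning because retrieval still works without their parent analysis. Count mismatches and missing embeddings make evaluation results untrustworthy, so they are errors.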
Best Practices Summary
- Version control backups - Commit to git for history and diffs
- Validate before deployment - Run verify before production changes
- Test restore in staging - Never test restore in production first
- Document changes - Track additions/removals in metadata
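To document changes, you first need to know what changed between two backups. The sketch below compares two backup dicts in the format shown earlier and reports analyses that were added, removed, or had their URL changed. The helper is hypothetical. Its output could feed a commit message or a changelog entry.

```python
from typing import Any, Dict, List


def diff_backups(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, List[str]]:
    """Compare two backups by analysis ID; return sorted ID lists for each kind of change."""
    old_urls = {a["id"]: a["url"] for a in old["analyses"]}
    new_urls = {a["id"]: a["url"] for a in new["analyses"]}
    return {
        "added": sorted(new_urls.keys() - old_urls.keys()),
        "removed": sorted(old_urls.keys() - new_urls.keys()),
        # Same analysis ID but a different source URL: worth a human look.
        "url_changed": sorted(
            i for i in old_urls.keys() & new_urls.keys() if old_urls[i] != new_urls[i]
        ),
    }
```

The comparison keys on analysis IDs rather than list positions. Reordering analyses in the JSON therefore does not show up as a change.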
Disaster Recovery Quick Guide
| Scenario | Steps |
|---|---|
| Accidental deletion | restore --replace -> verify -> run tests |
| Migration failure | alembic downgrade -1 -> restore --replace -> fix migration |
| New environment | Clone repo -> setup DB -> restore -> run tests |
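Every recovery path above ends in a restore, which also regenerates embeddings with the current model. The sketch below assumes a plain dict stands in for the database and that embed is a hypothetical callable wrapping whatever embedding model is configured. The replace flag mirrors the difference between restore --replace and plain restore.

```python
from typing import Any, Callable, Dict, List


def restore(
    backup: Dict[str, Any],
    store: Dict[str, Dict[str, Any]],
    embed: Callable[[str], List[float]],
    replace: bool = False,
) -> int:
    """Load a backup into store ({"analyses": {...}, "chunks": {...}}) and re-embed chunks.

    With replace=True existing rows are deleted first; otherwise rows are added
    alongside existing data. Returns the number of chunks embedded.
    """
    if replace:
        store["analyses"].clear()
        store["chunks"].clear()
    embedded = 0
    for a in backup["analyses"]:
        store["analyses"][a["id"]] = {k: v for k, v in a.items() if k != "chunks"}
        for c in a["chunks"]:
            # Flatten the nesting: each chunk row records its parent analysis.
            row = dict(c, analysis_id=a["id"])
            # Embeddings come from the current model, not the one used at backup time.
            row["embedding"] = embed(c["content"])
            # In this sketch an existing row with the same ID is overwritten;
            # a real database may raise a conflict instead.
            store["chunks"][c["id"]] = row
            embedded += 1
    return embedded
```

Running validation after this step catches chunks whose embedding call failed or returned nothing.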
References
For detailed implementation patterns, see:
- references/storage-patterns.md - Backup strategies, JSON format, backup script implementation, CI/CD automation
- references/versioning.md - Restore implementation, embedding regeneration, validation checklist, disaster recovery scenarios
Related Skills
- golden-dataset-validation - Schema and integrity validation
- golden-dataset-curation - Quality criteria and curation workflows
- pgvector-search - Retrieval evaluation using golden dataset
- ai-native-development - Embedding generation for restore
Version: 1.0.0 (December 2025)
Status: Production-ready patterns from OrchestKit's 98-analysis golden dataset
Capability Details
backup
Keywords: golden dataset, backup, export, json backup, version control data
Solves:
- How do I back up the golden dataset?
- Export analyses to JSON for version control
- Protect critical test datasets
- Create portable database snapshots
restore
Keywords: restore dataset, import analyses, regenerate embeddings, disaster recovery, new environment
Solves:
- How do I restore from backup?
- Import the golden dataset into a new environment
- Regenerate embeddings after restore
- Disaster recovery procedures
validation
Keywords: verify dataset, url contract, data integrity, validate backup, placeholder urls
Solves:
- How do I validate dataset integrity?
- Check URL contracts (no placeholders)
- Verify embeddings exist
- Detect orphaned chunks
ci-cd-automation
Keywords: automated backup, github actions, ci cd backup, scheduled backup
Solves:
- How do I automate dataset backups?
- Set up GitHub Actions for weekly backups
- Commit backups to git automatically
- CI/CD integration patterns
disaster-recovery
Keywords: disaster recovery, accidental deletion, migration failure, rollback
Solves:
- What if I accidentally delete the dataset?
- Database migration gone wrong
- Restore after data corruption
- Rollback procedures
orchestkit-golden-dataset
Keywords: orchestkit, 98 analyses, 415 chunks, retrieval evaluation, real world
Solves:
- What is OrchestKit's golden dataset?
- How does OrchestKit protect test data?
- Real-world backup/restore examples
- Production golden dataset stats