Export FiftyOne Datasets
Key Directives
ALWAYS follow these rules:
1. Load and understand the dataset first
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")
2. Confirm export settings with user
Before exporting, present:
- Dataset name and sample count
- Available label fields and their types
- Proposed export format
- Export directory path
3. Match format to label types
Different formats support different label types:
| Format | Label Types |
|---|---|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
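The table above can be encoded as a small lookup so that a proposed format is checked against a field's label type before anything is exported. This is a minimal sketch: `FORMAT_LABEL_TYPES`, `normalize`, and `formats_supporting` are hypothetical helper names, and the label-type strings only mirror the table, not any FiftyOne API.

```python
# Hypothetical lookup mirroring the table above (singular label-type names);
# it is not part of the FiftyOne API.
FORMAT_LABEL_TYPES = {
    "COCO": {"detection", "segmentation", "keypoint"},
    "YOLO (v4, v5)": {"detection"},
    "VOC": {"detection"},
    "CVAT": {"classification", "detection", "polyline", "keypoint"},
    "Image Classification Directory Tree": {"classification"},
}


def normalize(label_type):
    """Lowercase a label type and drop a trailing plural "s"."""
    name = label_type.strip().lower()
    return name[:-1] if name.endswith("s") else name


def formats_supporting(label_type):
    """Return the table's formats that list the given label type, plus CSV."""
    wanted = normalize(label_type)
    matches = [name for name, types in FORMAT_LABEL_TYPES.items() if wanted in types]
    # The table lists CSV as supporting all (custom) fields.
    return sorted(matches + ["CSV"])


print(formats_supporting("Detections"))
print(formats_supporting("Classification"))
```

Normalizing the name lets the same lookup accept both the class-style names reported by the dataset summary (for example `Detections`) and the lowercase names used in the table.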
4. Use absolute paths
Always use absolute paths for export directories:
params={
    "export_dir": {"absolute_path": "/path/to/export"}
}
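If the user supplies a relative or `~`-prefixed path, it can be resolved before it is placed in the params. A minimal sketch using only the standard library; `export_dir_param` is a hypothetical helper, and only the `{"absolute_path": ...}` shape comes from this guide.

```python
from pathlib import Path


def export_dir_param(path):
    """Build the export_dir value, expanding ~ and resolving to an absolute path."""
    absolute = Path(path).expanduser().resolve()
    return {"absolute_path": str(absolute)}


# Hypothetical relative path, resolved against the current working directory.
params = {"export_dir": export_dir_param("exports/coco")}
print(params)
```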
5. Warn about overwriting
Check whether the export directory exists before exporting. If it does, ask the user whether to overwrite it.
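One way to make this check concrete is to classify the target path before asking the user. The sketch below is an assumption about how such a check might look; `describe_export_dir` and its three return values are hypothetical, not part of FiftyOne.

```python
import tempfile
from pathlib import Path


def describe_export_dir(export_dir):
    """Classify a path as "missing", "empty", or "occupied"."""
    path = Path(export_dir)
    if not path.exists():
        return "missing"
    if path.is_dir() and not any(path.iterdir()):
        return "empty"
    # Either a non-empty directory or an existing file: ask before overwriting.
    return "occupied"


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "old_export").mkdir()
    (Path(tmp) / "old_export" / "labels.json").write_text("{}")
    for name in ("new_export", "old_export"):
        print(name, "->", describe_export_dir(Path(tmp) / name))
```

Only the "occupied" case needs a confirmation prompt; "missing" and "empty" targets can be used directly.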
Complete Workflow
Step 1: Load Dataset and Understand Content
# Set context
set_context(dataset_name="my-dataset")
# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")
Identify:
- Total sample count
- Media type (images, videos, point clouds)
- Available label fields and their types (Detections, Classifications, etc.)
Step 2: Get Export Operator Schema
# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")
Step 3: Present Export Options to User
Before exporting, confirm with the user:
Dataset: my-dataset (5,000 samples)
Media type: image
Available label fields:
- ground_truth (Detections)
- predictions (Detections)
Export options:
- Format: COCO (recommended for detections)
- Export directory: /path/to/export
- Label field: ground_truth
Proceed with export?
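The confirmation message can be assembled from the values gathered in Steps 1 and 2. A minimal sketch, assuming those values are already available as plain Python objects; `format_export_summary` and its parameters are hypothetical names.

```python
def format_export_summary(name, sample_count, media_type, label_fields,
                          dataset_type, export_dir, label_field):
    """Render the confirmation prompt shown to the user before exporting."""
    lines = [
        f"Dataset: {name} ({sample_count:,} samples)",
        f"Media type: {media_type}",
        "Available label fields:",
    ]
    # label_fields maps field name -> label type, e.g. {"ground_truth": "Detections"}
    lines += [f"- {field} ({kind})" for field, kind in label_fields.items()]
    lines += [
        "Export options:",
        f"- Format: {dataset_type}",
        f"- Export directory: {export_dir}",
        f"- Label field: {label_field}",
        "Proceed with export?",
    ]
    return "\n".join(lines)


print(format_export_summary(
    "my-dataset", 5000, "image",
    {"ground_truth": "Detections", "predictions": "Detections"},
    "COCO", "/path/to/export", "ground_truth",
))
```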
Step 4: Execute Export
Export media and labels:
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/export"},
        "label_field": "ground_truth"
    }
)
Export labels only (no media copy):
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/labels.json"},
        "label_field": "ground_truth"
    }
)
Export media only (no labels):
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_ONLY",
        "export_dir": {"absolute_path": "/path/to/media"}
    }
)
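The three calls above differ mainly in `export_type` and in whether the path goes under `export_dir` or `labels_path`. The sketch below builds the params dictionary accordingly; `build_export_params` is a hypothetical helper, and it deliberately rejects any export type that has no worked example in this guide rather than guessing its path key.

```python
def build_export_params(export_type, path, dataset_type=None, label_field=None):
    """Build params for the export operator, choosing the path key by export type."""
    if export_type not in ("MEDIA_AND_LABELS", "LABELS_ONLY", "MEDIA_ONLY"):
        # No example in this guide shows the path key for other types.
        raise ValueError(f"No example available for export_type {export_type!r}")
    # Labels-only exports write to a labels file; the others write to a directory.
    path_key = "labels_path" if export_type == "LABELS_ONLY" else "export_dir"
    params = {"export_type": export_type, path_key: {"absolute_path": path}}
    if dataset_type is not None:
        params["dataset_type"] = dataset_type
    if label_field is not None:
        params["label_field"] = label_field
    return params


print(build_export_params("LABELS_ONLY", "/path/to/labels.json", "COCO", "ground_truth"))
print(build_export_params("MEDIA_ONLY", "/path/to/media"))
```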
Step 5: Verify Export
After export, verify the output:
ls -la /path/to/export
Report the exported file count and directory structure to the user.
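Where a shell is not available, the same check can be done from Python. A minimal sketch that counts exported files by extension; `summarize_export` is a hypothetical helper and the demo files are placeholders.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_export(export_dir):
    """Count files under export_dir by extension, walking it recursively."""
    counts = Counter(
        p.suffix.lower() or "<no extension>"
        for p in Path(export_dir).rglob("*")
        if p.is_file()
    )
    return dict(counts)


# Demonstrate on a throwaway directory that imitates a small export.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    for name in ("image1.jpg", "image2.jpg"):
        (root / "data" / name).touch()
    (root / "labels.json").write_text("{}")
    print(summarize_export(root))
```

Comparing the media count with the dataset's sample count is a quick way to spot a partial export.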
Supported Export Formats
Detection Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |
Classification Formats
| Format | dataset_type Value | Media Type | Labels-Only |
|---|---|---|---|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |
Segmentation Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |
General Formats
| Format | dataset_type Value | Best For | Labels-Only |
|---|---|---|---|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |
Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).
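This rule can be checked before calling the operator. The sketch below hard-codes the formats marked "Labels-Only: No" in the tables above; `MEDIA_REQUIRED_FORMATS` and `check_export_type` are hypothetical names, and the set must be kept in sync with the tables by hand.

```python
# Formats marked "Labels-Only: No" in the tables above.
MEDIA_REQUIRED_FORMATS = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def check_export_type(dataset_type, export_type):
    """Raise ValueError if a labels-only export is requested for a media-bound format."""
    if export_type == "LABELS_ONLY" and dataset_type in MEDIA_REQUIRED_FORMATS:
        raise ValueError(
            f'{dataset_type} cannot be exported with "LABELS_ONLY"; '
            'use "MEDIA_AND_LABELS" instead.'
        )


check_export_type("COCO", "LABELS_ONLY")  # passes silently
try:
    check_export_type("YOLOv5", "LABELS_ONLY")
except ValueError as err:
    print(err)
```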
Export Type Options
| export_type Value | Description |
|---|---|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |
Target Options
Export from different sources:
| target Value | Description |
|---|---|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |
Common Use Cases
Use Case 1: Export to COCO Format
For training with frameworks that use COCO format:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/coco_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
coco_export/
├── data/
│   ├── image1.jpg
│   └── image2.jpg
└── labels.json
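A COCO export with this layout can be sanity-checked by comparing the files in `data/` with the entries in `labels.json`. This is a sketch under the assumption that `labels.json` uses the standard COCO top-level keys `images`, `annotations`, and `categories`; `coco_export_counts` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def coco_export_counts(export_dir):
    """Return media-file and COCO entry counts for an export directory."""
    export_dir = Path(export_dir)
    with open(export_dir / "labels.json") as f:
        labels = json.load(f)
    return {
        "media_files": sum(1 for p in (export_dir / "data").iterdir() if p.is_file()),
        "images": len(labels.get("images", [])),
        "annotations": len(labels.get("annotations", [])),
        "categories": len(labels.get("categories", [])),
    }


# Demonstrate on a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "image1.jpg").touch()
    (root / "labels.json").write_text(json.dumps({
        "images": [{"id": 1, "file_name": "image1.jpg"}],
        "annotations": [{"id": 1, "image_id": 1, "category_id": 0}],
        "categories": [{"id": 0, "name": "cat"}],
    }))
    print(coco_export_counts(root))
```

A mismatch between `media_files` and `images` is worth reporting to the user before the export is used for training.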
Use Case 2: Export to YOLO Format
For training YOLOv5/v8 models:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "export_dir": {"absolute_path": "/path/to/yolo_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
yolo_export/
├── images/
│   └── train/
│       └── image1.jpg
├── labels/
│   └── train/
│       └── image1.txt
└── dataset.yaml
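Given this layout, images can be paired with label files by filename stem. The sketch below lists images that have no matching `.txt` file; `images_without_labels` is a hypothetical helper and the `train` split name is taken from the structure shown above. A missing label file is not necessarily an error (for example, an image with no objects), so treat the result as a list to review.

```python
import tempfile
from pathlib import Path


def images_without_labels(export_dir, split="train"):
    """Return image filenames in a split that have no same-named .txt label file."""
    export_dir = Path(export_dir)
    label_stems = {p.stem for p in (export_dir / "labels" / split).glob("*.txt")}
    return sorted(
        p.name
        for p in (export_dir / "images" / split).iterdir()
        if p.is_file() and p.stem not in label_stems
    )


# Demonstrate on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images" / "train").mkdir(parents=True)
    (root / "labels" / "train").mkdir(parents=True)
    for name in ("image1.jpg", "image2.jpg"):
        (root / "images" / "train" / name).touch()
    (root / "labels" / "train" / "image1.txt").touch()
    print(images_without_labels(root))
```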
Use Case 3: Export Filtered View
Export only a subset of samples:
# Set context
set_context(dataset_name="my-dataset")
# Filter samples in the App
set_view(tags=["validated"])
# Export the filtered view
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "target": "CURRENT_VIEW",
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/validated_export"},
        "label_field": "ground_truth"
    }
)
Use Case 4: Export Labels Only
When media should stay in place:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)
Use Case 5: Export for Classification Training
For image classification datasets:
set_context(dataset_name="my-classification-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "export_dir": {"absolute_path": "/path/to/classification_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
classification_export/
├── cat/
│ ├── cat1.jpg
│ └── cat2.jpg
└── dog/
    ├── dog1.jpg
    └── dog2.jpg
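With one folder per class, the class distribution of the export can be read straight from the directory tree. A minimal sketch; `class_counts` is a hypothetical helper and the class names are the placeholders from the structure above.

```python
import tempfile
from pathlib import Path


def class_counts(export_dir):
    """Map each class folder name to the number of files it contains."""
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(Path(export_dir).iterdir())
        if class_dir.is_dir()
    }


# Demonstrate on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for label, names in {"cat": ["cat1.jpg", "cat2.jpg"], "dog": ["dog1.jpg"]}.items():
        (root / label).mkdir()
        for name in names:
            (root / label / name).touch()
    print(class_counts(root))
```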
Use Case 6: Export to CSV
For analysis in spreadsheets:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "CSV",
        "labels_path": {"absolute_path": "/path/to/data.csv"},
        "csv_fields": ["filepath", "ground_truth.detections.label"]
    }
)
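After a CSV export, a quick look at the header and row count confirms that the requested fields were written. The sketch below assumes the file starts with a header row; it makes no assumption about how list-valued fields are encoded in the cells. `csv_overview` is a hypothetical helper and the demo rows are placeholders.

```python
import csv
import tempfile
from pathlib import Path


def csv_overview(csv_path):
    """Return the header row and the number of data rows in a CSV file."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


# Demonstrate on a throwaway file with placeholder rows.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath", "ground_truth.detections.label"])
        writer.writerow(["/path/to/image1.jpg", "cat"])
        writer.writerow(["/path/to/image2.jpg", "dog"])
    print(csv_overview(path))
```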
Use Case 7: Export FiftyOne Dataset (Full Backup)
For complete dataset backup including all metadata:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "FiftyOne Dataset",
        "export_dir": {"absolute_path": "/path/to/backup"}
    }
)
Output structure:
backup/
├── metadata.json
├── samples.json
├── data/
│   └── ...
├── annotations/
├── brain/
└── evaluations/
Python SDK Alternative
For more control, guide users to use the Python SDK directly:
import fiftyone as fo
import fiftyone.types as fot

# Load dataset
dataset = fo.load_dataset("my-dataset")

# Export to COCO format
dataset.export(
    export_dir="/path/to/export",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export labels only
dataset.export(
    labels_path="/path/to/labels.json",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export a filtered view
view = dataset.match_tags("validated")
view.export(
    export_dir="/path/to/validated",
    dataset_type=fot.YOLOv5Dataset,
    label_field="ground_truth",
)
Python SDK dataset types:
- fot.COCODetectionDataset - COCO format
- fot.YOLOv4Dataset - YOLOv4 format
- fot.YOLOv5Dataset - YOLOv5 format
- fot.VOCDetectionDataset - Pascal VOC format
- fot.KITTIDetectionDataset - KITTI format
- fot.CVATImageDataset - CVAT image format
- fot.CVATVideoDataset - CVAT video format
- fot.TFObjectDetectionDataset - TensorFlow Object Detection format
- fot.ImageClassificationDirectoryTree - Classification folder structure
- fot.VideoClassificationDirectoryTree - Video classification folders
- fot.TFImageClassificationDataset - TensorFlow classification format
- fot.ImageSegmentationDirectory - Segmentation masks
- fot.CSVDataset - CSV format
- fot.GeoJSONDataset - GeoJSON format
- fot.FiftyOneDataset - Native FiftyOne format
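When moving a user from the operator workflow to the SDK, the operator's `dataset_type` string has to be translated into one of the classes listed above. The sketch below pairs the strings from the format tables with the class names from this list; `SDK_TYPE_NAMES` and `sdk_type_name` are hypothetical, and the pairing is inferred from this guide rather than taken from a FiftyOne API.

```python
# Operator dataset_type string -> fiftyone.types class name, as paired in this guide.
SDK_TYPE_NAMES = {
    "COCO": "COCODetectionDataset",
    "YOLOv4": "YOLOv4Dataset",
    "YOLOv5": "YOLOv5Dataset",
    "VOC": "VOCDetectionDataset",
    "KITTI": "KITTIDetectionDataset",
    "CVAT Image": "CVATImageDataset",
    "CVAT Video": "CVATVideoDataset",
    "TF Object Detection": "TFObjectDetectionDataset",
    "Image Classification Directory Tree": "ImageClassificationDirectoryTree",
    "Video Classification Directory Tree": "VideoClassificationDirectoryTree",
    "TF Image Classification": "TFImageClassificationDataset",
    "Image Segmentation": "ImageSegmentationDirectory",
    "CSV": "CSVDataset",
    "GeoJSON": "GeoJSONDataset",
    "FiftyOne Dataset": "FiftyOneDataset",
}


def sdk_type_name(dataset_type):
    """Return the fiftyone.types class name for an operator dataset_type string."""
    try:
        return SDK_TYPE_NAMES[dataset_type]
    except KeyError:
        known = ", ".join(sorted(SDK_TYPE_NAMES))
        raise ValueError(f"Unknown dataset_type {dataset_type!r}; known: {known}") from None


print(sdk_type_name("COCO"))
```

With FiftyOne installed, the class itself can then be looked up by name, for example with `getattr(fot, sdk_type_name("COCO"))`.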
Exporting to Hugging Face Hub
For complete HF Hub export documentation, see HF-HUB-EXPORT.md.
Quick reference:
| Method | Use Case |
|---|---|
| push_to_hub() | Personal accounts, simple upload |
| Manual upload | Organizations, private org repos |
Quick start:
from fiftyone.utils.huggingface import push_to_hub
# Personal account
push_to_hub(dataset, repo_name="my-dataset", private=False)
# With options
push_to_hub(
    dataset,
    repo_name="my-dataset",
    description="My dataset description",
    license="apache-2.0",
    private=True,
)
IMPORTANT: Always generate a dataset card and get user approval for it before uploading. See HF-HUB-EXPORT.md for complete documentation, including authentication setup, dataset card workflow, parameters reference, use cases, and troubleshooting.
Troubleshooting
Error: "Export directory already exists"
- Add "overwrite": true to params
- Or specify a different export directory
Error: "Label field not found"
- Use dataset_summary() to see available label fields
- Verify the field name spelling
Error: "Unsupported label type for format"
- Check that the export format supports your label type
- COCO: detections, segmentations, keypoints
- YOLO: detections only
- Classification formats: classification labels only
Error: "Permission denied"
- Verify write permissions for the export directory
- Check that the parent directory exists
Export is slow
- Large datasets take time; consider exporting a view first
- Export to local disk rather than network drives
- For labels only, use the LABELS_ONLY export type
Best Practices
- Understand your data first - Use dataset_summary() to know what fields and label types exist
- Match format to purpose - Use COCO/YOLO for training, CSV for analysis, FiftyOne Dataset for backups
- Confirm with user - Present export settings before executing
- Export filtered views - Only export what's needed rather than entire datasets
- Verify after export - Check exported file counts match expectations
- Use labels_path for LABELS_ONLY - When exporting labels only, use labels_path, not export_dir