Rapidata Use
Overview
Translate user requests into Rapidata SDK workflows: authenticate, submit orders, apply filters/settings/selections, create validation sets, monitor progress, and retrieve results. Use MRI when the user wants model ranking/benchmarking.
Quick start (auth)
- Initialize with `rapi = RapidataClient()`.
- If the machine is not authenticated, ask the user to complete the browser login prompt; credentials are saved locally.
- If the user has a client ID/secret, pass them to `RapidataClient(client_id=..., client_secret=...)`.
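The choice between browser login and explicit credentials could be sketched as a small helper that builds the keyword arguments for `RapidataClient`. The environment variable names `RAPIDATA_CLIENT_ID` and `RAPIDATA_CLIENT_SECRET`, the helper name `client_kwargs`, and the import path in the usage comment are assumptions made for illustration; they are not confirmed by the SDK.

```python
import os


def client_kwargs(env=None):
    """Return keyword arguments for RapidataClient.

    An empty dict means no credentials were supplied, so the client
    falls back to the browser login prompt. The environment variable
    names are hypothetical.
    """
    env = os.environ if env is None else env
    client_id = env.get("RAPIDATA_CLIENT_ID")
    client_secret = env.get("RAPIDATA_CLIENT_SECRET")
    if client_id and client_secret:
        return {"client_id": client_id, "client_secret": client_secret}
    if client_id or client_secret:
        # Half a credential pair is almost certainly a configuration mistake.
        raise ValueError("Provide both client ID and client secret, or neither.")
    return {}


# Usage (requires the SDK; import path assumed):
# from rapidata import RapidataClient
# rapi = RapidataClient(**client_kwargs())
```

Failing loudly on a half-configured pair avoids silently opening a browser prompt on a machine where the user expected non-interactive authentication.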
Task routing (open references only as needed)
- API surface + method signatures: `references/api.md`
- Order/validation workflows + prompt design + early stopping: `references/workflows.md`
- Results schema + pandas export: `references/results.md`
Default assumptions (confirm before submit)
Ask the user to confirm each default when they did not specify it:
- `responses_per_datapoint=10` (or `responses_per_comparison=1` for ranking)
- `data_type="media"`
- `filters=[UserScoreFilter(0.55, 0.95)]`
- `settings=[]`
- `selections=[]`
- `validation_set_id=None`
- `confidence_threshold=None`
- `contexts=None`, `media_contexts=None`, `private_notes=None`
- Compare: `a_b_names=None`; Ranking: `random_comparisons_ratio=0.5`
If the user rejects the default user-score filter, remove it entirely or apply their requested bounds. Treat the defaults above as preferences and always confirm them with the user before submission.
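One way to track which defaults still need confirmation is to merge the user's explicit choices over a defaults table and report every name that fell back to a default. This is a minimal sketch: `resolve_parameters` and the `user_score_bounds` key are hypothetical, the bounds tuple stands in for `UserScoreFilter(0.55, 0.95)` so the code runs without the SDK, and only the defaults common to all order types are included.

```python
import copy

# Defaults copied from the list above (order-type-specific ones omitted).
DEFAULTS = {
    "responses_per_datapoint": 10,
    "data_type": "media",
    "user_score_bounds": (0.55, 0.95),  # stand-in for UserScoreFilter(0.55, 0.95)
    "settings": [],
    "selections": [],
    "validation_set_id": None,
    "confidence_threshold": None,
    "contexts": None,
    "media_contexts": None,
    "private_notes": None,
}


def resolve_parameters(user_specified):
    """Merge user choices over the defaults.

    Returns (params, to_confirm), where to_confirm lists the names that
    fell back to a default and still need the user's confirmation.
    """
    unknown = set(user_specified) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    params = copy.deepcopy(DEFAULTS)  # avoid sharing the mutable list defaults
    params.update(user_specified)
    to_confirm = [name for name in DEFAULTS if name not in user_specified]
    return params, to_confirm


# Example: the user fixed the response count and rejected the score filter.
params, to_confirm = resolve_parameters(
    {"responses_per_datapoint": 5, "user_score_bounds": None}
)
```

Here `to_confirm` holds every remaining default, which can be turned into a single confirmation question before the order is submitted.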
Core workflows
1) Submit a labeling order from a dataset
- Ask for: order type, instruction/question, datapoints, optional contexts/media_contexts/private_notes, responses_per_datapoint, and any filters/settings/selections.
- Map the dataset into the expected Rapidata inputs (lists of strings or pairs) and verify all list lengths match.
- Create the order with `rapi.order.create_*_order(...)`, then call `order.preview()` (if requested) and `order.run()`.
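The list-length check in the mapping step can be done before any SDK call. The helper below is a sketch with a hypothetical name (`check_aligned`); it assumes optional inputs such as contexts are either `None` or one entry per datapoint.

```python
def check_aligned(datapoints, **optional_lists):
    """Raise ValueError if any supplied per-datapoint list differs in length.

    Lists passed as None are treated as not supplied and are skipped.
    Returns the number of datapoints when everything lines up.
    """
    expected = len(datapoints)
    mismatched = {
        name: len(values)
        for name, values in optional_lists.items()
        if values is not None and len(values) != expected
    }
    if mismatched:
        raise ValueError(f"Expected {expected} entries per list, got: {mismatched}")
    return expected


# Example: two datapoints, two contexts, no private notes.
count = check_aligned(
    ["img_1.jpg", "img_2.jpg"],
    contexts=["first prompt", "second prompt"],
    private_notes=None,
)
```

Running the check locally surfaces mismatches before any credits are spent on a malformed order.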
2) Pairwise tasks (true compare)
- Use `create_compare_order` with `datapoints=[[left, right], ...]` (two unique items per datapoint).
- If the user provides pre-concatenated left/right images, prefer a true compare order by using the original two files; only use classification if the data is irreversibly concatenated.
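The "two unique items per datapoint" rule can be enforced locally before building the compare order. A minimal sketch follows; `validate_pairs` is a hypothetical helper, not part of the SDK.

```python
def validate_pairs(datapoints):
    """Return datapoints as a list of [left, right] lists, or raise ValueError."""
    checked = []
    for index, pair in enumerate(datapoints):
        if isinstance(pair, str):
            # A bare string would otherwise be split into characters.
            raise ValueError(f"Datapoint {index} is a string, expected a pair.")
        items = list(pair)
        if len(items) != 2:
            raise ValueError(f"Datapoint {index} has {len(items)} items, expected 2.")
        if items[0] == items[1]:
            raise ValueError(f"Datapoint {index} repeats the same item: {items[0]!r}")
        checked.append(items)
    return checked


# Example: tuples are accepted and normalized to lists.
pairs = validate_pairs([("a_left.png", "a_right.png"), ["b_left.png", "b_right.png"]])
```

The normalized `pairs` list has the `[[left, right], ...]` shape described above.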
3) Quality via validation sets
- Create validation sets with `rapi.validation.create_*_set(...)` and attach via `validation_set_id` or selections.
- If selections are provided, `validation_set_id` is ignored; pick one approach.
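Because `validation_set_id` is ignored whenever selections are present, it can help to reject the ambiguous combination up front instead of letting one setting drop silently. The helper below is a sketch; `pick_validation_approach` is a hypothetical name, and the returned keys simply mirror the parameter names used above.

```python
def pick_validation_approach(validation_set_id=None, selections=None):
    """Return order keyword arguments using exactly one validation approach.

    Raises ValueError when both are supplied, since the validation set ID
    would be ignored in that case.
    """
    if validation_set_id is not None and selections:
        raise ValueError(
            "Both validation_set_id and selections were given; "
            "validation_set_id would be ignored. Pick one approach."
        )
    if selections:
        return {"selections": list(selections)}
    if validation_set_id is not None:
        return {"validation_set_id": validation_set_id}
    return {}
```

The returned dict can be unpacked into the order-creation call alongside the other confirmed parameters.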
4) Early stopping
- For classification/compare, set `confidence_threshold` to stop early once confidence is reached.
- Keep `responses_per_datapoint` as the maximum cap.
5) Monitor or manage orders
- Use `order.display_progress_bar()`, `order.get_results(preliminary_results=...)`, `order.pause()`, `order.unpause()`, `order.delete()`.
- Use `rapi.order.find_orders(...)` or `rapi.order.get_order_by_id(...)` for retrieval.
6) MRI benchmarks
- Use `rapi.mri.create_new_benchmark(...)`, `benchmark.create_leaderboard(...)`, and `benchmark.evaluate_model(...)`.
- Retrieve standings from `leaderboard.get_standings()` or `benchmark.get_overall_standings()`.
Notes and guardrails
- Prefer writing a script to disk and running it for multi-step actions (avoid inline Python).
- Ask the user for any clarifications before submitting orders when requirements are ambiguous.
- Compare datapoints must be pairs of two unique items; ranking requires lists of >=2 unique items per group.
- API/platform constraints (must enforce): Rapidata rejects contexts longer than 400 characters (truncate to <= 380), and only SFW content should be submitted.
- Compare orders only allow `a_b_names` for the A/B labels; the optional "I can't tell" choice comes from `AllowNeitherBoth()` and cannot be renamed. If you need exact label text like "I can't tell," use a classification order with stitched images instead.
- Preferences (confirm with user): batch compare/classification orders to <= 100 datapoints each; add resume flags (`--start`, `--total-batches`) to avoid re-uploading when rerunning. Reruns create duplicate orders, so prefer an explicit cleanup step (find-by-name + delete) before resubmitting. Credits are consumed on submission, so start with a tiny test batch (~10 items), confirm format, then scale up.
- `timestamp` order is marked not fully supported; warn the user.
- SDK delete exists for orders and validation sets; for any finer-grained deletion (individual tasks/responses), ask for the API endpoint or confirm using the OpenAPI client.
- Quickstart doc notes a 100-datapoint limit per order; ask if they need batching.
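The context-length and batching guardrails above can be sketched without touching the SDK. The helper names are hypothetical, and the flag semantics are an assumption: the notes name `--start` and `--total-batches` but do not define them, so this sketch treats `--start` as the index of the first batch to submit and `--total-batches` as the maximum number of batches to submit in one run.

```python
import argparse

MAX_CONTEXT_CHARS = 380  # safety margin below the 400-character limit
BATCH_SIZE = 100         # per-order datapoint limit noted above


def truncate_context(text, limit=MAX_CONTEXT_CHARS):
    """Cut a context to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def make_batches(items, batch_size=BATCH_SIZE):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def select_batches(batches, start=0, total_batches=None):
    """Return (index, batch) pairs to submit on this run.

    Batches before `start` are skipped; at most `total_batches` are
    returned (None means all remaining batches).
    """
    if total_batches is None:
        end = len(batches)
    else:
        end = min(len(batches), start + total_batches)
    return [(i, batches[i]) for i in range(start, end)]


def parse_args(argv=None):
    """Parse the resume flags (semantics assumed, see lead-in)."""
    parser = argparse.ArgumentParser(description="Submit batches with resume support.")
    parser.add_argument("--start", type=int, default=0,
                        help="index of the first batch to submit")
    parser.add_argument("--total-batches", type=int, default=None,
                        help="maximum number of batches to submit in this run")
    return parser.parse_args(argv)
```

A rerun after a failure in the third batch would then pass `--start 2`, so the first two batches are not resubmitted as duplicate orders.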
Resources
scripts/
- `scripts/rapidata_create_pairwise_pref_order.py`
- `scripts/rapidata_delete_orders.py`
references/
- `references/api.md`
- `references/workflows.md`
- `references/results.md`