rapidata-use

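For tasks that use the Rapidata Python SDK: authenticate with RapidataClient, submit labeling orders from datasets (compare/classification/ranking/free-text/locate/draw/select-words/timestamp), apply filters/settings/selections, create and manage validation sets, monitor or delete orders, retrieve results, and run MRI benchmarking (benchmarks/leaderboards/evaluations/standings). Triggers when the user wants to send data to Rapidata, control annotator targeting, improve quality with validation or early stopping, or manage Rapidata orders/results.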

SKILL.md
--- frontmatter
name: rapidata-use
description: "Use for Rapidata Python SDK tasks: authenticate with RapidataClient, submit labeling orders from datasets (compare/classification/ranking/free-text/locate/draw/select-words/timestamp), apply filters/settings/selections, create and manage validation sets, monitor or delete orders, retrieve results, and run MRI benchmarking (benchmarks/leaderboards/evaluations/standings). Trigger when the user wants to send data to Rapidata, control annotator targeting, improve quality with validation or early stopping, or manage Rapidata orders/results."

Rapidata Use

Overview

Translate user requests into Rapidata SDK workflows: authenticate, submit orders, apply filters/settings/selections, create validation sets, monitor progress, and retrieve results. Use MRI when the user wants model ranking/benchmarking.

Quick start (auth)

  • Initialize with rapi = RapidataClient().
  • If the machine is not authenticated, ask the user to complete the browser login prompt; credentials are saved locally.
  • If the user has client ID/secret, pass them to RapidataClient(client_id=..., client_secret=...).
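The two authentication paths above can be sketched as one helper. This is a minimal sketch: `make_client` is a hypothetical name, the `from rapidata import RapidataClient` import path is an assumption, and the `client_cls` parameter exists only so the branching can be exercised without the SDK or network access.

```python
def make_client(client_id=None, client_secret=None, client_cls=None):
    """Return a client, using explicit credentials only when both are given."""
    if (client_id is None) != (client_secret is None):
        raise ValueError("Provide both client_id and client_secret, or neither.")
    if client_cls is None:
        # Assumed import path for the SDK; imported lazily so the helper
        # can be tested with a stand-in class.
        from rapidata import RapidataClient as client_cls
    if client_id is not None:
        return client_cls(client_id=client_id, client_secret=client_secret)
    # No credentials passed: rely on the browser login or saved credentials.
    return client_cls()
```

Calling `rapi = make_client()` mirrors the plain `RapidataClient()` path; passing both credentials mirrors the keyword form shown above.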

Task routing (open references only as needed)

  • API surface + method signatures: references/api.md
  • Order/validation workflows + prompt design + early stopping: references/workflows.md
  • Results schema + pandas export: references/results.md

Default assumptions (confirm before submit)

Ask the user to confirm each default when they did not specify it:

  • responses_per_datapoint=10 (or responses_per_comparison=1 for ranking)
  • data_type="media"
  • filters=[UserScoreFilter(0.55, 0.95)]
  • settings=[]
  • selections=[]
  • validation_set_id=None
  • confidence_threshold=None
  • contexts=None, media_contexts=None, private_notes=None
  • Compare: a_b_names=None; Ranking: random_comparisons_ratio=0.5

If the user rejects the default user-score filter, remove it entirely or apply their requested bounds. The values above are preferences rather than hard requirements; confirm each one with the user before submission.
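One way to keep track of which defaults still need confirmation is to compare the user's arguments against the list above. The sketch below is illustrative only: `DEFAULTS`, `split_defaults`, and the `user_score_bounds` key are hypothetical names, and `user_score_bounds` is a plain-tuple stand-in for `filters=[UserScoreFilter(0.55, 0.95)]` so the example runs without the SDK.

```python
import copy

# Defaults copied from the list above. "user_score_bounds" is a hypothetical
# stand-in for filters=[UserScoreFilter(0.55, 0.95)].
DEFAULTS = {
    "responses_per_datapoint": 10,
    "data_type": "media",
    "user_score_bounds": (0.55, 0.95),
    "settings": [],
    "selections": [],
    "validation_set_id": None,
    "confidence_threshold": None,
    "contexts": None,
    "media_contexts": None,
    "private_notes": None,
}

def split_defaults(user_args):
    """Return (merged, to_confirm).

    merged     -- defaults overridden by whatever the user specified
    to_confirm -- defaults the user did not specify and should be asked about
    """
    defaults = copy.deepcopy(DEFAULTS)  # avoid sharing the mutable lists
    to_confirm = {k: v for k, v in defaults.items() if k not in user_args}
    merged = {**defaults, **user_args}
    return merged, to_confirm
```

Everything in `to_confirm` becomes a question for the user; nothing is submitted until that dictionary has been reviewed.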

Core workflows

1) Submit a labeling order from a dataset

  • Ask for: order type, instruction/question, datapoints, optional contexts/media_contexts/private_notes, responses_per_datapoint, and any filters/settings/selections.
  • Map the dataset into the expected Rapidata inputs (lists of strings or pairs) and verify all list lengths match.
  • Create the order with rapi.order.create_*_order(...), then call order.preview() (if requested) and order.run().
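The steps above can be sketched for a classification order. This is a sketch under stated assumptions: `check_lengths` and `submit_classification` are hypothetical helpers, and the keyword names `name`, `instruction`, and `answer_options` are assumed rather than taken from this document; check references/api.md for the actual signature.

```python
def check_lengths(datapoints, **optional_lists):
    """Raise ValueError if any provided per-datapoint list differs in length."""
    expected = len(datapoints)
    for list_name, values in optional_lists.items():
        if values is not None and len(values) != expected:
            raise ValueError(
                f"{list_name} has {len(values)} items, expected {expected}"
            )
    return expected

def submit_classification(rapi, name, instruction, answer_options, datapoints,
                          contexts=None, responses_per_datapoint=10,
                          preview=False):
    """Validate inputs, create the order, optionally preview, then run it."""
    check_lengths(datapoints, contexts=contexts)
    # Keyword names below are assumptions about the SDK signature.
    order = rapi.order.create_classification_order(
        name=name,
        instruction=instruction,
        answer_options=answer_options,
        datapoints=datapoints,
        contexts=contexts,
        responses_per_datapoint=responses_per_datapoint,
    )
    if preview:
        order.preview()
    order.run()
    return order
```

Running the length check before the SDK call keeps a mismatched dataset from reaching the submission step.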

2) Pairwise tasks (true compare)

  • Use create_compare_order with datapoints=[ [left,right], ... ] (two unique items per datapoint).
  • If the user provides pre-concatenated left/right images, prefer a true compare order by using the original two files; only use classification if the data is irreversibly concatenated.
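Building the `datapoints=[ [left,right], ... ]` structure from two parallel lists might look like the following. `to_compare_datapoints` is a hypothetical helper and the file names are placeholders.

```python
def to_compare_datapoints(left_items, right_items):
    """Zip two parallel lists into [[left, right], ...] and reject bad pairs."""
    if len(left_items) != len(right_items):
        raise ValueError("left and right lists must have the same length")
    pairs = []
    for index, (left, right) in enumerate(zip(left_items, right_items)):
        if left == right:
            # A compare datapoint needs two unique items.
            raise ValueError(f"pair {index} repeats the same item: {left!r}")
        pairs.append([left, right])
    return pairs

# Placeholder file names, for illustration only.
datapoints = to_compare_datapoints(
    ["model_a/001.png", "model_a/002.png"],
    ["model_b/001.png", "model_b/002.png"],
)
```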

3) Quality via validation sets

  • Create validation sets with rapi.validation.create_*_set(...) and attach via validation_set_id or selections.
  • If selections are provided, validation_set_id is ignored; pick one approach.
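Because `validation_set_id` is ignored whenever selections are present, it can help to make the choice explicit before the order is created. A minimal sketch with a hypothetical helper, `validation_kwargs`, that refuses to pass both:

```python
def validation_kwargs(validation_set_id=None, selections=None):
    """Return the keyword arguments for exactly one validation approach."""
    if validation_set_id is not None and selections:
        raise ValueError(
            "Pass validation_set_id or selections, not both: "
            "validation_set_id is ignored when selections are provided."
        )
    if selections:
        return {"selections": selections}
    if validation_set_id is not None:
        return {"validation_set_id": validation_set_id}
    return {}
```

The returned dictionary can be unpacked into the `create_*_order(...)` call with `**`, so the order only ever receives one of the two arguments.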

4) Early stopping

  • For classification/compare, set confidence_threshold to stop early once confidence is reached.
  • Keep responses_per_datapoint as the maximum cap.
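The relationship between the two parameters can be sketched as a small argument builder. `early_stopping_kwargs` is a hypothetical helper, and treating `confidence_threshold` as a probability strictly between 0 and 1 is an assumption to verify against references/workflows.md.

```python
EARLY_STOPPING_TYPES = {"classification", "compare"}

def early_stopping_kwargs(order_type, confidence_threshold, max_responses):
    """Build order arguments for early stopping with a hard response cap."""
    if order_type not in EARLY_STOPPING_TYPES:
        raise ValueError(f"early stopping is not described for {order_type!r}")
    # Assumption: the threshold is a probability in the open interval (0, 1).
    if not 0.0 < confidence_threshold < 1.0:
        raise ValueError("confidence_threshold must be between 0 and 1")
    if max_responses < 1:
        raise ValueError("max_responses must be at least 1")
    return {
        "confidence_threshold": confidence_threshold,
        # Still acts as the maximum number of responses per datapoint.
        "responses_per_datapoint": max_responses,
    }
```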

5) Monitor or manage orders

  • Use order.display_progress_bar(), order.get_results(preliminary_results=...), order.pause(), order.unpause(), order.delete().
  • Use rapi.order.find_orders(...) or rapi.order.get_order_by_id(...) for retrieval.
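The retrieval and delete calls above can be combined into a cautious cleanup step. This sketch makes two assumptions that are not confirmed here: that `find_orders` accepts a name as its first argument, and that each returned order exposes a `name` attribute. `delete_orders_named` is a hypothetical helper and defaults to a dry run.

```python
def delete_orders_named(rapi, name, dry_run=True):
    """Find orders whose name matches exactly; delete them unless dry_run."""
    # Assumption: find_orders(name) may return loose matches, so filter again
    # on an assumed `name` attribute before touching anything.
    matches = [
        order for order in rapi.order.find_orders(name)
        if getattr(order, "name", None) == name
    ]
    if not dry_run:
        for order in matches:
            order.delete()
    return matches
```

Reviewing the returned list from a dry run before calling again with `dry_run=False` avoids deleting orders that merely share a prefix.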

6) MRI benchmarks

  • Use rapi.mri.create_new_benchmark(...), benchmark.create_leaderboard(...), and benchmark.evaluate_model(...).
  • Retrieve standings from leaderboard.get_standings() or benchmark.get_overall_standings().

Notes and guardrails

  • Prefer writing a script to disk and running it for multi-step actions (avoid inline Python).
  • Ask the user for any clarifications before submitting orders when requirements are ambiguous.
  • Compare datapoints must be pairs of two unique items; ranking requires lists of >=2 unique items per group.
  • API/platform constraints (must enforce): Rapidata rejects contexts longer than 400 characters (truncate to <= 380), and only SFW content should be submitted.
  • Compare orders only allow a_b_names for the A/B labels; the optional "I can't tell" choice comes from AllowNeitherBoth() and cannot be renamed. If you need exact label text like "I can't tell," use a classification order with stitched images instead.
  • Preferences (confirm with user): batch compare/classification orders to <= 100 datapoints each, and add resume flags (--start, --total-batches) so a rerun does not re-upload batches that already went through. Reruns create duplicate orders, so prefer an explicit cleanup step (find-by-name + delete) before resubmitting. Credits are consumed on submission, so start with a tiny test batch (~10 items), confirm the format, then scale up.
  • The timestamp order type is marked as not fully supported; warn the user before creating one.
  • SDK delete exists for orders and validation sets; for any finer-grained deletion (individual tasks/responses), ask for the API endpoint or confirm using the OpenAPI client.
  • The quickstart documentation notes a 100-datapoint limit per order; ask whether the user needs batching.
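Two of the guardrails above, context truncation and batching with resume flags, can be sketched in plain Python. The helper names are hypothetical, and the meaning given to the flags (`start` as the index of the first batch to submit, `total_batches` as the maximum number of batches to submit in this run) is one possible interpretation, not something the notes define.

```python
MAX_CONTEXT_CHARS = 380   # stays under the 400-character rejection limit
BATCH_SIZE = 100          # per-order datapoint limit noted above

def truncate_context(text, limit=MAX_CONTEXT_CHARS):
    """Cut a context string down to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]

def batches(items, size=BATCH_SIZE, start=0, total_batches=None):
    """Yield (batch_index, chunk) pairs.

    start         -- index of the first batch to emit (earlier ones are skipped)
    total_batches -- maximum number of batches to emit in this run
    """
    emitted = 0
    for index, offset in enumerate(range(0, len(items), size)):
        if index < start:
            continue
        if total_batches is not None and emitted >= total_batches:
            break
        yield index, items[offset:offset + size]
        emitted += 1
```

A first run with `size=10, total_batches=1` gives the tiny test batch; a later run with `start=1` resumes without resubmitting what was already sent.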

Resources

scripts/

  • scripts/rapidata_create_pairwise_pref_order.py
  • scripts/rapidata_delete_orders.py

references/

  • references/api.md
  • references/workflows.md
  • references/results.md