Pluck

INPUT CONTRACT

•target: Collection (variable or ID)
•field: String path (supports dot notation like metadata.uri)
•out: Variable name

REQUIREMENTS:

•Collection MUST contain Notes (not Collections)
•Each Note MUST be dict/JSON object
•Field MUST exist as key in a Note for that Note to yield a value (Notes missing it are skipped, not errors)

NOT SUPPORTED:

•❌ Note containing array (use split first)
•❌ Collection of arrays (must be dict Notes)

OUTPUT

Returns a Collection of Notes, each containing the extracted scalar value. Notes missing the field are excluded.
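The output rule above can be sketched in Python. This is an illustrative model only, not the actual implementation: a Collection is assumed to be a plain list, a Note a dict, and the helper names resolve and pluck are hypothetical.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def resolve(note: Any, field: str) -> Any:
    """Walk a dot path such as 'metadata.uri' through nested dicts."""
    current = note
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def pluck(collection: list, field: str) -> list:
    """Return the values found at `field`, skipping Notes that lack it."""
    values = (resolve(note, field) for note in collection)
    return [v for v in values if v is not _MISSING]


# Assumed sample data: two dict Notes, only one of which has metadata.uri.
papers = [
    {"text": "A", "metadata": {"title": "First", "uri": "https://example.com/a.pdf"}},
    {"text": "B", "metadata": {"title": "Second"}},
]
titles = pluck(papers, "metadata.title")
uris = pluck(papers, "metadata.uri")
```

In this model the second Note is dropped from uris rather than producing a null entry, so the output can be shorter than the input.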

FAILURE SEMANTICS

Empty Collection = expected when:

•Field missing in all Notes
•Type contract violated

Empty ≠ error — indicates no matches, not failure.

Actual failures: Invalid target type or missing parameters.
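A minimal sketch of the empty-versus-failure distinction, under stated assumptions: variables live in a dict keyed by name, a Collection is a list, and run_pluck is a hypothetical helper (the real runtime may resolve IDs and report errors differently).

```python
def run_pluck(step: dict, variables: dict) -> list:
    """Model of one pluck step: raise on real failures, else return a list."""
    # Actual failure: a required parameter is absent.
    missing = [name for name in ("target", "field", "out") if name not in step]
    if missing:
        raise ValueError(f"pluck: missing parameters {missing}")

    # Actual failure: the target is not a Collection (modelled as a list).
    target = variables.get(step["target"])
    if not isinstance(target, list):
        raise TypeError("pluck: target is not a Collection")

    keys = step["field"].split(".")
    result = []
    for note in target:
        value = note
        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                break  # field missing or Note is not a dict: skip silently
        else:
            result.append(value)  # reached only when every key resolved

    variables[step["out"]] = result
    return result


variables = {"$papers": [{"text": "A"}], "$note": {"text": "A"}}
step = {"type": "pluck", "target": "$papers", "field": "metadata.uri", "out": "$urls"}
urls = run_pluck(step, variables)  # no Note has metadata.uri: empty, no error
```

A caller following this model checks for an empty result as a normal branch and reserves exception handling for malformed steps.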

REPRESENTATION INVARIANTS

•Note containing JSON array ≠ Collection
•Use split to convert array → Collection
•flatten performs inverse (Collection → Note)
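The invariants above, modelled in Python. The functions below are stand-ins named after the split and flatten operations; how the real operations serialize content is not specified here, so the JSON-array encoding is an assumption.

```python
import json


def split(note: str) -> list:
    """Array Note -> Collection: one Note per array element."""
    items = json.loads(note)
    if not isinstance(items, list):
        raise TypeError("split expects a Note holding a JSON array")
    return items


def flatten(collection: list) -> str:
    """Collection -> single Note holding a JSON array (assumed encoding)."""
    return json.dumps(collection)


array_note = '[{"text": "A"}, {"text": "B"}]'  # one Note, not a Collection
collection = split(array_note)                 # now two dict Notes
round_trip = flatten(collection)               # back to a single Note
```

Only collection is a valid pluck target in this model; array_note and round_trip are single Notes.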

CONTENT STRUCTURE

For JSON Notes, content is a dict with fields:

•Top-level fields: text, format, char_count
•Nested fields: metadata.* (e.g., metadata.uri, metadata.title, metadata.year)

Example Note content structure (from semantic-scholar/search-web):

json

{
  "text": "Full text content...",
  "format": "paper",
  "metadata": {
    "title": "Paper Title",
    "uri": "https://example.com/paper.pdf"
  },
  "char_count": 5000
}
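To see which field values a Note of this shape accepts, the dot paths can be enumerated. A small sketch, assuming the content is available as a Python dict; list_paths is a hypothetical helper, not part of the tool.

```python
def list_paths(content: dict, prefix: str = "") -> list:
    """Return every dot path that leads to a non-dict value."""
    paths = []
    for key, value in content.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into nested objects such as metadata.
            paths.extend(list_paths(value, prefix=f"{path}."))
        else:
            paths.append(path)
    return paths


note = {
    "text": "Full text content...",
    "format": "paper",
    "metadata": {
        "title": "Paper Title",
        "uri": "https://example.com/paper.pdf",
    },
    "char_count": 5000,
}
paths = list_paths(note)
# paths == ["text", "format", "metadata.title", "metadata.uri", "char_count"]
```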

FIELD ACCESS EXAMPLES

Extract nested field:

json

{"type":"pluck","target":"$papers","field":"metadata.title","out":"$titles"}

Extract top-level field:

json

{"type":"pluck","target":"$results","field":"text","out":"$texts"}

Extract URI for fetching:

json

{"type":"pluck","target":"$search_results","field":"metadata.uri","out":"$urls"}

ANTI-PATTERNS

•❌ pluck(target=$array_note) → Use split first
•❌ pluck(target=$coll_of_arrays) → Elements must be dicts
•❌ Treating empty result as error → Empty = no matches