
Extract Workflow Data

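Efficiently extract specific data from Cromwell workflow metadata. Use this when you only need task outputs, runtime attributes, or specific fields and want to avoid loading the full 100K+ token metadata.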

SKILL.md
--- frontmatter
description: Extract specific data from Cromwell workflow metadata efficiently. Use when you need task outputs, runtime attributes, or specific fields without loading full 100K+ token metadata.

Extracting Workflow Data Efficiently

This skill teaches you how to extract specific data from Cromwell workflow metadata without loading the entire 100K+ token metadata blob into your context.

Recommended Usage Pattern

Step 1: Start with Summary Mode (Default)

code
get_job_metadata(workspace_namespace, workspace_name, submission_id, workflow_id)

Summary mode returns a structured, context-efficient summary (~1-2K tokens) including:

  • Workflow status and timing
  • Task counts by status
  • Failed task details with errors
  • Execution summary

This gives you an overview of the workflow without exhausting your context window.

Step 2: Extract Specific Data with Extract Mode

Use semantic parameters to get exactly what you need:

Get a Specific Task Output

code
get_job_metadata(
    workspace_namespace, workspace_name, submission_id, workflow_id,
    mode="extract",
    task_name="illumina_demux",
    output_name="commonBarcodes"
)

Get Output from a Specific Shard

For scattered tasks with multiple shards:

code
get_job_metadata(
    workspace_namespace, workspace_name, submission_id, workflow_id,
    mode="extract",
    task_name="align_reads",
    shard_index=5,
    output_name="aligned_bam"
)

Get Runtime Attributes for All Tasks

Use wildcards with dot-path notation:

code
get_job_metadata(
    workspace_namespace, workspace_name, submission_id, workflow_id,
    mode="extract",
    field_path="calls.*.runtimeAttributes"
)

This returns a dictionary mapping task names to their runtime attributes.
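
As a minimal sketch of how such a result might be consumed, assume the extract call returns a plain dictionary keyed by task name. The sample task names, the `memory` and `preemptible` keys, and the helper `group_by_attribute` are all hypothetical and only illustrate the shape described above:

```python
from collections import defaultdict

# Hypothetical result of field_path="calls.*.runtimeAttributes".
# Task names and attribute values are made up for illustration.
runtime_attrs = {
    "workflow.task1": {"memory": "4 GB", "preemptible": "1"},
    "workflow.task2": {"memory": "16 GB", "preemptible": "0"},
    "workflow.task3": {"memory": "4 GB", "preemptible": "1"},
}


def group_by_attribute(attrs_by_task, attribute):
    """Group task names by the value of one runtime attribute."""
    groups = defaultdict(list)
    for task, attrs in attrs_by_task.items():
        # Tasks that lack the attribute are grouped under None.
        groups[attrs.get(attribute)].append(task)
    return dict(groups)


by_memory = group_by_attribute(runtime_attrs, "memory")
print(by_memory)
```

Grouping by value makes outliers easy to spot, for example a single task that requests far more memory than the rest.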

Dot-Path Syntax Reference

The field_path parameter supports flexible extraction using dot notation:

| Syntax | Meaning | Example |
| --- | --- | --- |
| key.subkey | Nested access | calls.task1.outputs |
| key[N] | Array indexing | calls.task1[0].outputs |
| key.* | Wildcard (all keys) | calls.*.executionStatus |

Examples

code
# Get execution status of all tasks
field_path="calls.*.executionStatus"

# Get first shard's outputs for a specific task
field_path="calls.my_task[0].outputs"

# Get all runtime attributes
field_path="calls.*.runtimeAttributes"

# Get preemptible setting across all tasks
field_path="calls.*.runtimeAttributes.preemptible"

# Get backend status of each task's first shard
field_path="calls.*[0].backendStatus"
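
To make the syntax concrete, here is a rough sketch of how a dot-path resolver of this kind could work. It is not the tool's actual implementation: `parse_path`, `resolve`, `extract`, and the sample `metadata` dictionary are hypothetical, and the sketch assumes keys contain no dots or brackets.

```python
import re

# One token is either a bare key (including "*") or a bracketed index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Split 'calls.*[0].x' into ['calls', '*', 0, 'x']."""
    tokens = []
    for key, index in _TOKEN.findall(path):
        tokens.append(int(index) if index else key)
    return tokens


def resolve(data, tokens):
    """Walk nested dicts and lists, fanning out at each wildcard."""
    if not tokens:
        return data
    head, rest = tokens[0], tokens[1:]
    if head == "*":
        # Wildcard: apply the rest of the path to every key at this level.
        return {key: resolve(value, rest) for key, value in data.items()}
    # A str head is a dict key; an int head is a list index.
    return resolve(data[head], rest)


def extract(data, path):
    return resolve(data, parse_path(path))


# Made-up metadata in the shape the examples above imply.
metadata = {
    "calls": {
        "task1": [{"executionStatus": "Done", "outputs": {"file": "a.txt"}}],
        "task2": [{"executionStatus": "Failed", "outputs": {}}],
    }
}

print(extract(metadata, "calls.*[0].executionStatus"))
print(extract(metadata, "calls.task1[0].outputs"))
```

In this sketch a wildcard always returns a dictionary keyed by the matched names, which mirrors the "dictionary mapping task names" result described for wildcard extraction.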

Multiple Extractions

If you need multiple pieces of data, make multiple extract calls rather than trying to get everything at once. Each call returns only what you need:

python
# Call 1: Get specific output
get_job_metadata(..., mode="extract", task_name="task1", output_name="result")

# Call 2: Get runtime attributes
get_job_metadata(..., mode="extract", field_path="calls.task1[0].runtimeAttributes")

# Call 3: Get another task's output
get_job_metadata(..., mode="extract", task_name="task2", output_name="file")

This is more context-efficient than loading full metadata.

Error Handling

When extraction fails, the tool provides helpful error messages:

Task not found:

code
Task 'xyz' not found. Available tasks: ['workflow.task1', 'workflow.task2', ...]

Output not found:

code
Output 'xyz' not found for task 'task1'. Available outputs: ['file', 'count', 'summary']

Shard not found:

code
Shard 5 not found for task 'task1'. Available shards: [0, 1, 2, 3]
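
Because each message lists the valid alternatives, a caller can recover them and retry with a corrected name. The sketch below assumes the messages follow exactly the formats shown above and that the list is printed in full as a Python-style literal; `parse_available` is a hypothetical helper, not part of the tool.

```python
import ast
import re

# Matches the trailing "Available <kind>: [...]" part of an error message.
_AVAILABLE = re.compile(r"Available (tasks|outputs|shards): (\[.*\])")


def parse_available(message):
    """Return (kind, options) from an error message, or None if absent."""
    match = _AVAILABLE.search(message)
    if match is None:
        return None
    kind, raw_list = match.groups()
    # literal_eval parses the list safely without executing code.
    return kind, ast.literal_eval(raw_list)


msg = "Shard 5 not found for task 'task1'. Available shards: [0, 1, 2, 3]"
kind, options = parse_available(msg)
print(kind, options)
```

A retry would then pick a value from `options`, for example the highest available shard index, instead of guessing again.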

When to Use Each Mode

| Scenario | Mode | Parameters |
| --- | --- | --- |
| Initial exploration | summary | (default) |
| Get specific task output | extract | task_name, output_name |
| Get scattered task shard output | extract | task_name, shard_index, output_name |
| Compare settings across tasks | extract | field_path with wildcard |
| Get deep nested field | extract | field_path with dot notation |

Anti-Patterns

Never do this:

  • Repeatedly call get_job_metadata without specifying what you need
  • Try to extract everything in one call

Do this instead:

  • Start with summary mode to understand the workflow
  • Make targeted extract calls for specific data you need
  • Use wildcards to batch-extract a single field across tasks