Managing Context Size
This skill teaches you how to avoid context exhaustion when working with large Terra datasets, scattered workflows, and verbose Cromwell metadata.
Context Size Risks
These operations can exhaust your context window:
| Operation | Risk Level | Potential Size |
|---|---|---|
| get_job_metadata (full) | CRITICAL | 100K+ tokens per workflow |
| get_workflow_logs with many shards | HIGH | 25K+ tokens per task log |
| get_entities for large tables | HIGH | 10K+ tokens for 100 entities |
| get_submission_status with include_inputs=True | MEDIUM | 30K+ tokens for 9 workflows |
Safe Patterns
1. Always Use Summary Mode for Metadata
code
# SAFE: Returns ~1-2K tokens
get_job_metadata(workspace_namespace, workspace_name, submission_id, workflow_id)
# DANGEROUS: Can return 100K+ tokens
# (There is no "full" mode - the tool protects you)
Summary mode is the default. It returns structured, actionable information without the verbose raw metadata.
2. Get Log URLs Before Fetching Content
For scattered workflows (50+ shards), fetching all logs can quickly exhaust context: at roughly 8,000 tokens per truncated task log, 50 shards come to about 400,000 tokens.
code
# Step 1: Get URLs only (fast, small response)
get_workflow_logs(
    workspace_namespace, workspace_name, submission_id, workflow_id,
    fetch_content=False  # Default
)
# Step 2: Identify which specific tasks failed
get_job_metadata(..., mode="summary")
# Step 3: Fetch content only for failed tasks you need
# (Consider fetching logs one at a time for scattered workflows)
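The three steps above can be sketched as a small filter. This is a minimal illustration only: it assumes the summary yields a list of failed task names and the URL-only log response maps task names to log URLs. Both shapes, the sample task names and URLs, and the helper name `select_failed_log_urls` are hypothetical, not the tools' documented output.

```python
def select_failed_log_urls(failed_tasks, log_urls, limit=1):
    """Return up to `limit` (task, url) pairs, for failed tasks only.

    Assumed shapes (hypothetical): failed_tasks is a list of task names,
    log_urls maps task name -> log URL.
    """
    selected = []
    for task in failed_tasks:
        if len(selected) >= limit:
            break
        url = log_urls.get(task)
        if url is not None:
            selected.append((task, url))
    return selected


# Hypothetical data standing in for real tool responses.
failed = ["align.shard-7", "align.shard-12"]
urls = {
    "align.shard-1": "gs://example-bucket/align/shard-1/stderr",
    "align.shard-7": "gs://example-bucket/align/shard-7/stderr",
    "align.shard-12": "gs://example-bucket/align/shard-12/stderr",
}

# Fetch one failed log at a time, as suggested for scattered workflows.
first_batch = select_failed_log_urls(failed, urls, limit=1)
```

Keeping `limit` small means each fetched log can be read before deciding whether another one is needed.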
3. Check Table Sizes Before Fetching Entities
code
# Step 1: Check row counts
get_workspace_data_tables(workspace_namespace, workspace_name)
# Response: {"tables": [{"name": "sample", "count": 5000}, ...]}
# Step 2: If count > 100, reconsider whether you need all entities
# Often, specific entity names from workflow inputs are sufficient
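The size check above might look like the following sketch. It assumes the response has the `{"tables": [{"name": ..., "count": ...}]}` shape shown in the comment; the helper name `tables_needing_care` and the `participant` table are hypothetical.

```python
ENTITY_THRESHOLD = 100  # from the guidance above: reconsider above 100 rows


def tables_needing_care(response, threshold=ENTITY_THRESHOLD):
    """Return names of tables whose row count exceeds the threshold."""
    return [
        table["name"]
        for table in response.get("tables", [])
        if table.get("count", 0) > threshold
    ]


# Example response; the "participant" table is made up for illustration.
response = {
    "tables": [
        {"name": "sample", "count": 5000},
        {"name": "participant", "count": 40},
    ]
}
large = tables_needing_care(response)
```

Any table named in `large` is a candidate for looking up specific entity names from workflow inputs instead of fetching the whole table.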
4. Limit Workflow Count in Submission Status
code
# Default: Returns first 10 workflows (manageable)
get_submission_status(workspace_namespace, workspace_name, submission_id)
# For large submissions, the default limit protects you
# Only use max_workflows=0 if you truly need all workflow details
5. Exclude Input Resolutions (Default Behavior)
code
# Default: Excludes inputResolutions (94% size reduction)
get_submission_status(workspace_namespace, workspace_name, submission_id)
# Only include inputs if you need to debug input values
get_submission_status(..., include_inputs=True)  # Much larger response
Progressive Disclosure Strategy
When investigating workflows, follow this pattern:
- Start broad, go narrow:
  - Begin with summaries and overviews
  - Drill into specific items only when needed
- Identify targets first:
  - Get workflow IDs before fetching details
  - Get failed task names before fetching logs
  - Get entity names from workflow inputs, not full table dumps
- Fetch content last:
  - URLs and summaries first
  - Actual content only for items you're debugging
Context Budget Estimation
Rough token estimates for planning:
| Response Type | Approximate Tokens |
|---|---|
| Submission status (10 workflows, no inputs) | 1,500 |
| Submission status (10 workflows, with inputs) | 20,000 |
| Job metadata summary | 1,500 |
| Workflow logs (URLs only) | 500 per task |
| Workflow logs (with content, truncated) | 8,000 per task |
| Entity table (100 entities) | 10,000 |
| Batch job status | 2,500 |
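As a rough planning aid, the estimates in the table can be summed before making any calls. The sketch below copies the table's numbers; the dictionary keys, the helper name `estimate_plan`, and the example plan are hypothetical, and the totals are only as accurate as the estimates themselves.

```python
# Approximate tokens per response, copied from the table above.
TOKEN_ESTIMATES = {
    "submission_status": 1_500,              # 10 workflows, no inputs
    "submission_status_with_inputs": 20_000,  # 10 workflows, with inputs
    "job_metadata_summary": 1_500,
    "log_urls_per_task": 500,
    "log_content_per_task": 8_000,           # truncated content
    "entities_per_100": 10_000,
    "batch_job_status": 2_500,
}


def estimate_plan(plan):
    """Sum estimated tokens for a list of (response_kind, count) pairs."""
    return sum(TOKEN_ESTIMATES[kind] * count for kind, count in plan)


# Hypothetical investigation: one status call, three summaries,
# log URLs for five tasks, and full log content for two of them.
plan = [
    ("submission_status", 1),
    ("job_metadata_summary", 3),
    ("log_urls_per_task", 5),
    ("log_content_per_task", 2),
]
total = estimate_plan(plan)  # 1,500 + 4,500 + 2,500 + 16,000 = 24,500
```

In this example, the two log fetches account for about two thirds of the total, which is why content is fetched last.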
When Large Responses Are Needed
Sometimes you genuinely need large data. In these cases:
- Process incrementally: Request data in batches
- Extract specific fields: Use get_job_metadata(..., mode="extract", field_path="...")
- Work offline: For very large datasets, consider extracting to files and analyzing externally
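Incremental processing can be as simple as splitting a list of names into fixed-size groups and handling one group at a time. The following is a generic sketch; the helper name `batched`, the batch size of 100, and the sample entity names are illustrative assumptions, not part of any Terra tool.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical entity names; in practice these would come from workflow inputs.
entity_names = [f"sample_{i}" for i in range(250)]
batches = list(batched(entity_names, 100))
```

Each batch can then be requested and summarized separately, so only the summary needs to stay in context.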
Anti-Patterns to Avoid
| Bad Pattern | Good Alternative |
|---|---|
| Fetch all entities for a 5000-row table | Get specific entity names from workflow inputs |
| Fetch all workflow logs in scattered workflow | Get URLs, identify failures, fetch specific logs |
| Include inputs for all workflows | Use default (exclude inputs) unless debugging inputs |
| Request max_workflows=0 for 500-workflow submission | Use default limit, request specific workflow details as needed |
Quick Reference: Safe Defaults
All these tools have context-safe defaults:
- get_job_metadata: Summary mode by default
- get_workflow_logs: URLs only by default (fetch_content=False)
- get_submission_status: 10 workflows max, no inputs by default
- Log truncation: First 5K + last 20K chars when fetching content
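To make the log truncation rule concrete, here is a minimal sketch of a head-plus-tail truncation using the 5K and 20K character limits stated above. The tool's actual implementation is not shown in this document, so the function name `truncate_log` and the omission marker format are assumptions.

```python
HEAD_CHARS = 5_000   # first 5K characters are kept
TAIL_CHARS = 20_000  # last 20K characters are kept


def truncate_log(text, head=HEAD_CHARS, tail=TAIL_CHARS):
    """Keep the start and end of a log, dropping the middle if it is too long."""
    if len(text) <= head + tail:
        return text
    omitted = len(text) - head - tail
    # Marker format is an assumption made for illustration.
    marker = f"\n... [{omitted} chars omitted] ...\n"
    return text[:head] + marker + text[len(text) - tail:]


# Synthetic 35,000-character log: 10,000 characters in the middle are dropped.
log = "a" * 5_000 + "b" * 10_000 + "c" * 20_000
truncated = truncate_log(log)
```

The larger tail reflects that the end of a log usually holds the error that stopped the task, while the head keeps the startup context.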