carbon.data.qa

Purpose

This skill enables Claude to answer factual and analytical questions about carbon accounting data by querying Carbon ACX's internal datasets (CSV files in the data/ directory), derived artifacts, and the local API when it is running. It encodes domain knowledge about:

•Carbon accounting terminology and units (tCO2e, kWh, pkm, etc.)
•Emission factor structures and relationships
•Activity-to-emissions calculations
•Temporal data queries (Q1 2024, monthly totals, etc.)
•Layer, sector, and profile hierarchies

When to Use

Trigger Patterns:

•User asks about emissions data: "What were total CO2 emissions for Q1 2024?"
•Queries about specific activities: "What's the emission factor for streaming video?"
•Comparative questions: "Compare emissions from cloud storage vs local storage"
•Data exploration: "Show me all activities in the professional services layer"
•Unit conversions and activity-to-emissions calculations: "Convert 500 kWh to tCO2e"
•Source/provenance queries: "Where does the video streaming data come from?"

Do NOT Use When:

•User wants to generate reports (use carbon.report.gen instead)
•User wants to write code (use acx.code.assistant instead)
•Questions about repo structure or development setup
•Non-carbon-accounting questions

Allowed Tools

•read_file - Read CSV data files, JSON artifacts, schemas
•python - Process data, perform calculations, query APIs
•grep - Search for specific activities or emission factors
•bash - Run simple data queries via command line (read-only)

Access Level: 1 (Local Execution - read-only, no file writes, no external network)

Tool Rationale:

•read_file: Required to access canonical CSV data in the data/ directory
•python: Needed for parsing CSVs, JSON artifacts, performing unit conversions and emission calculations
•grep: Efficient searching through data files for specific patterns
•bash: Helpful for quick file inspection and data exploration

Explicitly Denied:

•write_file, edit_file - This is a read-only analytical skill
•web_fetch with external URLs - Only internal localhost API endpoints allowed

Expected I/O

Input:

•Type: Natural language question (string)
•Format: Free-form query about carbon data
•Constraints: Must relate to carbon accounting, emissions, or activities in the dataset
•Examples:
- "What is the emission factor for coffee?"
- "Total emissions from video streaming in 2024"
- "List all military operations activities"
- "What units are used for grid intensity?"

Output:

•Type: Structured answer with data, units, and citations
•Format: Markdown with tables, bullet lists, and inline values
•Requirements:
- MUST include units (tCO2e, kWh, etc.) with all numeric answers
- MUST cite data sources - reference source_id from data/sources.csv
- MUST include timestamp - data vintage or "as of" date
- Handle ambiguity by asking clarifying questions

•Example:

**Emission Factor for HD Video Streaming:**

- Activity: `MEDIA.STREAM.HD.HOUR` (HD video streaming per hour)
- Emission Factor: 0.055 kgCO2e/hour
- Unit: kgCO2e per hour of streaming
- Source: [SOURCE_ID_123] - "Streaming Energy Report 2023"
- Vintage: 2023
- Notes: Includes device playback + network delivery

Validation:

•Every numeric value has explicit units
•Sources are referenced by source_id
•"Unknown" or "Data not available" for missing data (never guess)
•Calculations show methodology
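
The validation rules above can be made concrete with a small formatter. This is a minimal sketch, not the skill's actual implementation: the function name `format_factor_answer` and its parameters are hypothetical, chosen only to show how missing fields fall back to the explicit placeholders instead of guessed values.

```python
from typing import Optional


def format_factor_answer(activity_id: str,
                         value: Optional[float],
                         unit: Optional[str],
                         source_id: Optional[str],
                         vintage: Optional[str]) -> str:
    """Render one emission factor as Markdown bullets without guessing missing fields."""
    # A value is only reported together with its unit; otherwise say so explicitly.
    if value is None or not unit:
        factor = "Data not available"
    else:
        factor = f"{value:g} {unit}"
    # A missing source is flagged rather than silently omitted.
    source = f"[{source_id}]" if source_id else "Source not specified in dataset"
    lines = [
        f"- Activity: `{activity_id}`",
        f"- Emission Factor: {factor}",
        f"- Source: {source}",
        f"- Vintage: {vintage or 'Unknown'}",
    ]
    return "\n".join(lines)
```

Calling it with the values from the HD streaming example yields bullets containing `0.055 kgCO2e/hour` and `[SOURCE_ID_123]`; calling it with `None` for every optional field yields the three placeholder strings.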

Dependencies

Required:

•Access to Carbon ACX data directory (data/)
•Python 3.11+ with pandas, PyYAML
•Understanding of data schema (see reference/data_schema.md)
•Carbon accounting units glossary (see reference/units_glossary.md)

Data Files:

•data/activities.csv - Activity catalog
•data/emission_factors.csv - Emission factors
•data/layers.csv - Layer definitions
•data/sectors.csv - Sector taxonomy
•data/units.csv - Unit definitions and conversions
•data/sources.csv - Data provenance
•data/profiles.csv - Activity profiles
•calc/outputs/ - Derived artifacts (if available)
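
As a rough illustration of reading the files listed above, the sketch below loads each CSV into a list of row dictionaries using only the standard library (pandas would work equally well). The helper name `load_tables` is hypothetical, and nothing is assumed about the columns; see reference/data_schema.md for the real schema.

```python
import csv
from pathlib import Path

# File stems taken from the Data Files list; column names are not assumed here.
DATA_FILES = ["activities", "emission_factors", "layers", "sectors",
              "units", "sources", "profiles"]


def load_tables(data_dir: str) -> dict:
    """Read each known CSV under data_dir into a list of row dicts.

    Files that are absent are skipped, so callers can tell "file missing"
    (key not present) apart from "file present but empty" (empty list).
    """
    tables = {}
    for name in DATA_FILES:
        path = Path(data_dir) / f"{name}.csv"
        if not path.is_file():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables
```

The function only opens files for reading, which keeps it within the skill's read-only access level.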

Optional:

•Local API at http://localhost:8787/api (when Worker is running)
•Derived JSON manifests in dist/artifacts/

Examples

Example 1: Basic Emission Factor Query

User: "What's the emission factor for a cup of coffee?"

Claude Process:

•Recognize this is a carbon data query → invoke carbon.data.qa skill
•Search data/activities.csv for coffee-related activities
•Find matching emission factor in data/emission_factors.csv
•Look up source details in data/sources.csv
•Format answer with units and citation

Output:

**Coffee Emission Factor:**

- Activity: `FOOD.COFFEE.CUP.HOT` (12 oz hot coffee cup)
- Emission Factor: 0.021 kgCO2e/cup
- Source: [SRC_FOOD_2023] Coffee supply chain LCA
- Vintage: 2023
- Scope: Includes bean production, roasting, brewing, waste
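
The lookup chain in Example 1 (activity, then emission factor, then source) could be sketched as follows. The inline rows mirror the example's values, but the column names (`activity_id`, `name`, `value`, `unit`, `source_id`, `vintage`, `title`) and the helper names are assumptions made for illustration, not the actual schema.

```python
import csv
import io

# Sample rows mirroring Example 1; column names are assumed for illustration.
ACTIVITIES = """activity_id,name
FOOD.COFFEE.CUP.HOT,12 oz hot coffee cup
MEDIA.STREAM.HD.HOUR,HD video streaming per hour
"""
FACTORS = """activity_id,value,unit,source_id,vintage
FOOD.COFFEE.CUP.HOT,0.021,kgCO2e/cup,SRC_FOOD_2023,2023
"""
SOURCES = """source_id,title
SRC_FOOD_2023,Coffee supply chain LCA
"""


def rows(text):
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup_factor(keyword):
    """Follow the activity -> emission factor -> source chain for one keyword."""
    factors = {r["activity_id"]: r for r in rows(FACTORS)}
    sources = {r["source_id"]: r for r in rows(SOURCES)}
    results = []
    for act in rows(ACTIVITIES):
        if keyword.lower() not in act["name"].lower():
            continue
        factor = factors.get(act["activity_id"])  # None when no factor is recorded
        source = sources.get(factor["source_id"]) if factor else None
        results.append({"activity": act, "factor": factor, "source": source})
    return results
```

An activity with no recorded factor is still returned, with `factor` set to `None`, so the caller can report "Data not available" rather than dropping the match.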

Example 2: Aggregation Query

User: "Show me total emissions from all social media activities"

Claude Process:

•Query data/activities.csv for activities with category=social
•Load corresponding emission factors
•Aggregate by activity type
•Present as table with sources

Output:

**Social Media Activity Emissions (per hour):**

| Activity | Emission Factor | Source |
|----------|----------------|--------|
| YouTube streaming | 0.046 kgCO2e/hr | SRC_MEDIA_2024 |
| Facebook browsing | 0.038 kgCO2e/hr | SRC_SOCIAL_2024 |
| Instagram scrolling | 0.041 kgCO2e/hr | SRC_SOCIAL_2024 |
| TikTok watching | 0.043 kgCO2e/hr | SRC_SOCIAL_2024 |

*All values include device use + network transfer*
*Data vintage: 2024*
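
A table like the one in Example 2 could be produced by filtering activities on a category and joining each to its factor. This is a sketch under assumptions: the `category` column comes from the process step above, while the other keys and the function name `factors_table` are hypothetical.

```python
def factors_table(activities, factors, category):
    """Build a Markdown table of emission factors for one activity category.

    activities: list of dicts with 'activity_id', 'name', 'category' (assumed keys)
    factors:    list of dicts with 'activity_id', 'value', 'unit', 'source_id'
    """
    by_id = {f["activity_id"]: f for f in factors}
    lines = ["| Activity | Emission Factor | Source |",
             "|----------|-----------------|--------|"]
    for act in activities:
        if act["category"] != category:
            continue
        factor = by_id.get(act["activity_id"])
        if factor is None:
            # Keep the row visible but never invent a value.
            cell, src = "Data not available", "-"
        else:
            cell = f"{factor['value']} {factor['unit']}"
            src = factor["source_id"] or "Source not specified in dataset"
        lines.append(f"| {act['name']} | {cell} | {src} |")
    return "\n".join(lines)
```

Values are passed through as strings exactly as read from the CSV, which avoids rounding differences between the table and the data files.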

Example 3: Ambiguous Query Handling

User: "What are the emissions for streaming?"

Claude Process:

•Recognize ambiguity - multiple streaming types exist
•Ask clarifying question before searching

Output:

I found multiple streaming activities in the dataset:

1. **Video streaming:**
   - HD video (1080p)
   - UHD/4K video
   - SD video (mobile)

2. **Audio streaming:**
   - Standard quality music

3. **Platform-specific:**
   - YouTube
   - Social media video (TikTok, Instagram, etc.)

Which type of streaming would you like emission data for?
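
One way to decide between answering directly and asking a clarifying question, as in Example 3, is to count keyword matches first. The sketch below is illustrative only; `resolve`, `clarifying_question`, and the `name` key are hypothetical.

```python
def resolve(query, activities):
    """Classify a keyword query as 'none', 'one', or 'ambiguous'.

    Returns (status, matches), where matches are the activity dicts whose
    'name' contains the query (case-insensitive).
    """
    q = query.lower()
    matches = [a for a in activities if q in a["name"].lower()]
    if not matches:
        return "none", matches
    if len(matches) == 1:
        return "one", matches
    return "ambiguous", matches


def clarifying_question(matches):
    """List the candidate activities and ask the user to pick one."""
    options = "\n".join(f"{i}. {a['name']}" for i, a in enumerate(matches, 1))
    return ("I found multiple matching activities in the dataset:\n\n"
            f"{options}\n\n"
            "Which one would you like emission data for?")
```

Only the `"one"` status should lead straight to an answer; `"ambiguous"` triggers the question and `"none"` the "Data not available" path.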

Example 4: Unit Conversion Query

User: "How much CO2 does 500 kWh of Ontario grid electricity produce?"

Claude Process:

•Look up Ontario grid emission factor in data/emission_factors.csv
•Find activity ENERGY.CA-ON.GRID.KWH
•Perform calculation: 500 kWh × emission factor
•Show calculation with units

Output:

**Ontario Grid Electricity Emissions:**

- Activity: `ENERGY.CA-ON.GRID.KWH`
- Consumption: 500 kWh
- Emission Factor: 0.025 kgCO2e/kWh (Ontario IESO grid average)
- **Total Emissions: 12.5 kgCO2e** (0.0125 tCO2e)

*Calculation: 500 kWh × 0.025 kgCO2e/kWh = 12.5 kgCO2e*
*Source: [SRC_GRID_ON_2024]*
*Vintage: 2024 annual average*
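
The arithmetic in Example 4 is a multiplication by the emission factor followed by a kg-to-tonne conversion. A minimal sketch, with hypothetical function names and assuming the factor is expressed in kgCO2e per activity unit:

```python
KG_PER_TONNE = 1000.0


def emissions_from_activity(quantity, factor_kg_per_unit):
    """Return (kgCO2e, tCO2e) for an activity quantity and a kgCO2e-per-unit factor."""
    kg = quantity * factor_kg_per_unit
    return kg, kg / KG_PER_TONNE


def show_calculation(quantity, unit, factor_kg_per_unit):
    """Format the calculation with explicit units at every step."""
    kg, tonnes = emissions_from_activity(quantity, factor_kg_per_unit)
    return (f"{quantity:g} {unit} × {factor_kg_per_unit:g} kgCO2e/{unit} "
            f"= {kg:g} kgCO2e ({tonnes:g} tCO2e)")
```

For the figures in the example, `show_calculation(500, "kWh", 0.025)` returns `500 kWh × 0.025 kgCO2e/kWh = 12.5 kgCO2e (0.0125 tCO2e)`.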

Example 5: Edge Case - Missing Data

User: "What's the emission factor for blockchain mining?"

Claude Process:

•Search data/activities.csv for blockchain/crypto activities
•No matches found
•Return graceful "not found" response

Output:

**Data Not Available**

I could not find emission factors for blockchain or cryptocurrency mining in the current Carbon ACX dataset (v1.2).

The dataset currently focuses on:
- Professional services and food
- Digital infrastructure (streaming, cloud, AI)
- Industrial operations
- Transportation and logistics
- Military and defense operations
- Earth system feedbacks

You might be interested in related activities:
- Cloud server operations (`ONLINE.DC.CLOUD.SERVER.HOUR`)
- Data center rack usage (`ONLINE.DC.COLOCATION.RACK.MONTH`)

Would you like information on any of these instead?
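
The missing-data path in Example 5 could be sketched as a search over several keywords followed by an explicit "not found" message. The function names and the `name` key are hypothetical; the related activity IDs are supplied by the caller and never generated.

```python
def search_any(keywords, activities):
    """Return activities whose 'name' contains any of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [a for a in activities
            if any(k in a["name"].lower() for k in lowered)]


def not_found_message(topic, related_ids):
    """Build a 'Data Not Available' response that lists alternatives without inventing values."""
    lines = ["**Data Not Available**",
             "",
             f"I could not find emission factors for {topic} in the current dataset."]
    if related_ids:
        lines += ["", "You might be interested in related activities:"]
        lines += [f"- `{activity_id}`" for activity_id in related_ids]
    return "\n".join(lines)
```

The message contains no numbers at all, which keeps the "never guess" rule easy to verify.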

Limitations

Known Edge Cases:

•Cannot answer questions requiring data not in the CSV files
•Temporal queries limited to vintage years present in dataset
•Cannot perform predictive modeling or forecasting
•Regional data limited to what's explicitly coded (e.g., Ontario grid)
•Some activities have emission factors marked as "to be added"

Performance Constraints:

•Large aggregations across all activities may take 5-10 seconds
•Complex cross-layer queries require multiple file reads
•Derived artifacts may not always be up-to-date with source CSVs
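
One rough way to notice stale derived artifacts is to compare file modification times. This is a heuristic sketch only (mtimes can change without content changing), and `stale_artifacts` is a hypothetical helper, not part of Carbon ACX.

```python
from pathlib import Path


def stale_artifacts(data_dir, artifact_dir):
    """Return names of artifact files last modified before the newest source CSV."""
    artifacts = Path(artifact_dir)
    if not artifacts.is_dir():
        return []
    csv_times = [p.stat().st_mtime for p in Path(data_dir).glob("*.csv")]
    if not csv_times:
        return []
    newest_source = max(csv_times)
    return sorted(p.name for p in artifacts.iterdir()
                  if p.is_file() and p.stat().st_mtime < newest_source)
```

When the list is non-empty, an answer drawn from those artifacts should mention that the source CSVs are newer.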

Security Boundaries:

•Read-only access to data files
•No external API calls (except localhost Worker API)
•Cannot modify source data
•Cannot access files outside data/ or calc/outputs/ directories
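
The directory boundary above could be enforced with a path check before any read. This is a sketch under assumptions: `is_allowed` and `ALLOWED_DIRS` are hypothetical names, and paths are resolved against a repository root passed in by the caller. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

# Directories named in the security boundary, relative to the repository root.
ALLOWED_DIRS = ("data", "calc/outputs")


def is_allowed(path, repo_root="."):
    """Return True only if path resolves inside one of the allowed directories."""
    root = Path(repo_root).resolve()
    # resolve() collapses '..' segments, so traversal such as 'data/../x' is caught.
    target = (root / path).resolve()
    return any(target.is_relative_to(root / allowed) for allowed in ALLOWED_DIRS)
```

Resolving before comparing matters: a plain string prefix check would accept `data/../secrets.txt`.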

Scope Limitations:

•Answers based solely on Carbon ACX dataset - no external knowledge
•Does not perform lifecycle assessments beyond what's in emission factors
•Does not provide regulatory compliance advice
•Does not make emission reduction recommendations (analytical only)

Validation Criteria

Success Metrics:

•✅ All numeric answers include explicit units (kgCO2e, tCO2e, etc.)
•✅ Every emission factor cites its source_id, or notes that the source is missing
•✅ Data vintage/timestamp included in responses
•✅ Ambiguous queries prompt for clarification before answering
•✅ Missing data returns graceful "not found" rather than guessing
•✅ Calculations show methodology (formula with units)
•✅ Responses match data files exactly (no hallucination)

Failure Modes:

•❌ Returns emission values without units → REJECT
•❌ Makes up data not in CSV files → REJECT
•❌ Provides answers without source attribution → WARN
•❌ Performs calculations with wrong units → REJECT
•❌ Answers ambiguous questions without clarification → WARN
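
Two of the failure modes above (values without units, answers without source attribution) lend themselves to a crude automated check. The sketch below is a heuristic under stated assumptions: the regular expressions, the unit list, and the bracketed `[SOURCE_ID]` citation style are taken from the examples in this document, and `check_answer` is a hypothetical helper.

```python
import re

# A digit followed by one of the units used in this document.
UNIT_PATTERN = re.compile(r"\d\s*(kgCO2e|tCO2e|kWh|pkm)\b")
# A bracketed source identifier such as [SRC_FOOD_2023].
SOURCE_PATTERN = re.compile(r"\[[A-Z0-9_]+\]")
MISSING_SOURCE_NOTE = "Source not specified in dataset"


def check_answer(text):
    """Return a list of (severity, message) findings for a draft answer."""
    findings = []
    if not re.search(r"\d", text):
        return findings  # no numeric content, nothing to check
    if not UNIT_PATTERN.search(text):
        findings.append(("REJECT", "numeric value without units"))
    if not SOURCE_PATTERN.search(text) and MISSING_SOURCE_NOTE not in text:
        findings.append(("WARN", "no source attribution"))
    return findings
```

It cannot detect fabricated values or wrong units; those still require comparing the answer against the CSV rows.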

Recovery:

•If uncertain about data interpretation: Ask user for clarification
•If data missing: Explicitly state "Data not available" and suggest alternatives
•If calculation complex: Show step-by-step methodology
•If source missing: Note "Source not specified in dataset"

Related Skills

Dependencies:

•None - this is a foundational skill

Composes With:

•carbon.report.gen - Use this skill to gather data, then generate reports
•acx.code.assistant - This skill informs what data structures exist for code generation

Alternative Skills:

•For report generation: carbon.report.gen
•For code generation: acx.code.assistant
•For schema validation: schema.linter

Maintenance

Owner: ACX Team
Review Cycle: Monthly (align with dataset releases)
Last Updated: 2025-10-18
Version: 1.0.0

Maintenance Notes:

•Update when new CSV files are added to data/
•Review when emission factor schema changes
•Validate examples against current dataset version
•Keep reference/data_schema.md synchronized with actual schema