AgentSkillsCN

Spreadsheet Analysis

表格分析

SKILL.md
--- frontmatter
name: Spreadsheet Analysis
when_to_use: Analyzing xlsx/csv data, creating reports, building financial models, data manipulation tasks

Overview

Never hardcode calculations. Always use Excel formulas. This keeps spreadsheets dynamic and updateable.

Implementation Checklist

  • Clarify objectives - What insights or outputs does partner need?
  • Examine structure - Use Read tool to understand file layout
  • Explore data - Use pandas for initial exploration: python import pandas as pd df = pd.read_excel('file.xlsx') df.info() df.describe() df.head()
  • Choose approach: - pandas: For data analysis, transformations, simple exports - openpyxl: For formulas, formatting, preserving Excel features
  • Build analysis - Use formulas, not hardcoded values: - ✅ sheet['B10'] = '=SUM(B2:B9)' - ❌ sheet['B10'] = sum(values)
  • Apply formatting if creating output: - Blue text: Inputs users will change - Black text: Formulas and calculations - Yellow background: Key assumptions
  • Recalculate formulas (if using openpyxl): bash python ${HOME}/.config/agent/skills/skills/tools/xlsx/recalc.py output.xlsx
  • Verify zero errors - Check for #REF!, #DIV/0!, #VALUE!, etc.
  • Test edge cases - Try with zeros, negatives, very large values
  • Document assumptions - Add comments to cells with key formulas

Quality Standards

Zero Tolerance for Formula Errors

Every Excel file MUST be delivered with ZERO formula errors:

  • No #REF! (invalid references)
  • No #DIV/0! (division by zero)
  • No #VALUE! (wrong data type)
  • No #N/A (lookup failures)
  • No #NAME? (unrecognized function)

Formula Construction

Use cell references, not hardcoded values:

python
# BAD - Hardcodes growth rate
sheet['C5'] = revenue * 1.15

# GOOD - References assumption cell
sheet['C5'] = '=C4*(1+$B$2)'  # Where B2 contains growth rate

Check denominators before division:

python
# BAD - Might divide by zero
sheet['D5'] = '=C5/C4'

# GOOD - Handles zero case
sheet['D5'] = '=IF(C4=0,"-",C5/C4)'

Data Integrity

  • Verify all cell references point to intended cells
  • Check for off-by-one errors in ranges
  • Ensure consistent formulas across projection periods
  • Test with edge cases before declaring complete

Workflow Patterns

Pattern 1: Quick Data Analysis

For exploration and simple exports:

python
import pandas as pd

# Read and analyze
df = pd.read_excel('input.xlsx')
df.describe()
df.groupby('category')['sales'].sum()

# Export results
df.to_excel('analysis.xlsx', index=False)

Pattern 2: Creating Excel Files with Formulas

For dynamic spreadsheets:

python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

wb = Workbook()
sheet = wb.active

# Add data and formulas
sheet['A1'] = 'Revenue'
sheet['B1'] = 'Growth Rate'
sheet['B2'] = 0.15  # Input (blue)
sheet['A3'] = 'Year 1'
sheet['B3'] = 1000
sheet['A4'] = 'Year 2'
sheet['B4'] = '=B3*(1+$B$2)'  # Formula (black)

# Format
sheet['B2'].font = Font(color='0000FF')  # Blue for input
sheet['B4'].font = Font(color='000000')  # Black for formula

wb.save('model.xlsx')

# MUST recalculate
import subprocess
subprocess.run(['python', 'recalc.py', 'model.xlsx'])

Pattern 3: Editing Existing Files

Preserve existing structure:

python
from openpyxl import load_workbook

# Load without losing formulas
wb = load_workbook('existing.xlsx')
sheet = wb['SheetName']

# Modify cells
sheet['A1'] = 'Updated Value'
sheet['B5'] = '=SUM(B2:B4)'  # Add formula

# Save and recalculate
wb.save('updated.xlsx')
subprocess.run(['python', 'recalc.py', 'updated.xlsx'])

Common Pitfalls

Using data_only=True and saving - This replaces formulas with values permanently ❌ Forgetting to recalculate - openpyxl doesn't evaluate formulas ❌ Hardcoding calculated values - Makes spreadsheet static ❌ Not checking for #DIV/0! - Test edge cases ❌ Wrong cell references - Off-by-one errors are common ❌ Mixing 0-based and 1-based indexing - pandas is 0-based, Excel is 1-based

Anti-Patterns

Calculating in Python, hardcoding result:

python
total = df['Sales'].sum()  # Calculates in Python
sheet['B10'] = total       # Hardcodes 5000

Using Excel formula:

python
sheet['B10'] = '=SUM(B2:B9)'  # Excel calculates dynamically

Not verifying formula errors:

python
wb.save('output.xlsx')  # Done, right?

Always verify:

python
wb.save('output.xlsx')
result = subprocess.run(['python', 'recalc.py', 'output.xlsx'],
                       capture_output=True, text=True)
errors = json.loads(result.stdout)
if errors['status'] == 'errors_found':
    print(f"Errors: {errors['error_summary']}")
    # Fix errors and retry

Tools

Python Libraries

  • pandas - Data analysis, reading/writing Excel
  • openpyxl - Creating/editing with formulas and formatting
  • xlsxwriter - Alternative for writing (no editing existing files)

Scripts

  • recalc.py - Recalculates all formulas using LibreOffice
    bash
    python ~/.config/agent/skills/skills/tools/xlsx/recalc.py file.xlsx
    
    Returns JSON with error details and counts

Supporting

For detailed tool documentation and reference, see:

  • skills/tools/xlsx/ - Comprehensive xlsx manipulation guide