Stata Coding

SKILL.md

Stata Coding Standards with LOG Documentation

Purpose

This skill defines how to write Stata code with inline LOG comments that record the expected log output, so that actual results can be checked against them line by line.

Core Principles

  1. Every block that produces output must have a LOG comment
  2. LOG comments show what will appear in the actual log file
  3. LOG comments go immediately after the corresponding code block
  4. Use emojis for visual clarity (✓, ❌)
  5. Keep error messages concise

Rules for LOG Comments

Rule 1: Single Display Statement

Every display statement gets a LOG comment showing the exact output.

stata
display "Creating output directories..."
// LOG: Creating output directories...

display "Pipeline start time: " c(current_time)
// LOG: Pipeline start time: 14:32:15
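
Computed values follow the same rule: the LOG comment records the number exactly as it prints, which is easiest when the display format is stated explicitly. A minimal sketch, using Stata's bundled auto.dta example dataset purely for illustration (it is not part of the pipeline):

```stata
quietly sysuse auto, clear          // quietly: suppress the dataset note
display "Observations loaded: " _N
// LOG: Observations loaded: 74

quietly summarize price             // results go to r(); nothing is printed
display "Mean price: " %9.2f r(mean)
// LOG: Mean price:   6165.26
```

The `%9.2f` format right-aligns the value in nine characters, which is why the LOG line carries extra leading spaces.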

Rule 2: If-Else Blocks (Multiple Cases)

Use // case 1 LOG: and // case 2 LOG: for different outcomes.

stata
if "`config_file'" == "" {
    display as error "❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config>"
    exit 198
}
else {
    display "✓ Config file: `config_file'"
}
// case 1 LOG: ❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config> (and exits)
// case 2 LOG: ✓ Config file: config_production_2015-2020
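
The snippet assumes `config_file` already holds the argument given on the command line. One way to fill it is Stata's `args` command, which assigns the arguments that follow the do-file name to local macros. This is a sketch of how the script might start, not necessarily how the original script does it:

```stata
// Top of script.do, run as: stata-mp -b do script.do config_production_2015-2020
args config_file                    // first argument -> local macro config_file (empty if none)

if "`config_file'" == "" {
    display as error "❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config>"
    exit 198
}
display "✓ Config file: `config_file'"
// case 1 LOG: ❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config> (and exits)
// case 2 LOG: ✓ Config file: config_production_2015-2020
```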

Rule 3: Variable Display with Examples

Show resolved variable values in LOG comments.

stata
display "  Creating: ${output_root}"
capture mkdir "${output_root}"
assert _rc == 0 | _rc == 693
// LOG:   Creating: _WorkSpace/1-CMSStore
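
The LOG comment above assumes the global is actually set. If the config file never defines it, `${output_root}` expands to an empty string and the line prints as just `  Creating: `. A minimal guard, sketched here with a hypothetical error message, stops early instead:

```stata
// Stop with a short message if the config file never set the global
if "${output_root}" == "" {
    display as error "❌ ERROR: global output_root is not set"
    exit 198
}
display "  Creating: ${output_root}"
// case 1 LOG: ❌ ERROR: global output_root is not set (and exits)
// case 2 LOG:   Creating: _WorkSpace/1-CMSStore
```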

Rule 4: Loops - Show Iteration Pattern

For loops, show what each iteration produces.

stata
forvalues year = ${year_start}/${year_end} {
    display ""
    display "########## YEAR `year' ##########"
}
// LOG: (blank line)
// LOG: ########## YEAR 2015 ##########
// LOG: (blank line)
// LOG: ########## YEAR 2016 ##########
// LOG: (same two lines repeat for each year through ${year_end})
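
When the range is short and known, the LOG block can list every iteration instead of a pattern. The sketch below sets the year globals inline only so it runs on its own (in the pipeline they come from the config file), and adds a hypothetical `[i/n]` progress counter:

```stata
// Globals set inline only so the sketch runs on its own
global year_start 2015
global year_end 2017

local n_years = ${year_end} - ${year_start} + 1
forvalues year = ${year_start}/${year_end} {
    local i = `year' - ${year_start} + 1      // position of this year in the range
    display "[`i'/`n_years'] YEAR `year'"
}
// LOG: [1/3] YEAR 2015
// LOG: [2/3] YEAR 2016
// LOG: [3/3] YEAR 2017
```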

Rule 5: Conditional Blocks (Run Flags)

When execution depends on flags, show both cases.

stata
if ${run_extract} {
    display "STAGE 1: Extract"
    capture noisily do "scripts/extract.do" `year'
}
// case 1 LOG: (nothing - stage skipped because ${run_extract} = 0)
// case 2 LOG: STAGE 1: Extract (followed by output from extract.do)
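
`capture noisily` shows the output of extract.do but also catches its error, so the pipeline carries on even if the stage fails. A sketch that saves the return code and stops with a short message; the `✓ STAGE 1 complete` line is a hypothetical addition, and `year` is assumed to be set by the surrounding year loop:

```stata
if ${run_extract} {
    display "STAGE 1: Extract"
    capture noisily do "scripts/extract.do" `year'
    local rc = _rc                      // save now; the next capture overwrites _rc
    if `rc' != 0 {
        display as error "❌ ERROR: extract.do failed for `year' (r(`rc'))"
        exit `rc'
    }
    display "✓ STAGE 1 complete: `year'"
}
// case 1 LOG: (nothing - stage skipped because ${run_extract} = 0)
// case 2 LOG: STAGE 1: Extract
// case 2 LOG: (output from extract.do)
// case 2 LOG: ✓ STAGE 1 complete: 2015
// case 3 LOG: STAGE 1: Extract
// case 3 LOG: (output from extract.do, ending with its error)
// case 3 LOG: ❌ ERROR: extract.do failed for 2015 (r(<code>)) (and exits)
```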

Rule 6: Foreach Loops

Show example output for each iteration.

stata
local log_files: dir "${log_dir_cms}" files "${config_name}*.log"
foreach logfile of local log_files {
    copy "${log_dir_cms}/`logfile'" "${log_dir_archive}/`logfile'", replace
    display "  Copied: `logfile'"
}
// LOG:   Copied: config_production_2015-2020-master.log
// LOG:   Copied: config_production_2015-2020-extract-2015.log
// LOG:   (one line per log file copied)
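
A count at the end makes the LOG block checkable even when the number of files varies, and an explicit message covers the case where nothing matches. A sketch using the `list sizeof` macro function; the summary and "no files" messages are hypothetical:

```stata
local log_files : dir "${log_dir_cms}" files "${config_name}*.log"
local n_files : list sizeof log_files        // number of matching files
if `n_files' == 0 display "  (no matching log files found)"
foreach logfile of local log_files {
    copy "${log_dir_cms}/`logfile'" "${log_dir_archive}/`logfile'", replace
    display "  Copied: `logfile'"
}
display "  ✓ Archived `n_files' log file(s)"
// case 1 LOG:   (no matching log files found)
// case 1 LOG:   ✓ Archived 0 log file(s)
// case 2 LOG:   Copied: config_production_2015-2020-master.log
// case 2 LOG:   Copied: config_production_2015-2020-extract-2015.log
// case 2 LOG:   ✓ Archived 2 log file(s)
```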

Rule 7: Multi-Line Display Blocks

Document a group of related display statements with one LOG block placed after the last of them.

stata
display ""
display "=========================================="
display "Archiving logs to 0-Logging-Store..."
display "=========================================="
// LOG: (blank line)
// LOG: ==========================================
// LOG: Archiving logs to 0-Logging-Store...
// LOG: ==========================================
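
When the same banner appears many times, a small program keeps the output, and therefore the LOG block, identical at every call. A sketch with a hypothetical helper named `banner`:

```stata
// Hypothetical helper: prints a blank line and a boxed section title
capture program drop banner
program define banner
    args title
    display ""
    display "=========================================="
    display "`title'"
    display "=========================================="
end

banner "Archiving logs to 0-Logging-Store..."
// LOG: (blank line)
// LOG: ==========================================
// LOG: Archiving logs to 0-Logging-Store...
// LOG: ==========================================
```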

Rule 8: No Output Blocks

Note when blocks don't produce log output.

stata
capture mkdir "${temp_dir}"
// (no output - capture suppresses all output, including error messages)

log using "${log_file}", replace text
// LOG: (log header: name, log file path, log type: text, opened on: <date and time>)

local myvar = 5
// (no output - local assignment)
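
Inside a `quietly { }` block only lines prefixed with `noisily` reach the log, so only those lines need LOG comments. A sketch that sets the year globals inline so it runs on its own:

```stata
global year_start 2015
global year_end 2020

quietly {
    local n_years = ${year_end} - ${year_start} + 1   // no output
    noisily display "Years to process: `n_years'"     // the only line shown
}
// LOG: Years to process: 6
```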

Code Structure Standards

Error Handling - Keep It Short

BAD - Too verbose:

stata
if "`config_file'" == "" {
    display as error "ERROR: Config file required!"
    display as error ""
    display as error "Usage (from code/ directory):"
    display as error "  cd code"
    display as error "  stata-mp -b do script.do <config_name>"
    display as error ""
    display as error "Example:"
    display as error "  stata-mp -b do script.do config_production"
    exit 198
}

GOOD - Concise with emoji:

stata
if "`config_file'" == "" {
    display as error "❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config>"
    exit 198
}
else {
    display "✓ Config file: `config_file'"
}
// case 1 LOG: ❌ ERROR: Config file required! Usage: stata-mp -b do script.do <config> (and exits)
// case 2 LOG: ✓ Config file: config_production

Directory Creation - Show Paths

Always display the path before creating directories, then assert success.

stata
display "  Creating: ${output_root}"
capture mkdir "${output_root}"
assert _rc == 0 | _rc == 693  // 0 = created; 693 = could not create (usually: already exists)
// LOG:   Creating: _WorkSpace/1-CMSStore
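
Return code 693 is Stata's general "could not create directory" error, so it may not distinguish "already exists" from a real failure such as missing permissions. If that matters, one option is to check the directory afterwards with Mata's `direxists()`. A sketch:

```stata
display "  Creating: ${output_root}"
capture mkdir "${output_root}"
// Confirm the directory is really there, whatever mkdir returned
mata: st_local("dir_ok", strofreal(direxists(st_global("output_root"))))
if `dir_ok' != 1 {
    display as error "❌ ERROR: Could not create ${output_root}"
    exit 693
}
// case 1 LOG:   Creating: _WorkSpace/1-CMSStore
// case 2 LOG:   Creating: _WorkSpace/1-CMSStore
// case 2 LOG: ❌ ERROR: Could not create _WorkSpace/1-CMSStore (and exits)
```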

Configuration Loading - Show What's Set

Document what global variables the config file sets.

stata
// Load configuration file
// Sets globals: ${output_root}, ${cms_store}, ${year_start}, ${year_end}, etc.
display "=========================================="
display "CMS DATA PREPARATION PIPELINE"
display "Loading config: `config_file'"
display "=========================================="
// LOG: ==========================================
// LOG: CMS DATA PREPARATION PIPELINE
// LOG: Loading config: config_production_2015-2020
// LOG: ==========================================

// Use / in paths: in Stata, a backslash right before a backtick stops macro expansion
capture do "config/`config_file'.do"
if _rc != 0 {
    display as error "❌ ERROR: Could not load config/`config_file'.do"
    exit 198
}
// case 1 LOG: (nothing - config loaded successfully)
// case 2 LOG: ❌ ERROR: Could not load config/config_production_2015-2020.do (and exits)
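
A config file can fail in two different ways: it is missing, or it runs but does not set everything the pipeline needs. A sketch that reports each case separately, checking the globals listed in the "Sets globals" comment. Paths use forward slashes, which Stata also accepts on Windows. The `✓ Config loaded` line is a hypothetical addition:

```stata
// Case: config file missing (confirm file returns r(601) if not found)
capture confirm file "config/`config_file'.do"
if _rc != 0 {
    display as error "❌ ERROR: Config not found: config/`config_file'.do"
    exit 601
}

// Case: config file exists but fails while running
capture do "config/`config_file'.do"
if _rc != 0 {
    display as error "❌ ERROR: Config failed to run: config/`config_file'.do"
    exit 198
}

// Case: config ran but left a required global unset
foreach g in output_root cms_store year_start year_end {
    if "${`g'}" == "" {
        display as error "❌ ERROR: Config did not set global `g'"
        exit 198
    }
}
display "✓ Config loaded: `config_file'"
// case 1 LOG: ✓ Config loaded: config_production_2015-2020
// case 2 LOG: ❌ ERROR: Config not found: config/config_production_2015-2020.do (and exits)
// case 3 LOG: ❌ ERROR: Config failed to run: config/config_production_2015-2020.do (and exits)
// case 4 LOG: ❌ ERROR: Config did not set global year_end (and exits)
```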

Assertions After Critical Operations

Use assertions to verify operations succeeded.

stata
capture mkdir "${log_dir}"
assert _rc == 0 | _rc == 693  // Fails on any return code other than 0 or 693

capture use "data.dta", clear
assert _rc == 0  // Fails if file not found
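
Assertions are also useful right after data is loaded, to check that it has the shape later steps expect. `assert` checks its expression against the observations in memory and prints nothing when it holds. A sketch on Stata's bundled auto.dta, chosen only because it ships with Stata; in the pipeline the checks would target the CMS files instead:

```stata
quietly sysuse auto, clear
assert _N == 74                 // expected number of observations
assert !missing(price)          // no missing prices
isid make                       // errors if make does not uniquely identify rows
display "✓ auto.dta loaded and checked: " _N " observations"
// LOG: ✓ auto.dta loaded and checked: 74 observations
```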

Summary Checklist

When writing Stata code, ensure:

  • Every display statement has a LOG comment
  • If-else blocks use case 1 LOG: and case 2 LOG:
  • Variables like ${var} show example resolved values in LOG
  • Loops show iteration patterns in LOG
  • Error messages are short with emojis (❌, ✓)
  • Directory creation displays the full path first
  • Assertions verify critical operations
  • LOG comments appear immediately after their corresponding block
  • Multi-line outputs are documented in LOG comments
  • Blocks with no output are noted

Why This Matters

Comprehensive LOG documentation allows you to:

  1. Verify correctness: Compare actual log output against expected LOG comments
  2. Debug faster: Know exactly what should appear at each step
  3. Understand flow: See all possible execution paths (case 1, case 2, etc.)
  4. Catch errors: Assertions fail immediately if something goes wrong
  5. Document behavior: Code is self-documenting with clear output expectations