debug-ios-ci

Investigate iOS CI builds in depth: failures, performance, flaky tests, and build times. Covers tools such as Buildkite MCP and mobuild/Runway.

SKILL.md
--- frontmatter
name: debug-ios-ci
description: Investigate iOS CI builds - failures, performance, flaky tests, build times. Buildkite MCP, mobuild/Runway.

Investigate iOS CI

Use this skill to investigate iOS CI builds: debug failures, analyze performance, identify flaky tests, or understand what a build does.

Workflow

Step 1: Get Build Overview

code
mcp__buildkite-mcp__get_build(
  org_slug: "runway",
  pipeline_slug: "wallet",
  build_number: <number>,
  detail_level: "summary"
)

For failures: identify which job(s) failed. For performance: note total duration and job count.

Step 2: Check Log Size and Plan Strategy

Logs are almost always huge. Check size first to determine strategy:

code
mcp__buildkite-mcp__get_logs_info(
  org_slug: "runway",
  pipeline_slug: "wallet",
  build_number: <number>,
  job_id: <job_id>
)

Log Size   Strategy
---------  ----------------------------------------------------------------------
<50KB      Direct analysis with tail_logs(tail: 200)
50-500KB   Use search_logs with patterns
>500KB     Chunk analysis with subagent delegation (see Large Log Subagent Strategy)
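
The size thresholds above can be expressed as a small helper. This is a minimal sketch: the function name and return labels are hypothetical, and it assumes the size is already available in bytes (the exact field that get_logs_info returns is not shown in this document) and that 1 KB means 1024 bytes.

```python
def choose_log_strategy(size_bytes: int) -> str:
    """Map a log size in bytes to one of the three strategies in the table.

    Assumption: 1 KB = 1024 bytes. The returned labels are illustrative
    names for this sketch, not values produced by any tool.
    """
    size_kb = size_bytes / 1024
    if size_kb < 50:
        return "tail_logs"        # small: read the tail directly
    if size_kb <= 500:
        return "search_logs"      # medium: targeted pattern searches
    return "subagent_chunks"      # large: chunked reads via subagents
```

For example, a 200 KB log falls in the middle band and would be handled with targeted searches.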

Step 3: Read Logs Efficiently

For small logs (<50KB) - read the tail directly:

code
mcp__buildkite-mcp__tail_logs(..., tail: 200)

For medium logs (50-500KB) - use targeted searches:

code
mcp__buildkite-mcp__search_logs(..., pattern: "<pattern>", context: 3)

For large logs (>500KB) - delegate to subagents (see the Large Log Subagent Strategy section).

Error Patterns

Issue Type         Search Pattern
-----------------  ------------------------------------------------
Xcode errors       \[x\]|error:|BUILD FAILED
Rust compiler      error\[E\d{4}\]:
Rust field errors  struct.*has no field named|does not have a field
Test failure       \[x\].*Test|FAILED|XCTestCase.*failed
Recipe failure     Recipe.*failed.*exit code
Missing bundle     \.xcresult.*not found
Swift compile      cannot find|type.*has no member
Linker             undefined symbol|ld:.*error
Timing/perf        Lap Time|elapsed|cache
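
When a log excerpt has already been fetched, the same patterns can be applied locally to label lines. The sketch below copies the patterns from the table and uses Python's re module; the names PATTERNS and classify are hypothetical, and it assumes the patterns behave the same way under Python's regex syntax as they do in search_logs.

```python
import re

# Patterns copied from the Error Patterns table (issue type -> regex).
PATTERNS = {
    "Xcode errors": r"\[x\]|error:|BUILD FAILED",
    "Rust compiler": r"error\[E\d{4}\]:",
    "Rust field errors": r"struct.*has no field named|does not have a field",
    "Test failure": r"\[x\].*Test|FAILED|XCTestCase.*failed",
    "Recipe failure": r"Recipe.*failed.*exit code",
    "Missing bundle": r"\.xcresult.*not found",
    "Swift compile": r"cannot find|type.*has no member",
    "Linker": r"undefined symbol|ld:.*error",
    "Timing/perf": r"Lap Time|elapsed|cache",
}
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def classify(line: str) -> list[str]:
    """Return every issue type whose pattern matches the given log line."""
    return [name for name, regex in COMPILED.items() if regex.search(line)]
```

Several patterns overlap (a Rust "cannot find" error also matches the Swift compile pattern), so a line can carry more than one label; treat the labels as search hints rather than a diagnosis.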

Step 4: Compare Builds (if needed)

For flaky tests or performance regression, compare multiple builds:

code
mcp__buildkite-mcp__list_builds(
  org_slug: "runway",
  pipeline_slug: "wallet",
  branch: "<branch>",
  per_page: 10
)
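
One way to use the listed builds for flakiness is to look for commits that both passed and failed. The sketch below assumes each build is a dictionary with "commit" and "state" keys and that finished builds report "passed" or "failed"; the shape actually returned by list_builds may differ, and the function name is hypothetical.

```python
from collections import defaultdict


def find_flaky_commits(builds: list[dict]) -> list[str]:
    """Return commits that have at least one passed and one failed build.

    Assumption: each build dict has "commit" and "state" keys, with
    "passed"/"failed" as the terminal states of interest.
    """
    states_by_commit = defaultdict(set)
    for build in builds:
        if build["state"] in ("passed", "failed"):
            states_by_commit[build["commit"]].add(build["state"])
    return sorted(
        commit
        for commit, states in states_by_commit.items()
        if states == {"passed", "failed"}
    )
```

A commit that both passed and failed points to a flaky test or an infrastructure problem rather than a code change, but it is only a signal; the failing jobs still need to be compared.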

Step 5: Investigate Infrastructure (if needed)

If it's an infra issue, explore relevant repos:

bash
# Core Build System
gh repo view squareup/ios-builder                    # Core build toolchain (sqiosbuild)
gh repo view squareup/ios-builder-buildkite-plugin   # Buildkite plugin
gh repo view squareup/macos-environment-buildkite-plugin  # macOS env setup

# Infrastructure
gh repo view squareup/runway                         # Runway infrastructure
gh repo view squareup/tf-mobuild-workers             # Terraform for EC2 workers

Step 6: Escalate if Needed

  • #mdx-ios - iOS build infrastructure
  • #mobuild-buildkite - MoBuild and Buildkite
  • #ci-infrastructure - General CI

Reference

Architecture

code
Buildkite Pipeline → ios-builder-buildkite-plugin → ios-builder (sqiosbuild) → Xcode
                   ↘ macos-environment-buildkite-plugin ↗

iOS builds run on mobuild/Runway infrastructure, using Buildkite with macOS EC2 workers managed by the MDX team.

Pipeline Configs

Located in .buildkite/mobuild/:

File                              Purpose
--------------------------------  --------------------------------------------
pipeline.pr.yml                   PR builds (unit tests, snapshots, KMP tests)
pipeline.main.yml                 Main branch builds
pipeline.release.ios.yml          Release builds
pipeline.team.testflight.ios.yml  TestFlight builds

Router: .buildkite/pipeline.sh routes based on branch/labels.

Build Actions

Action           Description
---------------  -----------------------
unit             Run unit tests
ios-snapshots    Run snapshot tests
debug            Debug build
release          Customer release build
team-testflight  TestFlight build
team-alpha       Enterprise distribution

Key Scripts

Script                                  Purpose
--------------------------------------  --------------------
app/ios/Scripts/CI/BuildProject         Main build script
app/ios/Scripts/CI/BuildReleaseProject  Release build script
app/ios/Scripts/CI/RunKmpIosTests       KMP iOS test runner

Caching

  • sccache - Rust compilation cache
  • Gradle - Kotlin/KMP build cache

Buildkite Log Format Patterns

All jobs:

Marker           Pattern  Use
---------------  -------  ----------------------
Section divider  ~~~      Major phase boundaries

ios-builder jobs (Xcode builds):

Marker           Pattern                                                          Use
---------------  ---------------------------------------------------------------  ------------------------------------
Action start     ==== Starting to run action: <name> ====                         Find specific build steps
Action result    ^^^^ run <name> FAILED!!! ^^^^                                   Quickly find failures
Failure cascade  WITH PRIOR FAILURE!                                              Distinguish root cause from symptoms
Timing           Logging timing event <step>, Lap Time: <s>s, Elapsed Time: <t>s  Performance debugging

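Tip: Search for \^\^\^\^ run.*FAILED to find the first failure (escape the carets, because an unescaped ^ is a start-of-line anchor in a regular expression), then look for WITH PRIOR FAILURE to separate cascading failures from the root cause.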
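
The markers above can also be parsed from a log excerpt that has already been read. This is a sketch under stated assumptions: the function names are hypothetical, lap times are assumed to be plain decimal seconds, and WITH PRIOR FAILURE is assumed to appear on the same line as the failed action, which this document does not confirm.

```python
import re

# Literal carets are escaped; an unescaped ^ would be a line anchor.
ACTION_FAILED = re.compile(r"\^{4} run (?P<name>.+?) FAILED")
TIMING = re.compile(
    r"Logging timing event (?P<step>.+?), "
    r"Lap Time: (?P<lap>\d+(?:\.\d+)?)s, "
    r"Elapsed Time: (?P<elapsed>\d+(?:\.\d+)?)s"
)


def root_cause_actions(log_text: str) -> list[str]:
    """Return failed actions that are not marked as cascading failures.

    Assumption: a cascading failure carries WITH PRIOR FAILURE on the
    same line as the '^^^^ run <name> FAILED' marker.
    """
    roots = []
    for line in log_text.splitlines():
        match = ACTION_FAILED.search(line)
        if match and "WITH PRIOR FAILURE" not in line:
            roots.append(match.group("name"))
    return roots


def slowest_steps(log_text: str, top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` timing events with the largest lap time, in seconds."""
    laps = [
        (match.group("step"), float(match.group("lap")))
        for match in TIMING.finditer(log_text)
    ]
    return sorted(laps, key=lambda item: item[1], reverse=True)[:top]
```

If the cascade marker turns out to be on a separate line, the first function would need to look at neighboring lines instead of the matched line.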

Gradle/KMP jobs:

Marker         Pattern                     Use
-------------  --------------------------  -----------------------
Task failure   > Task :module:task FAILED  Find failed Gradle task
Build summary  BUILD FAILED in             Confirm build failed
Test failure   FAILED (with ANSI codes)    Find failed tests
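
Because Gradle output may be wrapped in ANSI color codes, it can help to strip them before matching. A minimal sketch, assuming the codes are standard SGR color sequences (ESC [ ... m); the function name is hypothetical.

```python
import re

# Matches SGR color/style sequences such as "\x1b[31m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")
TASK_FAILED = re.compile(r"> Task (?P<task>:\S+) FAILED")


def failed_gradle_tasks(log_text: str) -> list[str]:
    """Return Gradle task paths reported as FAILED, ignoring ANSI colors."""
    clean = ANSI_SGR.sub("", log_text)
    return [match.group("task") for match in TASK_FAILED.finditer(clean)]
```

Stripping first keeps the task pattern simple, since the color codes can otherwise sit between the task path and the FAILED marker.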

Buildkite MCP Guidelines

Always include links to builds/jobs in responses: [Build #123](https://buildkite.com/runway/wallet/builds/123)
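
A tiny helper keeps the link format consistent. The URL layout is taken from the example above; the function name and the defaults (org "runway", pipeline "wallet") are assumptions based on the calls used throughout this document.

```python
def build_link(build_number: int, org: str = "runway", pipeline: str = "wallet") -> str:
    """Format a Markdown link to a Buildkite build, following the example URL."""
    url = f"https://buildkite.com/{org}/{pipeline}/builds/{build_number}"
    return f"[Build #{build_number}]({url})"
```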

Token efficiency (logs can be huge):

  1. Start with get_build using detail_level: "summary"
  2. Use get_logs_info to check log size before reading
  3. Use tail_logs (50-100 lines) for failure context
  4. Use search_logs with patterns for specific issues
  5. Only use read_logs with a limit parameter; never read a log without one

Large Log Subagent Strategy

For logs >500KB, delegate to subagents to avoid context exhaustion:

Task                 Model   Rationale
-------------------  ------  ---------------------------------------------------------
Pattern extraction   haiku   Fast/cheap for extracting errors, timings from log chunks
Root cause analysis  sonnet  Capable model for synthesizing findings, determining fix

Workflow for large logs:

  1. Use search_logs to identify relevant line ranges (e.g., around failures)
  2. Spawn haiku subagent with read_logs(seek: <start>, limit: 500) to extract details
  3. Repeat for multiple failure points if needed
  4. Synthesize findings yourself (or spawn sonnet for complex root cause analysis)

Rule of thumb: Use the fastest model that can do the job. Extraction = fast. Synthesis = capable.
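
The first two workflow steps can be sketched as a planning function that turns matched line numbers into bounded read windows. This is an illustration only: the function name and the context size are hypothetical, and it assumes that seek is a line offset and limit a line count, which this document does not spell out.

```python
def plan_chunks(match_lines: list[int], context: int = 50, limit: int = 500) -> list[tuple[int, int]]:
    """Merge matched line numbers into (seek, limit) windows for read_logs.

    Each match gets `context` lines on either side. Overlapping windows are
    merged as long as the merged window stays within `limit` lines.
    Assumption: seek is a line offset and limit is a number of lines.
    """
    windows: list[list[int]] = []  # each entry is [start, end), end exclusive
    for line in sorted(set(match_lines)):
        start = max(0, line - context)
        end = line + context + 1
        if windows and start <= windows[-1][1] and end - windows[-1][0] <= limit:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [(start, end - start) for start, end in windows]
```

Each returned pair can be handed to one extraction subagent, so nearby failures are read once instead of in overlapping chunks.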