CI Test Failure Diagnosis
This skill provides patterns and practices for diagnosing GitHub Actions test failures in the Aspire repository using the DownloadFailingJobLogs tool.
Overview
When CI tests fail, use the DownloadFailingJobLogs.cs tool to automatically download logs and artifacts for failed jobs. This tool eliminates manual log hunting and provides structured access to test failures.
Location: tools/scripts/DownloadFailingJobLogs.cs
Quick Start
Step 1: Find the Run ID
Get the run ID from the GitHub Actions URL or use the gh CLI:
# From URL: https://github.com/dotnet/aspire/actions/runs/19846215629 # ^^^^^^^^^^ # run ID # Or find the latest run on a branch gh run list --repo dotnet/aspire --branch <branch-name> --limit 1 --json databaseId --jq '.[0].databaseId' # Or for a PR gh pr checks <pr-number> --repo dotnet/aspire
Step 2: Run the Tool
cd tools/scripts dotnet run DownloadFailingJobLogs.cs -- <run-id>
Example:
dotnet run DownloadFailingJobLogs.cs -- 19846215629
Step 3: Analyze Output
The tool creates files in your current directory:
| File Pattern | Contents |
|---|---|
failed_job_<n>_<job-name>.log | Raw job logs from GitHub Actions |
artifact_<n>_<testname>_<os>.zip | Downloaded artifact zip files |
artifact_<n>_<testname>_<os>/ | Extracted directory with .trx files, logs, binlogs |
What the Tool Does
- •Finds all failed jobs in a GitHub Actions workflow run
- •Downloads job logs for each failed job
- •Extracts test failures and errors from logs using regex patterns
- •Determines artifact names from job names (pattern:
logs-{testShortName}-{os}) - •Downloads test artifacts containing .trx files and test logs
- •Extracts artifacts to local directories for inspection
Example Workflow
# 1. Check failed jobs on a PR
gh pr checks 14105 --repo dotnet/aspire 2>&1 | Where-Object { $_ -match "fail" }
# 2. Get the run ID
$runId = gh run list --repo dotnet/aspire --branch davidfowl/my-branch --limit 1 --json databaseId --jq '.[0].databaseId'
# 3. Download failure logs
cd tools/scripts
dotnet run DownloadFailingJobLogs.cs -- $runId
# 4. Search for errors in downloaded logs
Get-Content "failed_job_0_*.log" | Select-String -Pattern "error|Error:" -Context 2,3 | Select-Object -First 20
# 5. Check .trx files for test failures
Get-ChildItem -Recurse -Filter "*.trx" | ForEach-Object {
[xml]$xml = Get-Content $_.FullName
$xml.TestRun.Results.UnitTestResult | Where-Object { $_.outcome -eq "Failed" }
}
Understanding Job Log Output
The tool prints a summary for each failed job:
=== Failed Job 1/1 === Name: Tests / Integrations macos (Hosting.Azure) / Hosting.Azure (macos-latest) ID: 56864254427 URL: https://github.com/dotnet/aspire/actions/runs/19846215629/job/56864254427 Downloading job logs... Saved job logs to: failed_job_0_Tests___Integrations_macos__Hosting_Azure____Hosting_Azure__macos-latest_.log Errors found (2): - System.InvalidOperationException: Step 'provision-api-service' failed...
Searching Downloaded Logs
Find Errors in Job Logs
# PowerShell Get-Content "failed_job_*.log" | Select-String -Pattern "error|Error:" -Context 2,3 # Bash grep -i "error" failed_job_*.log | head -50
Find Build Failures
Get-Content "failed_job_*.log" | Select-String -Pattern "Build FAILED|error MSB|error CS"
Find Test Failures
Get-Content "failed_job_*.log" | Select-String -Pattern "Failed!" -Context 5,0
Check for Disk Space Issues
Get-Content "failed_job_*.log" | Select-String -Pattern "No space left|disk space"
Check for Timeout Issues
Get-Content "failed_job_*.log" | Select-String -Pattern "timeout|timed out|Timeout"
Using GitHub API for Annotations
Sometimes job logs aren't available (404). Use annotations instead:
gh api repos/dotnet/aspire/check-runs/<job-id>/annotations
This returns structured error information even when full logs aren't downloadable.
Common Failure Patterns
Disk Space Exhaustion
Symptom: No space left on device in annotations or logs
Diagnosis:
gh api repos/dotnet/aspire/check-runs/<job-id>/annotations 2>&1
Common fixes:
- •Add disk cleanup step before build
- •Use larger runner (e.g.,
8-core-ubuntu-latest) - •Skip unnecessary build steps (e.g.,
/p:BuildTests=false)
Command Not Found
Symptom: exit code 127 or command not found
Diagnosis:
Get-Content "failed_job_*.log" | Select-String -Pattern "command not found|exit code 127" -Context 3,1
Common fixes:
- •Ensure PATH includes required tools
- •Use full path to executables
- •Install missing dependencies
Test Timeout
Symptom: Test hangs, then fails with timeout
Diagnosis:
Get-Content "failed_job_*.log" | Select-String -Pattern "Test host process exited|Timeout|timed out"
Common fixes:
- •Increase test timeout
- •Check for deadlocks in test code
- •Review Heartbeat.cs output for resource exhaustion
Build Failure
Symptom: Build FAILED or MSBuild errors
Diagnosis:
Get-Content "failed_job_*.log" | Select-String -Pattern "error CS|error MSB|Build FAILED" -Context 0,3
Common fixes:
- •Check for missing project references
- •Verify package versions
- •Download and analyze .binlog from artifacts
Artifact Contents
Downloaded artifacts typically contain:
artifact_0_TestName_os/ ├── testresults/ │ ├── TestName_net10.0_timestamp.trx # Test results XML │ ├── Aspire.*.Tests_*.log # Console output │ └── recordings/ # Asciinema recordings (CLI E2E tests) ├── *.crash.dmp # Crash dump (if test crashed) └── test.binlog # MSBuild binary log
Parsing .trx Files
# Find all failed tests in .trx files
Get-ChildItem -Path "artifact_*" -Recurse -Filter "*.trx" | ForEach-Object {
Write-Host "=== $($_.Name) ==="
[xml]$xml = Get-Content $_.FullName
$xml.TestRun.Results.UnitTestResult | Where-Object { $_.outcome -eq "Failed" } | ForEach-Object {
Write-Host "FAILED: $($_.testName)"
Write-Host $_.Output.ErrorInfo.Message
Write-Host "---"
}
}
Tips
Clean Up Before Running
Remove-Item *.log -Force -ErrorAction SilentlyContinue Remove-Item *.zip -Force -ErrorAction SilentlyContinue Remove-Item -Recurse artifact_* -Force -ErrorAction SilentlyContinue
Run From tools/scripts Directory
The tool creates files in the current directory, so run it from tools/scripts to keep things organized:
cd tools/scripts dotnet run DownloadFailingJobLogs.cs -- <run-id>
Don't Commit Log Files
The downloaded log files can be large. Don't commit them to the repository:
# Before committing rm tools/scripts/*.log rm tools/scripts/*.zip rm -rf tools/scripts/artifact_*
Prerequisites
- •.NET 10 SDK or later
- •GitHub CLI (
gh) installed and authenticated - •Access to the dotnet/aspire repository
See Also
- •
tools/scripts/README.md- Full documentation - •
tools/scripts/Heartbeat.cs- System monitoring tool for diagnosing hangs - •
.github/skills/cli-e2e-testing/SKILL.md- CLI E2E test troubleshooting