Check Upstream Flake
Check if a failing test is a known upstream flake in the Chromium LUCI Analysis database. This queries the REST API at analysis.api.luci.app to retrieve historical pass/fail/flake data for a test in the Chromium CI infrastructure.
When to Use
- •Investigating intermittent test failures before deciding on a fix approach
- •Evaluating test disable PRs to verify upstream flakiness claims
- •During PR review (via the review skill) when assessing test filter changes
- •Working on "pending" stories that involve Chromium test failures
The Job
When invoked with a test name:
- •Search for matching test IDs in the Chromium LUCI Analysis database
- •Retrieve flakiness statistics for each match over the lookback period
- •Analyze pass/fail/flake rates
- •Report a verdict and recommendation
Usage
    # Basic: check a specific test (default 30-day lookback)
    python3 ./scripts/check-upstream-flake.py "WebUIURLLoaderFactoryTest.RangeRequest"

    # Longer lookback window
    python3 ./scripts/check-upstream-flake.py "WebUIURLLoaderFactoryTest.RangeRequest" --days 60

    # JSON output (for programmatic use)
    python3 ./scripts/check-upstream-flake.py "WebUIURLLoaderFactoryTest.RangeRequest" --json

    # Search by test class name (finds all methods)
    python3 ./scripts/check-upstream-flake.py "WebUIURLLoaderFactoryTest"
Arguments:
- •`test_name` (required): Test name or substring to search for
- •`--days N`: Lookback window in days (default: 30, max: 90)
- •`--json`: Output JSON instead of markdown
Exit codes:
- •`0`: Success (results found and reported)
- •`1`: Error (network, API, etc.)
- •`2`: No matching test IDs found
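For programmatic use, the exit codes above can be mapped to outcomes before any output is parsed. The following is a minimal sketch, assuming the script path shown in the Usage section; `describe_exit_code` and `run_check` are hypothetical helper names and are not part of the script itself.

```python
import subprocess
import sys

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "success: results found and reported",
    1: "error: network, API, etc.",
    2: "no matching test IDs found",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable meaning for a script exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_check(test_name: str, days: int = 30) -> tuple[int, str]:
    """Run the script with --json and return (exit_code, stdout).

    The script path is an assumption taken from the Usage examples.
    """
    proc = subprocess.run(
        [sys.executable, "./scripts/check-upstream-flake.py",
         test_name, "--days", str(days), "--json"],
        capture_output=True,
        text=True,
        check=False,  # exit codes 1 and 2 are expected outcomes, not exceptions
    )
    return proc.returncode, proc.stdout
```

A caller would typically branch on the returned code first and only parse `stdout` as JSON when the code is `0`.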
Interpreting Results
The script produces one of five verdicts:
| Verdict | Flake Rate | Action |
|---|---|---|
| Known upstream flake | >= 5% | Safe to add to filter file. Document upstream flakiness in the filter comment. |
| Occasional upstream failures | >= 1% and < 5% | Consider filtering. Document findings. May still warrant investigation. |
| Stable upstream | < 1% | Investigate Brave-specific causes. The test is stable in Chromium, so Brave code changes are likely causing the failure. |
| Insufficient data | N/A (<10 verdicts) | Cannot determine from upstream data. Manual investigation needed. |
| Not found | N/A | Test not in Chromium database. May be Brave-specific or use a different ID format. |
Flake rate is calculated as (failed + flaky) / (passed + failed + flaky). Skipped and precluded verdicts are excluded from the rate.
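The rate formula and the thresholds in the table can be sketched as follows. This is an illustration rather than the script's actual code: the function names are hypothetical, and it assumes the 10-verdict minimum applies to the counted verdicts (passed + failed + flaky), which the text above does not specify.

```python
from typing import Optional

MIN_VERDICTS = 10  # assumption: minimum applies to counted verdicts only


def flake_rate(passed: int, failed: int, flaky: int) -> Optional[float]:
    """(failed + flaky) / (passed + failed + flaky); None if nothing was counted.

    Skipped and precluded verdicts are never passed in, so they are excluded.
    """
    total = passed + failed + flaky
    if total == 0:
        return None
    return (failed + flaky) / total


def classify(passed: int, failed: int, flaky: int) -> str:
    """Map verdict counts to one of the verdicts in the table above."""
    total = passed + failed + flaky
    if total < MIN_VERDICTS:
        return "Insufficient data"
    rate = (failed + flaky) / total
    if rate >= 0.05:
        return "Known upstream flake"
    if rate >= 0.01:
        return "Occasional upstream failures"
    return "Stable upstream"
```

For example, 90 passes, 4 failures, and 6 flaky verdicts give a rate of 10 / 100 = 10%, which falls in the "Known upstream flake" band. The "Not found" verdict is not modeled here because it is decided before any counts exist.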
How Results Inform Decisions
Known upstream flake or occasional failures
- •Disabling via filter file is appropriate
- •Use the most specific filter file possible (platform/sanitizer-specific)
- •Include in the filter comment: "Known upstream flake (X% flake rate over N days per LUCI Analysis)"
- •Reference this in commit message and PR body
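The comment template above can be filled in mechanically from the script's results. The sketch below is illustrative only: `filter_entry` is a hypothetical helper, and it assumes the filter file uses `#` for comments and a leading `-` to exclude a test, which should be checked against the actual filter file being edited.

```python
def filter_entry(test_name: str, rate: float, days: int) -> str:
    """Build a commented filter-file entry for a known upstream flake.

    Assumes '#' starts a comment and '-' excludes a test in the filter file.
    """
    comment = (
        f"# Known upstream flake ({rate:.1%} flake rate "
        f"over {days} days per LUCI Analysis)"
    )
    return f"{comment}\n-{test_name}"


print(filter_entry("WebUIURLLoaderFactoryTest.RangeRequest", 0.1, 30))
```

The same comment text can be reused in the commit message and PR body so the justification stays consistent across all three places.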
Stable upstream
- •The test passes reliably in Chromium CI
- •Focus investigation on Brave-specific factors:
  - •Check `brave/chromium_src/` overrides in related directories
  - •Look for Brave features that change timing or behavior
  - •Check if Brave adds UI elements that affect the test
- •A filter disable should be a last resort and needs strong justification
Not found or insufficient data
- •The test may use a different ID format in LUCI
- •Try searching with just the class name or a broader substring
- •Check manually at https://ci.chromium.org/ui/p/chromium/test-search
- •Proceed with normal investigation
API Details
The script uses the LUCI Analysis REST API (pRPC protocol):
- •QueryTests: `POST https://analysis.api.luci.app/prpc/luci.analysis.v1.TestHistory/QueryTests`
- •QueryStats: `POST https://analysis.api.luci.app/prpc/luci.analysis.v1.TestHistory/QueryStats`
- •Query (fallback): `POST https://analysis.api.luci.app/prpc/luci.analysis.v1.TestHistory/Query`
No authentication is required for public Chromium data.
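A call to one of these endpoints might look like the sketch below, using only the standard library. Several details are assumptions rather than facts from this document: the request field names (`project`, `testIdSubstring`), the `)]}'` anti-XSSI prefix that pRPC JSON responses commonly carry, and the helper names themselves. Check them against the LUCI Analysis API definitions before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://analysis.api.luci.app/prpc/luci.analysis.v1.TestHistory"
XSSI_PREFIX = b")]}'"  # assumption: pRPC JSON responses start with this guard


def build_request(method: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request for a TestHistory RPC method."""
    return urllib.request.Request(
        f"{BASE_URL}/{method}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_response(raw: bytes) -> dict:
    """Strip the anti-XSSI prefix, if present, and decode the JSON payload."""
    if raw.startswith(XSSI_PREFIX):
        raw = raw[len(XSSI_PREFIX):]
    return json.loads(raw)


def query_tests(substring: str) -> dict:
    """Search for test IDs containing `substring` (field names are assumed)."""
    req = build_request(
        "QueryTests",
        {"project": "chromium", "testIdSubstring": substring},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_response(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to test offline.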
Test IDs in LUCI follow the format: ninja://{gn_path}:{target}/{TestSuite}.{TestMethod}
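The ID format can be split into its parts with a regular expression, which helps when a search by short test name returns several candidate IDs. This is a minimal sketch; `parse_test_id` is a hypothetical helper, the sample ID in the test is made up for illustration, and parameterized tests with extra `/` segments may need a looser pattern.

```python
import re
from typing import Optional

# ninja://{gn_path}:{target}/{TestSuite}.{TestMethod}
TEST_ID_RE = re.compile(
    r"^ninja://(?P<gn_path>[^:]+):(?P<target>[^/]+)/"
    r"(?P<suite>[^.]+)\.(?P<method>.+)$"
)


def parse_test_id(test_id: str) -> Optional[dict]:
    """Split a LUCI test ID into its components, or return None if it does not match."""
    match = TEST_ID_RE.match(test_id)
    return match.groupdict() if match else None
```

A `None` result is a hint that the test uses a different ID scheme, which is one of the cases covered under "Not found or insufficient data".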
Limitations
- •Only covers Chromium upstream data (not Brave CI)
- •Test ID format may not match for all tests
- •Historical data limited to ~90 days
- •Does not compare failure logs/output (only counts pass/fail/flake)
- •Cannot distinguish between different failure modes for the same test