Update CodeQL Query Dataflow for Go
This skill guides you through migrating Go CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query results remain equivalent.
When to Use This Skill
- Migrating Go queries that use the deprecated `DataFlow::Configuration` or `TaintTracking::Configuration` classes
- Updating queries to use `DataFlow::ConfigSig` modules
- Modernizing Go queries to use the shared dataflow library
- Ensuring query result equivalence during dataflow API migration
Prerequisites
- An existing Go CodeQL query that uses the v1 dataflow API
- Existing unit tests for the query
- An understanding of the query's detection purpose
- Access to the CodeQL Development MCP Server tools
Key Dataflow API Changes (v1 → v2)
Configuration Class → Configuration Module
v1 (Legacy):
```ql
class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }
  override predicate isSource(DataFlow::Node source) { ... }
  override predicate isSink(DataFlow::Node sink) { ... }
  override predicate isSanitizer(DataFlow::Node node) { ... }
  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
```
In v1, `isSanitizer` and `isAdditionalTaintStep` are members of `TaintTracking::Configuration`. A v1 `DataFlow::Configuration` already uses `isBarrier` and `isAdditionalFlowStep`.

v2 (Modern):
```ql
module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { ... }
  predicate isSink(DataFlow::Node sink) { ... }
  predicate isBarrier(DataFlow::Node node) { ... }
  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

// TaintTracking::Global replaces a v1 TaintTracking::Configuration;
// use DataFlow::Global<MyConfig> for a v1 DataFlow::Configuration
module MyFlow = TaintTracking::Global<MyConfig>;
```
Key Terminology Changes
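| v1 API | v2 API | Purpose |
|---|---|---|
| `DataFlow::Configuration` / `TaintTracking::Configuration` class | `DataFlow::ConfigSig` module, instantiated with `DataFlow::Global` / `TaintTracking::Global` | Configuration |
| `isSanitizer` | `isBarrier` | Stop data flow propagation |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` | Custom flow steps |
| `cfg.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` | Query flow between nodes |
| `cfg.hasFlowPath(source, sink)` | `MyFlow::flowPath(source, sink)` | Query flow paths |
| `DataFlow::PathNode`, `import DataFlow::PathGraph` | `MyFlow::PathNode`, `import MyFlow::PathGraph` | Path-problem results |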
Go-Specific Node Types
Go dataflow uses multiple node representations:
- `ExprNode`: AST expression nodes (e.g., function calls, literals)
- `ParameterNode`: function parameter nodes
- `InstructionNode`: IR (intermediate representation) instruction nodes
- `RemoteFlowSource`: predefined sources for user-controllable input
Migration Workflow
Phase 1: Establish Test Baseline (TDD Foundation)
Critical: Before any code changes, capture current query behavior.
Step 1: Run Existing Tests
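Use `codeql_test_run` to establish the baseline:
```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```
Save the output. It is your reference for query result equivalence. The baseline run should pass. If it fails, the `.expected` file does not reflect the query's current behavior, so fix or accept the current results before migrating.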
Step 2: Document Current Results
Create a reference copy of the current results:
```shell
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
   <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline
```
This copy lets you verify equivalence after migration.
Phase 2: Analyze Current Query
Step 3: Identify v1 Patterns
Review the query for v1 API usage:
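- `class X extends DataFlow::Configuration` or `class X extends TaintTracking::Configuration`
- `isSanitizer` predicates
- `isAdditionalTaintStep` predicates
- `cfg.hasFlow(source, sink)` and `cfg.hasFlowPath(source, sink)` calls
- `import DataFlow::PathGraph` in path-problem queries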
Step 4: Understand Go-Specific Flow
Identify how the query uses Go dataflow constructs:
- AST-to-IR mappings (e.g., `asExpr()`, `asInstruction()`)
- `RemoteFlowSource` for user input
- Go-specific sources: `os.Args`, `os.Getenv`, HTTP request parameters
- Go-specific sinks: `os/exec.Command`, `database/sql` `(*DB).Query`, file operations
Phase 3: Migrate to v2 API
Step 5: Convert Configuration Class to Module
Before:
```ql
import go

class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getTarget().hasQualifiedName("os/exec", "Command") and
      sink = call.getAnArgument()
    )
  }

  // SanitizationCall is a placeholder for a sanitizer class defined by your query
  override predicate isSanitizer(DataFlow::Node node) {
    node = any(SanitizationCall c).getResult()
  }
}

from MyConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Untrusted data flows to command execution"
```
After:
```ql
import go

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getTarget().hasQualifiedName("os/exec", "Command") and
      sink = call.getAnArgument()
    )
  }

  // SanitizationCall is a placeholder for a sanitizer class defined by your query
  predicate isBarrier(DataFlow::Node node) {
    node = any(SanitizationCall c).getResult()
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"
```
Step 6: Rename Predicates
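- `isSanitizer` → `isBarrier`: rename the predicate and drop `override`; the logic is unchanged
- `isAdditionalTaintStep` → `isAdditionalFlowStep`: rename the predicate and drop `override`; the logic is unchanged
- Remove the characteristic predicate (`MyConfig() { this = "MyConfig" }`), because modules do not have one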
Step 7: Update Flow Queries
Replace `cfg.hasFlow(source, sink)` with `MyFlow::flow(source, sink)`:
- Remove the configuration variable from the `from` clause
- Call the module's flow predicate directly (`MyFlow::flowPath` for path queries)
Phase 4: Handle Go-Specific Migration Patterns
Step 8: AST-to-IR Node Conversions
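Ensure proper node type handling. Both v1 and v2 support these conversions:
```ql
import go

// Go's IR types live in the IR module, hence IR::Instruction
Expr exprOf(DataFlow::Node n) { result = n.asExpr() } // AST expression
IR::Instruction instructionOf(DataFlow::Node n) { result = n.asInstruction() } // IR instruction
Parameter parameterOf(DataFlow::Node n) { result = n.asParameter() } // function parameter
```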
Step 9: Go RemoteFlowSource Usage
`RemoteFlowSource` works identically in v1 and v2:
```ql
predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource
  or
  source.asExpr().(CallExpr).getTarget().hasQualifiedName("os", "Getenv")
  or
  // Go's main takes no parameters; command-line input arrives via reads of os.Args
  source = any(Variable v | v.hasQualifiedName("os", "Args")).getARead()
}
```
Step 10: Channel and Goroutine Flow
For concurrent flow patterns, add explicit steps where the default flow is missing an edge:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  // Basic channel send → receive flow on the same channel variable.
  // Note: This is a simplified example. Complex scenarios (select statements,
  // channel closures, channels passed between functions) require more
  // sophisticated tracking.
  exists(SendStmt send, RecvExpr recv, Variable ch |
    n1.asExpr() = send.getValue() and
    n2.asExpr() = recv and
    // Two distinct AST expressions are never equal, so match on the variable they reference
    send.getChannel() = ch.getAReference() and
    recv.getOperand() = ch.getAReference()
  )
}
```
Phase 5: Validate Equivalence Through Testing
Step 11: Compile Migrated Query
Use `codeql_query_compile` to check for errors:
```json
{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}
```
Fix any compilation errors before testing.
Step 12: Run Tests and Compare Results
Use `codeql_test_run` on the migrated query:
```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```
Critical: Results MUST match the baseline from Phase 1.
Step 13: Verify Result Equivalence
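Compare results line by line. First confirm that the `.expected` file still matches the baseline. It changes only if new results were accepted:
```shell
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
     <query-pack>/test/{QueryName}/{QueryName}.expected
```
If `codeql_test_run` reported a failure, CodeQL writes the actual results to `{QueryName}.actual` in the test directory. Diff that file against the baseline to see which results changed.

- Success: empty diff (identical results)
- Failure: any difference requires investigation and a fix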
Phase 6: Expand Test Coverage (Optional)
If baseline tests pass, add more test cases to ensure robustness:
Step 14: Add Edge Case Tests
Create additional test files covering:
- Complex goroutine data sharing patterns
- Interface type assertions and conversions
- Error handling flow patterns (ignored errors, wrapped errors)
- Stdlib sink variations (`exec.CommandContext`, `(*sql.DB).Prepare`)
- Channel-based concurrent flows
For each new test:
- Add test code to `Example2.go`, `Example3.go`, etc.
- Update the `.expected` file with the anticipated results
- Re-extract the test database with `codeql_test_extract`
- Run the tests to verify
Phase 7: Performance Validation
Step 15: Check Query Performance
Run the query (for example with `codeql_query_run`) on a realistic database and monitor performance:
```json
{
  "query": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "database": "<path-to-realistic-go-database>",
  "searchPath": ["<query-pack>"]
}
```
If performance degrades significantly, consider:
- Caching expensive predicates with `cached`
- Using local flow instead of global flow where possible
- Limiting scope with additional constraints
Phase 8: Finalize Migration
Step 16: Update Query Metadata
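Ensure the query header and imports match the v2 API. A `path-problem` query imports the flow module's `PathGraph` instead of `DataFlow::PathGraph` and selects the module's `PathNode`s:
```ql
/**
 * @name Command Injection via Untrusted Data
 * @description Executes system commands with user-controllable data
 * @kind path-problem
 * @id go/command-injection
 * @tags security
 */

import go
import MyFlow::PathGraph // v2: replaces `import DataFlow::PathGraph`

// ... MyConfig and MyFlow as in Step 5 ...

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Untrusted data flows to command execution"
```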
Step 17: Clean Up and Document
- Remove v1 baseline files after verification
- Add migration notes in query comments if helpful
- Format the query with `codeql_query_format`
Go-Specific Dataflow Considerations
Error Handling Patterns
Go's explicit error handling affects dataflow:
```ql
// Track flows through error-returning functions
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(DataFlow::CallNode call |
    // The call has a second result, e.g. a (value, error) pair; track the value (index 0).
    // Note: this applies to every multi-result call, so restrict the target if needed.
    exists(call.getResult(1)) and
    n1 = call.getAnArgument() and
    n2 = call.getResult(0)
  )
}
```
Interface Type Flows
Track flows through type assertions on interface values:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(TypeAssertExpr assertion |
    n1.asExpr() = assertion.getExpr() and
    n2.asExpr() = assertion
  )
}
```
Pointer Dereferences
Track flows through pointer dereferences:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(StarExpr deref |
    n1.asExpr() = deref.getBase() and
    n2.asExpr() = deref
  )
}
```
MCP Tools Reference
- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from Go source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies
Common Migration Pitfalls
❌ Don't:
- Skip baseline test establishment before migration
- Change query logic alongside API migration (separate concerns)
- Accept test results without verifying equivalence
- Remove the v1 baseline until migration is confirmed successful
- Ignore performance regressions
- Forget to update imports if needed
✅ Do:
- Establish the test baseline BEFORE any changes
- Make purely mechanical API changes first
- Verify exact result equivalence after migration
- Keep the v1 baseline for comparison during migration
- Test edge cases specific to Go (goroutines, channels, interfaces)
- Document any intentional behavior changes separately
Troubleshooting Non-Equivalent Results
If results differ after migration:
- Check node type conversions: ensure `asExpr()` and `asInstruction()` usage is correct
- Verify predicate renames: confirm each `isBarrier` has the same logic as the `isSanitizer` it replaces
- Review flow steps: check that `isAdditionalFlowStep` mirrors `isAdditionalTaintStep`
- Check the flow module: a v1 `TaintTracking::Configuration` must become `TaintTracking::Global`, not `DataFlow::Global`
- Debug missing results with partial flow: use the `FlowExploration` module of `MyFlow` to find the missing edges
Documentation References
- New dataflow API for writing custom CodeQL queries: official v2 API announcement
- Analyzing data flow in Go: Go dataflow guide
- CodeQL Go Library Reference: standard library documentation
Related Resources
- Create CodeQL Query TDD Generic: TDD workflow for queries
- QSpec Reference for Go: Go-specific QSpec patterns
- Go Query Development Prompts: Go query guidance
Success Criteria
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using the v2 API
- ✅ All configuration classes converted to modules
- ✅ All `isSanitizer` predicates renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` predicates renamed to `isAdditionalFlowStep`
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match the v1 baseline (zero diff)
- ✅ No performance regressions
- ✅ Query metadata and imports updated appropriately
- ✅ Go-specific patterns (goroutines, channels, errors) handled correctly