Update CodeQL Query Dataflow for Ruby
This skill guides you through migrating Ruby CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query result equivalence.
When to Use This Skill
- Migrating Ruby queries that use the deprecated `DataFlow::Configuration` or `TaintTracking::Configuration` classes
- Updating queries to use `DataFlow::ConfigSig` modules
- Modernizing Ruby queries to use the shared dataflow library
- Ensuring query result equivalence during dataflow API migration
Prerequisites
- An existing Ruby CodeQL query that uses the v1 dataflow API and that you want to migrate
- Existing unit tests for the query
- An understanding of what the query is meant to detect
- Access to the CodeQL Development MCP Server tools
Key Dataflow API Changes (v1 → v2)
Configuration Class → Configuration Module
v1 (Legacy):
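```ql
// isSanitizer and isAdditionalTaintStep belong to TaintTracking::Configuration;
// a plain DataFlow::Configuration used isBarrier and isAdditionalFlowStep instead.
class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }
  override predicate isSource(DataFlow::Node source) { ... }
  override predicate isSink(DataFlow::Node sink) { ... }
  override predicate isSanitizer(DataFlow::Node node) { ... }
  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
```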
v2 (Modern):
```ql
module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { ... }
  predicate isSink(DataFlow::Node sink) { ... }
  predicate isBarrier(DataFlow::Node node) { ... }
  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

module MyFlow = TaintTracking::Global<MyConfig>;
```
Key Terminology Changes
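| v1 API | v2 API | Purpose |
|---|---|---|
| `DataFlow::Configuration` / `TaintTracking::Configuration` class | `DataFlow::ConfigSig` module, instantiated with `DataFlow::Global` / `TaintTracking::Global` | Configuration |
| `isSanitizer` | `isBarrier` | Stop data flow propagation |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` | Custom flow steps |
| `cfg.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` | Check for source-to-sink flow |
| `cfg.hasFlowPath(source, sink)` | `MyFlow::flowPath(source, sink)` | Query flow paths (path-problem queries) |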
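Path queries (`@kind path-problem`) migrate in the same way: v1 `cfg.hasFlowPath(source, sink)` over `DataFlow::PathNode` with `import DataFlow::PathGraph` becomes `MyFlow::flowPath` over `MyFlow::PathNode` with `import MyFlow::PathGraph`. The following is a minimal sketch that assumes a simple command-injection configuration; the module names, the metadata, and the `system` sink are illustrative and are not taken from any existing query.

```ql
/**
 * @name Command injection (path sketch)
 * @kind path-problem
 * @id rb/example/command-injection-path
 */

import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module CmdConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Illustrative sink: the first argument of a call to `system`
  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode c | c.getMethodName() = "system" and sink = c.getArgument(0))
  }
}

module CmdFlow = TaintTracking::Global<CmdConfig>;

// v1: import DataFlow::PathGraph
import CmdFlow::PathGraph

// v1: from MyConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
//     where cfg.hasFlowPath(source, sink)
from CmdFlow::PathNode source, CmdFlow::PathNode sink
where CmdFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "This command depends on a $@.", source.getNode(),
  "user-provided value"
```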
Ruby-Specific Node Types
Ruby dataflow uses multiple node representations:
- `ExprNode`: data-flow nodes for expressions such as method calls and literals (each wraps a `CfgNodes::ExprCfgNode`)
- `ParameterNode`: method parameter nodes
- `CfgNodes::ExprCfgNode`: control-flow graph nodes (returned by `asExpr()`)
- `LocalSourceNode`: local sources used by API graph analysis
- `RemoteFlowSource`: predefined sources of user-controllable input
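To see these representations side by side, the following minimal exploratory query classifies nodes in a database. It assumes the standard `codeql.ruby` library. The `system` sink is only an illustration of how a `LocalSourceNode` is usually consumed through `flowsTo`.

```ql
import codeql.ruby.DataFlow
import codeql.ruby.dataflow.RemoteFlowSources

from DataFlow::Node n, string kind
where
  // ParameterNode: parameters of methods, blocks and lambdas
  n instanceof DataFlow::ParameterNode and kind = "parameter"
  or
  // RemoteFlowSource: predefined user-controllable input, labeled with its source type
  kind = "remote flow source: " + n.(RemoteFlowSource).getSourceType()
  or
  // LocalSourceNode: a local source that reaches the first argument of `system`
  exists(DataFlow::CallNode c |
    c.getMethodName() = "system" and
    n.(DataFlow::LocalSourceNode).flowsTo(c.getArgument(0))
  ) and
  kind = "local source reaching system"
select n, kind
```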
Migration Workflow
Phase 1: Establish Test Baseline (TDD Foundation)
Critical: Before any code changes, capture current query behavior.
Step 1: Run Existing Tests
Use codeql_test_run to establish baseline:
```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```
Save the output. It is your reference for query result equivalence.
Step 2: Document Current Results
Create a reference file with current results:
```bash
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
   <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline
```
This ensures you can verify equivalence after migration.
Phase 2: Analyze Current Query
Step 3: Identify v1 Patterns
Review the query for v1 API usage:
- `class X extends DataFlow::Configuration` (or `TaintTracking::Configuration`)
- `isSanitizer` predicates
- `isAdditionalTaintStep` predicates
- `cfg.hasFlow(source, sink)` (or `cfg.hasFlowPath(source, sink)`) calls
Step 4: Understand Ruby-Specific Flow
Identify how the query uses Ruby dataflow constructs:
- CFG node conversions (e.g., `asExpr()` returns `CfgNodes::ExprCfgNode`)
- `RemoteFlowSource` for user input (Rails `params`, HTTP requests)
- API graphs for tracking gem/framework usage (`codeql.ruby.ApiGraphs`)
- Ruby-specific sources: `ARGV`, `ENV`, Rails parameters, HTTP requests
- Ruby-specific sinks: `eval`, `send`, `system`, ActiveRecord queries
Phase 3: Migrate to v2 API
Step 5: Convert Configuration Class to Module
Before:
```ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

class CommandInjectionConfig extends TaintTracking::Configuration {
  CommandInjectionConfig() { this = "CommandInjectionConfig" }

  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  // Treat the result of `x.shellescape` as sanitized
  override predicate isSanitizer(DataFlow::Node node) {
    node.(DataFlow::CallNode).getMethodName() = "shellescape"
  }
}

from CommandInjectionConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Untrusted data flows to command execution"
```
After:
```ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module CommandInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  // Same body as the v1 isSanitizer; only the name changes
  predicate isBarrier(DataFlow::Node node) {
    node.(DataFlow::CallNode).getMethodName() = "shellescape"
  }
}

module CommandInjectionFlow = TaintTracking::Global<CommandInjectionConfig>;

from DataFlow::Node source, DataFlow::Node sink
where CommandInjectionFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"
```
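The example above instantiates `TaintTracking::Global` because the v1 class extended `TaintTracking::Configuration`. If a v1 query extends `DataFlow::Configuration` instead, instantiate `DataFlow::Global` so that no default taint steps are added. Using taint tracking there can report extra results and break equivalence with the baseline. The following minimal sketch uses hypothetical module names and an illustrative `eval` sink.

```ql
import codeql.ruby.DataFlow
import codeql.ruby.dataflow.RemoteFlowSources

// Same ConfigSig shape as any v2 configuration; only the instantiation differs
module EvalArgConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode c | c.getMethodName() = "eval" and sink = c.getArgument(0))
  }
}

// v1 `extends DataFlow::Configuration` -> v2 `DataFlow::Global` (value flow only, no taint steps)
module EvalArgFlow = DataFlow::Global<EvalArgConfig>;

from DataFlow::Node source, DataFlow::Node sink
where EvalArgFlow::flow(source, sink)
select sink, "Untrusted value reaches eval."
```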
Step 6: Rename Predicates
- `isSanitizer` → `isBarrier`: rename only. The body is unchanged and the `override` keyword is dropped.
- `isAdditionalTaintStep` → `isAdditionalFlowStep`: rename only, likewise without `override`.
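Both renames can appear in one v2 module, with each body carried over verbatim from v1. In the sketch below, the `to_i` barrier and the `decode64` step are hypothetical examples chosen only to give the predicates concrete bodies. `isSanitizerIn`/`isSanitizerOut` follow the same pattern and become `isBarrierIn`/`isBarrierOut`.

```ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module DecodedEvalConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode c | c.getMethodName() = "eval" and sink = c.getArgument(0))
  }

  // Was `override predicate isSanitizer(...)` in v1; body unchanged.
  // Hypothetical barrier: integer conversion via `x.to_i`.
  predicate isBarrier(DataFlow::Node node) {
    node.(DataFlow::CallNode).getMethodName() = "to_i"
  }

  // Was `override predicate isAdditionalTaintStep(...)` in v1; body unchanged.
  // Hypothetical step: taint flows from the argument of `decode64` to its result.
  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
    exists(DataFlow::CallNode c |
      c.getMethodName() = "decode64" and
      n1 = c.getArgument(0) and
      n2 = c
    )
  }
}

module DecodedEvalFlow = TaintTracking::Global<DecodedEvalConfig>;

from DataFlow::Node source, DataFlow::Node sink
where DecodedEvalFlow::flow(source, sink)
select sink, "Untrusted data, possibly Base64-decoded, reaches eval."
```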
Step 7: Update Flow Queries
Replace cfg.hasFlow(source, sink) with MyFlow::flow(source, sink):
- Remove the configuration variable from the `from` clause
- Call the flow module's predicate directly
Phase 4: Handle Ruby-Specific Migration Patterns
Step 8: Control-Flow Node Conversions
Ruby's asExpr() returns CfgNodes::ExprCfgNode, not AST nodes:
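```ql
import codeql.ruby.AST
import codeql.ruby.CFG
import codeql.ruby.DataFlow

// These conversions are members of DataFlow::Node, so they behave the same in v1 and v2
from DataFlow::Node n, CfgNodes::ExprCfgNode cfgExpr, Expr astExpr
where
  cfgExpr = n.asExpr() and // CFG expression node, not an AST node
  astExpr = cfgExpr.getExpr() // getExpr() returns the underlying AST expression
select n, cfgExpr, astExpr
```
For parameter nodes, `n.asParameter()` returns the AST `Parameter`, not a `DataFlow::ParameterNode`.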
Step 9: RemoteFlowSource Usage
RemoteFlowSource works identically in v1 and v2:
```ql
predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource
  or
  // Rails parameters: calls to `params`
  source.asExpr().getExpr().(MethodCall).getMethodName() = "params"
  or
  // Environment variables: method calls on ENV, e.g. ENV["HOME"] or ENV.fetch("HOME")
  exists(ConstantReadAccess env |
    env.getName() = "ENV" and
    source.asExpr().getExpr().(MethodCall).getReceiver() = env
  )
  or
  // Command line arguments: the ARGV constant itself
  exists(ConstantReadAccess argv |
    argv.getName() = "ARGV" and
    source.asExpr().getExpr() = argv
  )
}
```
Step 10: API Graph Integration
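Use API graphs (`codeql.ruby.ApiGraphs`) to track framework and gem usage. Example: Rails controller params via `API::getTopLevelMember("ActionController").getMember("Base")...getReturn("params")`. Constants such as `Base` are reached with `getMember`, while `getReturn` gives the result of calling a method.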
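The following is a minimal API-graph sketch for a gem-style sink, assuming the `Open3` module from Ruby's standard library. The method list is illustrative, not exhaustive. The helper name `isOpen3CommandSink` is hypothetical. CodeQL's own Ruby framework models may already cover `Open3`, so in a real query you would check those first. The sketch only shows how an API-graph chain yields a `DataFlow::Node` usable in `isSink`.

```ql
import codeql.ruby.ApiGraphs
import codeql.ruby.DataFlow

// Hypothetical helper: the first argument of an Open3 command-execution call, e.g. Open3.capture2(cmd)
predicate isOpen3CommandSink(DataFlow::Node sink) {
  sink =
    API::getTopLevelMember("Open3")
        .getAMethodCall(["capture2", "capture2e", "capture3", "popen2", "popen3"])
        .getArgument(0)
}

from DataFlow::Node sink
where isOpen3CommandSink(sink)
select sink, "Command passed to Open3."
```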
Step 11: Rails-Specific Patterns
Track flows through Rails: ActiveRecord mass assignment (create, update), ActionView render (render with inline), hash access ([], fetch, dig).
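These three Rails patterns can be sketched as plain `DataFlow::CallNode` predicates, to be used from `isSink` and `isAdditionalFlowStep`. All predicate names are hypothetical, and the matching is by method name only (no receiver type checks). The default taint steps may already propagate some hash reads, so compare against the baseline before adding `hashAccessStep`.

```ql
import codeql.ruby.DataFlow

// Hypothetical: the template passed as `inline:` to `render`, e.g. `render inline: tmpl`
predicate isRenderInlineSink(DataFlow::Node sink) {
  exists(DataFlow::CallNode render |
    render.getMethodName() = "render" and
    sink = render.getKeywordArgument("inline")
  )
}

// Hypothetical: the attributes argument of ActiveRecord mass assignment, e.g. `User.create(attrs)`
predicate isMassAssignmentSink(DataFlow::Node sink) {
  exists(DataFlow::CallNode c |
    c.getMethodName() = ["create", "update"] and
    sink = c.getArgument(0)
  )
}

// Hypothetical step: hash access (`h[k]`, `h.fetch(k)`, `h.dig(k)`) from the hash to the result
predicate hashAccessStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(DataFlow::CallNode c |
    c.getMethodName() = ["[]", "fetch", "dig"] and
    n1 = c.getReceiver() and
    n2 = c
  )
}

from DataFlow::Node sink
where isRenderInlineSink(sink) or isMassAssignmentSink(sink)
select sink, "Potential Rails sink."
```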
Step 12: Metaprogramming Patterns
Track flows through dynamic features: send/public_send, define_method, const_get/const_set.
Phase 5: Validate Equivalence Through Testing
Step 13: Compile Migrated Query
Use codeql_query_compile to check for errors:
```json
{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}
```
Fix any compilation errors before testing.
Step 14: Run Tests and Compare Results
Use codeql_test_run on migrated query:
```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```
Critical: Results MUST match baseline from Phase 1.
Step 15: Verify Result Equivalence
Compare results line-by-line:
```bash
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
     <query-pack>/test/{QueryName}/{QueryName}.expected
```
Success: an empty diff (identical results).
Failure: any difference requires investigation and a fix.
Phase 6: Expand Test Coverage (Optional)
Add test cases for Rails features, gems (Sinatra, Grape), metaprogramming, string interpolation, blocks/lambdas, and hash/array flows. For each: add test code, update .expected, extract with codeql_test_extract, run tests.
Phase 7: Performance Validation
Run query on realistic database. If performance degrades: cache expensive predicates, use local flow where possible, limit scope, optimize API graph queries.
Phase 8: Finalize Migration
Update query metadata, remove v1 baseline files, add migration notes if needed, format with codeql_query_format.
Ruby-Specific Dataflow Considerations
Rails Framework Sources
Track user input in Rails applications:
```ql
predicate isSource(DataFlow::Node source) {
  // Controller parameters
  exists(DataFlow::CallNode params |
    params.getMethodName() = "params" and
    source = params.getAMethodCall()
  )
  or
  // Request headers
  exists(DataFlow::CallNode request |
    request.getMethodName() = "request" and
    source = request.getAMethodCall("headers")
  )
  or
  // Cookies
  exists(DataFlow::CallNode cookies |
    cookies.getMethodName() = "cookies" and
    source = cookies.getAMethodCall()
  )
}
```
ActiveRecord Sinks
Track dangerous database operations:
```ql
predicate isSink(DataFlow::Node sink) {
  // Raw SQL execution (`query` is a reserved word in QL, so use another variable name)
  exists(DataFlow::CallNode sqlCall |
    sqlCall.getMethodName() in ["find_by_sql", "execute", "exec_query"] and
    sink = sqlCall.getAnArgument()
  )
  or
  // Interpolated string as the first argument of `where` (`where` is also a QL keyword)
  exists(DataFlow::CallNode whereCall |
    whereCall.getMethodName() = "where" and
    sink = whereCall.getArgument(0) and
    sink.asExpr().getExpr().(StringlikeLiteral).getComponent(_) instanceof
      StringInterpolationComponent
  )
}
```
Code Execution via Metaprogramming
Track dynamic code execution:
```ql
predicate isSink(DataFlow::Node sink) {
  // eval family
  exists(DataFlow::CallNode evalCall |
    evalCall.getMethodName() in ["eval", "instance_eval", "class_eval", "module_eval"] and
    sink = evalCall.getArgument(0)
  )
  or
  // send with dynamic method names
  exists(DataFlow::CallNode send |
    send.getMethodName() in ["send", "public_send"] and
    sink = send.getArgument(0)
  )
  or
  // define_method with dynamic names
  exists(DataFlow::CallNode define |
    define.getMethodName() = "define_method" and
    sink = define.getArgument(0)
  )
}
```
String Interpolation Flows
Track flows through string interpolation components and concatenation (AddExpr).
Block and Lambda Flows
Track flows through block parameters and lambda/proc creation with DataFlow::localFlow.
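The following minimal sketch uses `DataFlow::localFlow` to find block or lambda parameters that reach the first argument of `system` within the same callable. The `system` sink is only an illustration. Local flow does not follow calls or returns, so it complements, rather than replaces, the global configuration.

```ql
import codeql.ruby.AST
import codeql.ruby.DataFlow

from DataFlow::ParameterNode p, DataFlow::CallNode c
where
  // Parameters of a block (`{ |x| ... }`, `do |x| ... end`) or a lambda (`->(x) { ... }`)
  (
    p.getParameter().getCallable() instanceof Block or
    p.getParameter().getCallable() instanceof Lambda
  ) and
  c.getMethodName() = "system" and
  // Intra-procedural flow only
  DataFlow::localFlow(p, c.getArgument(0))
select c, "Block or lambda parameter $@ reaches system.", p, p.toString()
```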
Gem-Specific Patterns
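Sinatra: route handlers are receiverless calls such as `get` or `post`, which you can match on the method name with `regexpMatch("^(get|post|put|delete|patch)$")`. User input arrives through `params` and the `request` object.
Rack: the `env` hash passed to a middleware's or application's `call` method carries the request data.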
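Both gem patterns can be sketched as follows. The helper names are hypothetical, and the matching is purely syntactic. The route predicate can also match other receiverless calls named `get`/`post`, such as Rails route definitions. Real Sinatra or Rack models would need tighter context checks.

```ql
import codeql.ruby.AST
import codeql.ruby.DataFlow

// Hypothetical: Sinatra-style route declarations such as `get "/users" do ... end`
predicate isSinatraRoute(DataFlow::CallNode route) {
  route.getMethodName().regexpMatch("^(get|post|put|delete|patch)$") and
  not exists(route.getReceiver())
}

// Hypothetical: the `env` parameter of a Rack-style `def call(env)` method
predicate isRackEnv(DataFlow::Node source) {
  exists(Method m |
    m.getName() = "call" and
    source.asParameter() = m.getParameter(0)
  )
}

from DataFlow::Node n, string kind
where
  isSinatraRoute(n) and kind = "Sinatra-style route"
  or
  isRackEnv(n) and kind = "Rack env parameter"
select n, kind
```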
MCP Tools Reference
- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from Ruby source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies
Common Migration Pitfalls
❌ Don't:
- Skip establishing a test baseline before migration
- Change query logic alongside the API migration (keep these concerns separate)
- Accept test results without verifying equivalence
- Remove the v1 baseline before the migration is confirmed successful
- Ignore performance regressions
- Forget to update imports where needed
- Overlook Ruby-specific CFG node semantics (`asExpr()` returns CFG nodes)
✅ Do:
- Establish a test baseline BEFORE any changes
- Make purely mechanical API changes first
- Verify exact result equivalence after migration
- Keep the v1 baseline for comparison during migration
- Test edge cases specific to Ruby (metaprogramming, Rails, gems)
- Document any intentional behavior changes separately
- Understand the difference between CFG nodes and AST nodes
Troubleshooting Non-Equivalent Results
If results differ after migration:
- Check the flow module: a v1 `TaintTracking::Configuration` maps to `TaintTracking::Global`, and a v1 `DataFlow::Configuration` maps to `DataFlow::Global`
- Check node type conversions: ensure the CFG semantics of `asExpr()` are handled correctly
- Verify predicate renames: confirm that the `isBarrier` logic is identical to the old `isSanitizer`
- Review flow steps: check that `isAdditionalFlowStep` mirrors `isAdditionalTaintStep`
- Inspect CFG vs AST confusion: use `.getExpr()` on CFG nodes to get AST nodes
- Debug with partial flow: use flow exploration to find missing edges
- Check API graph usage: ensure API graph predicates are correctly structured
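For the partial-flow step, the shared v2 library exposes forward flow exploration on the flow module. The sketch below assumes a configuration like `CmdConfig` (a hypothetical name) and an exploration limit of 5. Raise the limit gradually, because exploration results grow quickly. Nodes reached from a source but never reaching a sink point to the missing edge.

```ql
/**
 * @kind path-problem
 */

import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module CmdConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode c | c.getMethodName() = "system" and sink = c.getArgument(0))
  }
}

module CmdFlow = TaintTracking::Global<CmdConfig>;

// How far (in steps) to explore forward from each source
int explorationLimit() { result = 5 }

module CmdPartialFlow = CmdFlow::FlowExplorationFwd<explorationLimit/0>;

import CmdPartialFlow::PartialPathGraph

from CmdPartialFlow::PartialPathNode source, CmdPartialFlow::PartialPathNode node, int dist
where CmdPartialFlow::partialFlow(source, node, dist)
select node.getNode(), source, node, "Reached from $@ at distance " + dist.toString() + ".",
  source.getNode(), "this source"
```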
Documentation References
- New dataflow API for writing custom CodeQL queries: official v2 API announcement
- Analyzing data flow in Ruby: Ruby dataflow guide
- CodeQL Ruby Library Reference: standard library documentation
Related Resources
- Create CodeQL Query TDD Generic: TDD workflow for queries
Success Criteria
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using the v2 API
- ✅ All configuration classes converted to modules
- ✅ All `isSanitizer` predicates renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` predicates renamed to `isAdditionalFlowStep`
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match the v1 baseline (zero diff)
- ✅ No performance regressions
- ✅ Query metadata updated appropriately
- ✅ Ruby-specific patterns (metaprogramming, Rails, CFG nodes) handled correctly