Commit d6fd43f

Merge pull request #9364 from asgerf/ruby/api-graph-api
Ruby: API graph renaming and documentation
2 parents 861a368 + a1af9c3 commit d6fd43f

18 files changed, +200 -60 lines changed

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

Lines changed: 126 additions & 20 deletions
@@ -19,48 +19,154 @@ private import codeql.ruby.dataflow.internal.DataFlowDispatch as DataFlowDispatc
  */
  module API {
  /**
- * An abstract representation of a definition or use of an API component such as a Ruby module,
- * or the result of a method call.
+ * A node in the API graph, representing a value that has crossed the boundary between this
+ * codebase and an external library (or in general, any external codebase).
+ *
+ * ### Basic usage
+ *
+ * API graphs are typically used to identify "API calls", that is, calls to an external function
+ * whose implementation is not necessarily part of the current codebase.
+ *
+ * The most basic use of API graphs is typically as follows:
+ * 1. Start with `API::getTopLevelMember` for the relevant library.
+ * 2. Follow up with a chain of accessors such as `getMethod` describing how to get to the relevant API function.
+ * 3. Map the resulting API graph nodes to data-flow nodes, using `asSource` or `asSink`.
+ *
+ * For example, a simplified way to get arguments to `Foo.bar` would be
+ * ```codeql
+ * API::getTopLevelMember("Foo").getMethod("bar").getParameter(0).asSink()
+ * ```
+ *
+ * The most commonly used accessors are `getMember`, `getMethod`, `getParameter`, and `getReturn`.
+ *
+ * ### API graph nodes
+ *
+ * There are two kinds of nodes in the API graphs, distinguished by who is "holding" the value:
+ * - **Use-nodes** represent values held by the current codebase, which came from an external library.
+ * (The current codebase is "using" a value that came from the library).
+ * - **Def-nodes** represent values held by the external library, which came from this codebase.
+ * (The current codebase "defines" the value seen by the library).
+ *
+ * API graph nodes are associated with data-flow nodes in the current codebase.
+ * (Since external libraries are not part of the database, there is no way to associate with concrete
+ * data-flow nodes from the external library).
+ * - **Use-nodes** are associated with data-flow nodes where a value enters the current codebase,
+ * such as the return value of a call to an external function.
+ * - **Def-nodes** are associated with data-flow nodes where a value leaves the current codebase,
+ * such as an argument passed in a call to an external function.
+ *
+ *
+ * ### Access paths and edge labels
+ *
+ * Nodes in the API graph are associated with a set of access paths, describing a series of operations
+ * that may be performed to obtain that value.
+ *
+ * For example, the access path `API::getTopLevelMember("Foo").getMethod("bar")` represents the action of
+ * reading the top-level constant `Foo` and then accessing the method `bar` on the resulting object.
+ * It would be associated with a call such as `Foo.bar()`.
+ *
+ * Each edge in the graph is labelled by such an "operation". For an edge `A->B`, the type of the `A` node
+ * determines who is performing the operation, and the type of the `B` node determines who ends up holding
+ * the result:
+ * - An edge starting from a use-node describes what the current codebase is doing to a value that
+ * came from a library.
+ * - An edge starting from a def-node describes what the external library might do to a value that
+ * came from the current codebase.
+ * - An edge ending in a use-node means the result ends up in the current codebase (at its associated data-flow node).
+ * - An edge ending in a def-node means the result ends up in external code (its associated data-flow node is
+ * the place where it was "last seen" in the current codebase before flowing out)
+ *
+ * Because the implementation of the external library is not visible, it is not known exactly what operations
+ * it will perform on values that flow there. Instead, the edges starting from a def-node are operations that would
+ * lead to an observable effect within the current codebase; without knowing for certain if the library will actually perform
+ * those operations. (When constructing these edges, we assume the library is somewhat well-behaved).
+ *
+ * For example, given this snippet:
+ * ```ruby
+ * Foo.bar(->(x) { doSomething(x) })
+ * ```
+ * A callback is passed to the external function `Foo.bar`. We can't know if `Foo.bar` will actually invoke this callback.
+ * But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter.
+ * For that reason, an edge is generated representing the argument-passing operation that might be performed by `Foo.bar`.
+ * This edge is going from the def-node associated with the callback to the use-node associated with the parameter `x` of the lambda.
  */
  class Node extends Impl::TApiNode {
  /**
- * Gets a data-flow node corresponding to a use of the API component represented by this node.
+ * Gets a data-flow node where this value may flow after entering the current codebase.
  *
- * For example, `Kernel.format "%s world!", "Hello"` is a use of the return of the `format` function of
- * the `Kernel` module.
- *
- * This includes indirect uses found via data flow.
+ * This is similar to `asSource()` but additionally includes nodes that are transitively reachable by data flow.
+ * See `asSource()` for examples.
  */
- DataFlow::Node getAUse() {
+ DataFlow::Node getAValueReachableFromSource() {
  exists(DataFlow::LocalSourceNode src | Impl::use(this, src) |
  Impl::trackUseNode(src).flowsTo(result)
  )
  }

  /**
- * Gets an immediate use of the API component represented by this node.
+ * Gets a data-flow node where this value enters the current codebase.
+ *
+ * For example:
+ * ```ruby
+ * # API::getTopLevelMember("Foo").asSource()
+ * Foo
  *
- * Unlike `getAUse()`, this predicate only gets the immediate references, not the indirect uses
- * found via data flow.
+ * # API::getTopLevelMember("Foo").getMethod("bar").getReturn().asSource()
+ * Foo.bar
+ *
+ * # 'x' is found by:
+ * # API::getTopLevelMember("Foo").getMethod("bar").getBlock().getParameter(0).asSource()
+ * Foo.bar do |x|
+ * end
+ * ```
  */
- DataFlow::LocalSourceNode getAnImmediateUse() { Impl::use(this, result) }
+ DataFlow::LocalSourceNode asSource() { Impl::use(this, result) }

  /**
- * Gets a data-flow node corresponding the value flowing into this API component.
+ * Gets a data-flow node where this value leaves the current codebase and flows into an
+ * external library (or in general, any external codebase).
+ *
+ * Concretely, this corresponds to an argument passed to a call to external code.
+ *
+ * For example:
+ * ```ruby
+ * # 'x' is found by:
+ * # API::getTopLevelMember("Foo").getMethod("bar").getParameter(0).asSink()
+ * Foo.bar(x)
+ *
+ * Foo.bar(-> {
+ * # 'x' is found by:
+ * # API::getTopLevelMember("Foo").getMethod("bar").getParameter(0).getReturn().asSink()
+ * x
+ * })
+ * ```
  */
- DataFlow::Node getARhs() { Impl::def(this, result) }
+ DataFlow::Node asSink() { Impl::def(this, result) }

  /**
- * Gets a data-flow node that may interprocedurally flow to the value escaping into this API component.
+ * Get a data-flow node that transitively flows to an external library (or in general, any external codebase).
+ *
+ * This is similar to `asSink()` but additionally includes nodes that transitively reach a sink by data flow.
+ * See `asSink()` for examples.
  */
- DataFlow::Node getAValueReachingRhs() { result = Impl::trackDefNode(this.getARhs()) }
+ DataFlow::Node getAValueReachingSink() { result = Impl::trackDefNode(this.asSink()) }
+
+ /** DEPRECATED. This predicate has been renamed to `getAValueReachableFromSource()`. */
+ deprecated DataFlow::Node getAUse() { result = this.getAValueReachableFromSource() }
+
+ /** DEPRECATED. This predicate has been renamed to `asSource()`. */
+ deprecated DataFlow::LocalSourceNode getAnImmediateUse() { result = this.asSource() }
+
+ /** DEPRECATED. This predicate has been renamed to `asSink()`. */
+ deprecated DataFlow::Node getARhs() { result = this.asSink() }
+
+ /** DEPRECATED. This predicate has been renamed to `getAValueReachingSink()`. */
+ deprecated DataFlow::Node getAValueReachingRhs() { result = this.getAValueReachingSink() }

  /**
  * Gets a call to a method on the receiver represented by this API component.
  */
- DataFlow::CallNode getAMethodCall(string method) {
- result = this.getReturn(method).getAnImmediateUse()
- }
+ DataFlow::CallNode getAMethodCall(string method) { result = this.getReturn(method).asSource() }

  /**
  * Gets a node representing member `m` of this API component.
@@ -135,7 +241,7 @@ module API {
  /**
  * Gets a `new` call to the function represented by this API component.
  */
- DataFlow::ExprNode getAnInstantiation() { result = this.getInstance().getAnImmediateUse() }
+ DataFlow::ExprNode getAnInstantiation() { result = this.getInstance().asSource() }

  /**
  * Gets a node representing a (direct or indirect) subclass of the class represented by this node.
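
To make the source/sink terminology in the new doc comments concrete, here is a small runnable Ruby sketch. It is only an illustration: the `Foo` module below is a hypothetical stand-in for an external library (in a real analysis its implementation would not be in the database), and the access paths in the comments are copied from the doc-comment examples in the diff above rather than verified against a CodeQL database.

```ruby
# Hypothetical stand-in for an external library; not part of the commit.
module Foo
  # Returns a value to the caller and, if a block is given, calls back into
  # the caller's code with a value of its own.
  def self.bar(arg = nil)
    yield "from-library" if block_given?
    "returned:#{arg.inspect}"
  end
end

seen = []

# `x` leaves the current codebase here. Per the doc comment it is found by:
#   API::getTopLevelMember("Foo").getMethod("bar").getParameter(0).asSink()
x = "from-application"
result = Foo.bar(x)

# The call result enters the current codebase. Per the doc comment it is found by:
#   API::getTopLevelMember("Foo").getMethod("bar").getReturn().asSource()
seen << result

# The block parameter `y` also enters the current codebase, found by:
#   API::getTopLevelMember("Foo").getMethod("bar").getBlock().getParameter(0).asSource()
Foo.bar { |y| seen << y }
```

In these terms, `x` corresponds to a def-node (the library ends up holding the value), while the call result and the block parameter correspond to use-nodes (the current codebase ends up holding the value).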

ruby/ql/lib/codeql/ruby/frameworks/ActionController.qll

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ class ActionControllerControllerClass extends ClassDeclaration {
  // In Rails applications `ApplicationController` typically extends `ActionController::Base`, but we
  // treat it separately in case the `ApplicationController` definition is not in the database.
  API::getTopLevelMember("ApplicationController")
- ].getASubclass().getAUse().asExpr().getExpr()
+ ].getASubclass().getAValueReachableFromSource().asExpr().getExpr()
  }

  /**

ruby/ql/lib/codeql/ruby/frameworks/ActiveRecord.qll

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ class ActiveRecordModelClass extends ClassDeclaration {
  // In Rails applications `ApplicationRecord` typically extends `ActiveRecord::Base`, but we
  // treat it separately in case the `ApplicationRecord` definition is not in the database.
  API::getTopLevelMember("ApplicationRecord")
- ].getASubclass().getAUse().asExpr().getExpr()
+ ].getASubclass().getAValueReachableFromSource().asExpr().getExpr()
  }

  // Gets the class declaration for this class and all of its super classes

ruby/ql/lib/codeql/ruby/frameworks/GraphQL.qll

Lines changed: 18 additions & 3 deletions
@@ -41,7 +41,12 @@ private API::Node graphQlSchema() { result = API::getTopLevelMember("GraphQL").g
  private class GraphqlRelayClassicMutationClass extends ClassDeclaration {
  GraphqlRelayClassicMutationClass() {
  this.getSuperclassExpr() =
- graphQlSchema().getMember("RelayClassicMutation").getASubclass*().getAUse().asExpr().getExpr()
+ graphQlSchema()
+ .getMember("RelayClassicMutation")
+ .getASubclass*()
+ .getAValueReachableFromSource()
+ .asExpr()
+ .getExpr()
  }
  }

@@ -71,7 +76,12 @@ private class GraphqlRelayClassicMutationClass extends ClassDeclaration {
  private class GraphqlSchemaResolverClass extends ClassDeclaration {
  GraphqlSchemaResolverClass() {
  this.getSuperclassExpr() =
- graphQlSchema().getMember("Resolver").getASubclass().getAUse().asExpr().getExpr()
+ graphQlSchema()
+ .getMember("Resolver")
+ .getASubclass()
+ .getAValueReachableFromSource()
+ .asExpr()
+ .getExpr()
  }
  }

@@ -92,7 +102,12 @@ private class GraphqlSchemaResolverClass extends ClassDeclaration {
  class GraphqlSchemaObjectClass extends ClassDeclaration {
  GraphqlSchemaObjectClass() {
  this.getSuperclassExpr() =
- graphQlSchema().getMember("Object").getASubclass().getAUse().asExpr().getExpr()
+ graphQlSchema()
+ .getMember("Object")
+ .getASubclass()
+ .getAValueReachableFromSource()
+ .asExpr()
+ .getExpr()
  }

  /** Gets a `GraphqlFieldDefinitionMethodCall` called in this class. */

ruby/ql/lib/codeql/ruby/frameworks/Rails.qll

Lines changed: 1 addition & 5 deletions
@@ -63,11 +63,7 @@ private module Config {
  )
  or
  // `Rails.application.config`
- this =
- API::getTopLevelMember("Rails")
- .getReturn("application")
- .getReturn("config")
- .getAnImmediateUse()
+ this = API::getTopLevelMember("Rails").getReturn("application").getReturn("config").asSource()
  or
  // `Rails.application.configure { ... config ... }`
  // `Rails::Application.configure { ... config ... }`

ruby/ql/lib/codeql/ruby/frameworks/XmlParsing.qll

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ private DataFlow::LocalSourceNode trackFeature(Feature f, boolean enable, TypeTr
  or
  // Use of a constant f
  enable = true and
- result = parseOptionsModule().getMember(f.getConstantName()).getAUse()
+ result = parseOptionsModule().getMember(f.getConstantName()).getAValueReachableFromSource()
  or
  // Treat `&`, `&=`, `|` and `|=` operators as if they preserve the on/off states
  // of their operands. This is an overapproximation but likely to work well in practice

ruby/ql/lib/codeql/ruby/frameworks/core/Hash.qll

Lines changed: 4 additions & 2 deletions
@@ -99,7 +99,8 @@ module Hash {
  HashNewSummary() { this = "Hash[]" }

  final override ElementReference getACall() {
- result.getReceiver() = API::getTopLevelMember("Hash").getAUse().asExpr().getExpr() and
+ result.getReceiver() =
+ API::getTopLevelMember("Hash").getAValueReachableFromSource().asExpr().getExpr() and
  result.getNumberOfArguments() = 1
  }

@@ -138,7 +139,8 @@ module Hash {
  }

  final override ElementReference getACall() {
- result.getReceiver() = API::getTopLevelMember("Hash").getAUse().asExpr().getExpr() and
+ result.getReceiver() =
+ API::getTopLevelMember("Hash").getAValueReachableFromSource().asExpr().getExpr() and
  key = result.getArgument(i - 1).getConstantValue() and
  exists(result.getArgument(i))
  }

ruby/ql/lib/codeql/ruby/frameworks/data/ModelsAsData.qll

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ private import codeql.ruby.dataflow.RemoteFlowSources
  * A remote flow source originating from a CSV source row.
  */
  private class RemoteFlowSourceFromCsv extends RemoteFlowSource::Range {
- RemoteFlowSourceFromCsv() { this = ModelOutput::getASourceNode("remote").getAnImmediateUse() }
+ RemoteFlowSourceFromCsv() { this = ModelOutput::getASourceNode("remote").asSource() }

  override string getSourceType() { result = "Remote flow (from model)" }
  }

ruby/ql/lib/codeql/ruby/frameworks/http_clients/Excon.qll

Lines changed: 6 additions & 4 deletions
@@ -30,8 +30,8 @@ class ExconHttpRequest extends HTTP::Client::Request::Range {
  DataFlow::Node connectionUse;

  ExconHttpRequest() {
- requestUse = requestNode.getAnImmediateUse() and
- connectionUse = connectionNode.getAnImmediateUse() and
+ requestUse = requestNode.asSource() and
+ connectionUse = connectionNode.asSource() and
  connectionNode =
  [
  // one-off requests
@@ -66,7 +66,8 @@ class ExconHttpRequest extends HTTP::Client::Request::Range {
  override predicate disablesCertificateValidation(DataFlow::Node disablingNode) {
  // Check for `ssl_verify_peer: false` in the options hash.
  exists(DataFlow::Node arg, int i |
- i > 0 and arg = connectionNode.getAUse().(DataFlow::CallNode).getArgument(i)
+ i > 0 and
+ arg = connectionNode.getAValueReachableFromSource().(DataFlow::CallNode).getArgument(i)
  |
  argSetsVerifyPeer(arg, false, disablingNode)
  )
@@ -79,7 +80,8 @@ class ExconHttpRequest extends HTTP::Client::Request::Range {
  disableCall.asExpr().getASuccessor+() = requestUse.asExpr() and
  disablingNode = disableCall and
  not exists(DataFlow::Node arg, int i |
- i > 0 and arg = connectionNode.getAUse().(DataFlow::CallNode).getArgument(i)
+ i > 0 and
+ arg = connectionNode.getAValueReachableFromSource().(DataFlow::CallNode).getArgument(i)
  |
  argSetsVerifyPeer(arg, true, _)
  )

ruby/ql/lib/codeql/ruby/frameworks/http_clients/Faraday.qll

Lines changed: 9 additions & 4 deletions
@@ -38,8 +38,8 @@ class FaradayHttpRequest extends HTTP::Client::Request::Range {
  ] and
  requestNode =
  connectionNode.getReturn(["get", "head", "delete", "post", "put", "patch", "trace"]) and
- requestUse = requestNode.getAnImmediateUse() and
- connectionUse = connectionNode.getAnImmediateUse() and
+ requestUse = requestNode.asSource() and
+ connectionUse = connectionNode.asSource() and
  this = requestUse.asExpr().getExpr()
  }

@@ -58,7 +58,8 @@ class FaradayHttpRequest extends HTTP::Client::Request::Range {
  // or
  // `{ ssl: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }`
  exists(DataFlow::Node arg, int i |
- i > 0 and arg = connectionNode.getAUse().(DataFlow::CallNode).getArgument(i)
+ i > 0 and
+ arg = connectionNode.getAValueReachableFromSource().(DataFlow::CallNode).getArgument(i)
  |
  // Either passed as an individual key:value argument, e.g.:
  // Faraday.new(..., ssl: {...})
@@ -132,7 +133,11 @@ private predicate isVerifyModeNonePair(CfgNodes::ExprNodes::PairCfgNode p) {
  key.asExpr() = p.getKey() and
  value.asExpr() = p.getValue() and
  isSymbolLiteral(key, "verify_mode") and
- value = API::getTopLevelMember("OpenSSL").getMember("SSL").getMember("VERIFY_NONE").getAUse()
+ value =
+ API::getTopLevelMember("OpenSSL")
+ .getMember("SSL")
+ .getMember("VERIFY_NONE")
+ .getAValueReachableFromSource()
  )
  }

ruby/ql/lib/codeql/ruby/frameworks/http_clients/HttpClient.qll

Lines changed: 6 additions & 4 deletions
@@ -29,7 +29,7 @@ class HttpClientRequest extends HTTP::Client::Request::Range {
  API::getTopLevelMember("HTTPClient").getInstance()
  ] and
  requestNode = connectionNode.getReturn(method) and
- requestUse = requestNode.getAnImmediateUse() and
+ requestUse = requestNode.asSource() and
  method in [
  "get", "head", "delete", "options", "post", "put", "trace", "get_content", "post_content"
  ] and
@@ -52,10 +52,12 @@ class HttpClientRequest extends HTTP::Client::Request::Range {
  // Look for calls to set
  // `c.ssl_config.verify_mode = OpenSSL::SSL::VERIFY_NONE`
  // on an HTTPClient connection object `c`.
- disablingNode =
- connectionNode.getReturn("ssl_config").getReturn("verify_mode=").getAnImmediateUse() and
+ disablingNode = connectionNode.getReturn("ssl_config").getReturn("verify_mode=").asSource() and
  disablingNode.(DataFlow::CallNode).getArgument(0) =
- API::getTopLevelMember("OpenSSL").getMember("SSL").getMember("VERIFY_NONE").getAUse()
+ API::getTopLevelMember("OpenSSL")
+ .getMember("SSL")
+ .getMember("VERIFY_NONE")
+ .getAValueReachableFromSource()
  }

  override string getFramework() { result = "HTTPClient" }

ruby/ql/lib/codeql/ruby/frameworks/http_clients/Httparty.qll

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ class HttpartyRequest extends HTTP::Client::Request::Range {
  DataFlow::CallNode requestUse;

  HttpartyRequest() {
- requestUse = requestNode.getAnImmediateUse() and
+ requestUse = requestNode.asSource() and
  requestNode =
  API::getTopLevelMember("HTTParty")
  .getReturn(["get", "head", "delete", "options", "post", "put", "patch"]) and
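
The call-site updates in the framework files all apply the same four renames listed in the deprecation notices in `ApiGraphs.qll`. As a rough sketch of how downstream queries might be migrated in bulk, the Ruby helper below rewrites those call names in QL source text. The helper name `migrate_api_graph_calls` and the constants are hypothetical and not part of this commit. The rewrite is purely textual, so it would also rename identically named predicates on unrelated types, and its output should be reviewed by hand.

```ruby
# Old name => new name, taken from the DEPRECATED notices in ApiGraphs.qll.
RENAMES = {
  "getAUse" => "getAValueReachableFromSource",
  "getAnImmediateUse" => "asSource",
  "getARhs" => "asSink",
  "getAValueReachingRhs" => "getAValueReachingSink"
}.freeze

# Matches `.oldName` only when immediately followed by `(`, so longer
# identifiers that merely start with an old name are left alone.
PATTERN = /\.(#{Regexp.union(RENAMES.keys).source})(?=\()/

# Hypothetical helper: returns a copy of the QL source with the deprecated
# member-predicate calls replaced by their new names. Text-based only; it
# does not check that the receiver is actually an API::Node.
def migrate_api_graph_calls(ql_source)
  ql_source.gsub(PATTERN) { ".#{RENAMES.fetch(Regexp.last_match(1))}" }
end

old_line = 'this = API::getTopLevelMember("Rails").getReturn("application").getReturn("config").getAnImmediateUse()'
puts migrate_api_graph_calls(old_line)
```

Because the commit keeps the old predicates as `deprecated` wrappers that delegate to the new ones, existing queries should keep compiling in the meantime, so a migration like this can be done incrementally.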
