-
Notifications
You must be signed in to change notification settings - Fork 273
Symex: resolve class-identifier comparisons #2122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symex: resolve class-identifier comparisons #2122
Conversation
Ideally we would resolve comparisons like `x.@class_identifier == "A"` using our existing constant propagation framework. However, it only classifies a structured object as a constant if all its fields are constant -- there is no scope for a part-constant object, in the way that @class_identifier is initialised once and then cannot change subsequently. Therefore, instead we enforce the invariant that @class_identifier must match the actual type of the object `x`. This significantly improves symex processing virtual function dispatch, as it can determine when certain callees are unreachable, and so potentially vastly simplify the resulting formula.
A few remarks:
|
Possible courses of action:
My preference is to start with RTTI fields then go from there, since they're so easy to solve and so profitable when symex can resolve them. What do you think? |
This one is harder than #2121 and #2125 to make less specific without lots of extra work to implement part or all of field sensitivity, and deal with whatever complications that raises. I suggest permit the name |
Moving this to a private branch for now, as @kroening took the view this was too specific and the field sensitivity patch set should be used instead. Over to you on that bit @tautschnig as we don't have the bandwidth to revive and properly test the patchset in a reasonable timeframe. |
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See #2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See #2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
Ideally we would resolve comparisons like
x.@class_identifier == "A"
using our existing constantpropagation framework. However, it only classifies a structured object as a constant if all its fields
are constant -- there is no scope for a part-constant object, in the way that @class_identifier is
initialised once and then cannot change subsequently.
Therefore, instead we enforce the invariant that @class_identifier must match the actual type of the
object
x
.This significantly improves symex processing virtual function dispatch, as it can determine when certain
callees are unreachable, and so potentially vastly simplify the resulting formula.
This needs tests writing, but adding early for comment. I don't expect this to get in exactly as written, but some avenue to provide this sort of information is needed.