Skip to content

Symex: don't phi-merge Java dynamic objects with uninitialised data #2121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Apr 26, 2018

Previously given code like A a = null; if(x) a = new A(); // PHI, at the merge point marked PHI
symex would merge two versions of the newly allocated dynamic object: the one initialised by the A()
constructor, and the one carrying uninitialised data (an unbound symbol with a name like dynamic_object1#0)

This means all subsequent accesses to the dynamic object would operate on x ? dynamic_object1#2 : dynamic_object1#0,
stymying future constant propagation and expression simplification.

By observing that in Java having a pointer to an object implies it has been initialised, we can safely throw away the dynamic_object1#0 case and thus improve constant propagation.

This needs tests writing, but adding early for comment.

Previously given code like "A a = null; if(x) a = new A(); // PHI", at the merge point marked PHI
symex would merge two versions of the newly allocated dynamic object: the one initialised by the A()
constructor, and the one carrying uninitialised data (an unbound symbol with a name like dynamic_object1#0)

This means all subsequent accesses to the dynamic object would operate on `x ? dynamic_object1#2 : dynamic_object1#0`,
stymying future constant propagation and expression simplification.

By observing that in Java having a pointer to an object implies it has been initialised, we can safely throw away
the dynamic_object1#0 case and thus improve constant propagation.
@tautschnig
Copy link
Collaborator

Is this redundant with #2125?

@smowton
Copy link
Contributor Author

smowton commented Apr 27, 2018

No, the two problems are distinct. The null-pointer-might-be-accessed situation results in a pessimistic assumption about the pointer (ptr == &dynamic_object1 ? dynamic_object1.accessed_field : ptr$invalid_object.accessed_field), whereas the phi-merge-of-version-0 problem introduces uncertainty about the value of dynamic_object1.accessed_field itself by merging dynamic_object1#1, which is initialised, with dynamic_object1#0, which isn't (introducing dynamic_object1#2 == guard ? dynamic_object1#1 : dynamic_object1#0). Thus even if we were certain about the value of ptr an access to ptr->accessed_field would not be constant propagated.

It is however related to #2125 in that both rely on a safe memory model to prevent me from accessing the memory associated with dynamic_object1 when it hasn't been initialised.

@smowton
Copy link
Contributor Author

smowton commented Apr 27, 2018

To phrase that differently, in the program

A a = null;
if(nondet)
  a = new A(1);
// PHI

if(a != null)
  return a.somefield;

There are two things preventing symex from propagating 1 to the return value. Symex thinks there are two possible paths: one where new A was called, a is non-null, and the return value is 1, and one where new A was not called, a is null, and the return value is nondet.

#2125 says that because the access is checked, at the line return a.somefield we can safely assume a != null, and this PR says that we can disregard the dynamic-object-is-uninitialised path from consideration at the phi node labelled // PHI, because ability to access the object implies it has been initialised.

@smowton
Copy link
Contributor Author

smowton commented May 1, 2018

@tautschnig possible idea for removing language-specificity in Symex: the ID_allocate side-effect expression already has its zero-init attribute. Even if we intend to overwrite it immediately, we could just set that to true for Java allocations, then symex would copy the attribute onto the dynamic object it creates.

Then the test wouldn't be "is this Java," it would be "does the object have ID_C_always_initialized?"

@smowton
Copy link
Contributor Author

smowton commented May 3, 2018

Discussed in person with @kroening: the agreed solution is for symex to ensure some explicit nondet value is written on malloc, similar to what is already done for code_declt, then we can always assume that a particular identifier being of generation 0 implies it doesn't exist at all on that path, and avoid merging it (i.e. remove the limitation to Java code)

@smowton
Copy link
Contributor Author

smowton commented May 3, 2018

Note on that: check that array allocations, in C, C++ and Java, are all subject to this rule (that some nondet init happens if there isn't an explicit zero init)

@smowton
Copy link
Contributor Author

smowton commented May 9, 2018

Replaced by #2168

@smowton smowton closed this May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants