Base-type-eq: check pointer types really match [blocks: #4023] #4020

smowton · 2019-02-01T16:07:25Z

Generally base-type-eq's task is to ensure that types match up to substitution of a type's body for a symbol referring to that type. However now that derivatives of pointers exist with named subexpressions (e.g. java_generic_parametert), it is now possible for types not involving symbols to be base_type_eq but not ==. This resolves that particular case by checking that the names subexpressions match exactly.

Generally base-type-eq's task is to ensure that types match up to substitution of a type's body for a symbol referring to that type. However now that derivatives of pointers exist with named subexpressions (e.g. java_generic_parametert), it is now possible for types not involving symbols to be `base_type_eq` but not `==`. This resolves that particular case by checking that the names subexpressions match exactly.

tautschnig · 2019-02-01T16:36:53Z

src/util/base_type.cpp

+    const auto &named_subs1 = type1.get_named_sub();
+    const auto &named_subs2 = type2.get_named_sub();
+
+    for(const auto &name_and_sub : named_subs1)


Please use two iterators over the two collections, just like irept::operator== does. Using find is expensive and unnecessary as you can do this with a linear algorithm.

Before it would unintentionally erase SomeClass.field from A<B> to just the unqualified A type.

… just to compare This makes the invariant pass, but this surely warrants more attention -- type consistency is difficult to achieve when pointers are qualified and dereferencing and then taking the address again doesn't retain that information (but I don't know the logic behind the design decision in enough detail to change it)

smowton · 2019-02-01T17:39:11Z

Haven't done that yet, but did notice a problem with this check exposing two other inconsistency problems, which I've pushed commits to fix. One is easy (static fields in the Java frontend) and the other is hard (symex-dereference assuming that you can strip a pointer type and add it back with a plain pointer_type(t) and get the same thing back (in fact that discards generic qualifiers).

I have a feeling my "fix" is really just weakening the test and actual inconsistency will still occur, but I'm not sure what to do about it yet. It looks an awful lot like encoding generics on the pointer rather than the struct type is the root of the problem, but I don't understand enough about the original design decision to try to change that (yet). @thk123 @allredj do you have any comment?

thk123 · 2019-02-01T17:49:45Z

@smowton the problem with putting in the struct is the struct/struct_tag_type, most of the time we have the latter, but then you end up in the weird situation where:

type has generic info
ns.follow(type) doesn't have generic info (and can't - the symbol table contains an ArrayList not an ArrayList<Integer>, ArrayList<Float> etc).

Intuitively (to me at least), it is the pointer that has the generic info because you can do:

ArrayList<Foo> f = new ArrayList<>();
ArrayList b = f;

Let's discuss in person on Monday - my view remains that the pointer is the Right Place(tm) for this information, but perhaps not.

tautschnig · 2019-02-02T15:54:39Z

Having thought about this a bit more the following three comments may be worth considering:

If the info is only needed in the language front-end, then an ID_frontend_pointer should be used. Maybe the Java front-end then just needs an extra pass to rewrite all these once sufficient information is available?
A pointer should just remain a pointer, and all the details should be in its subtype(). Those subtypes may need to be decorated in a number of ways, but I think all pointers should remain equal. No Animal Farm, please.
If Java does need/have a special kind-of-pointer-but-not-quite-a-pointer then let's just introduce a new type. There is hardly any runtime cost to introduce something like an ID_pointer_to_generic, it's just that additional code needs to be written where we currently deal with pointers.

kroening

We really want to get rid of base_type_eq (and replace it by ==). Echoing @tautschnig , if this is something that happens in the Java-front end only, then add ID_frontend_reference and a local function?

smowton · 2019-02-04T18:45:54Z

@kroening @tautschnig @thk123 summary of an in-person discussion with @thk123 just now:

Definition: a decorated pointer is a pointer with additional non-comment attributes beyond its subtype. java_generic_typet is a decorated pointer type.

Desired invariants:

(until base_type_eq dies): if t1 base_type_eq t2 and neither t1 nor t2 contains any symbols, then t1 == t2 (currently untrue, hence this PR)
Anywhere a symbol_exprt is used, its type matches the type recorded in the symbol table
&*e has the same as e (currently untrue; this may discard decorations from e)

Desired functionality: Java parameter and return-value types with generic signatures should be available both before and after running the goto-convert -> symex -> solving pipeline, to generate better method stubs / harnesses and to generate better tests respectively. Currently this is done by examining the types of symbols observed in a goto_tracet, which in turn come from the types in the GOTO program.

Current plan: ensure that any code dealing with decorated pointers preserves their type. This has proven easy in the Java frontend, but doing this to symex-dereference and the various ways of querying value_sett seems error-prone. This basically discards invariant (3); every transformation needs to be aware that decorated pointers exist and may need to introduce casts to preserve type. However it is at least incremental and doesn't involve any wholesale changes.

Proposed replacement plan: store generic information in the symbol table as comments, so that decorated types are == compatible with undecorated ones; erase decorations prior to calling goto_convert (they could also be harmlessly propagated, as they are only comments, but this risks people using them when they're not reliable).

Advantages:

Invariant 3 is restored; core code does not need to know about decorated pointers at all.

Disadvantages:

comments are fragile; we have previously seen problems with merge_ireps, as used by symex_target_equationt, erasing or conflating decorated types where those decorations are just comments.
Even if we erase them prior to goto_convert, discouraging people from trying to use decorations on expressions (cf. looking a symbol up in the symbol table to find the definitive, fully decorated version of its type), there's a risk of e.g. ns.lookup(some_symbol.get_identifier()).symbol_expr() reintroducing a decorated type into the GOTO program.

Both disadvantages would go away if there was somewhere to store the decorated type that wasn't symbolt::type, whether that was an additional field of symbolt (the ever-dreamt-of symbolt::annotations perhaps) or some out-of-band data structure produced by java_bytecode_languaget and accessed and used by Java frontend programs). However this requires some awkward gymnastics to route that information around.

Thoughts?

tautschnig · 2019-02-04T19:10:50Z

Thanks a lot for the detailed write-up! IMHO we do already have decorated types: these are now tags (struct/union/enum tags). Essentially a named pointer to another symbol that holds a lot more information. With tags there is no risk of .symbol_expr() introducing decorated types. Introducing a new kind of tag is of course a lot more work than going via comments, but it may well be worth it?

smowton · 2019-02-04T22:14:06Z

Interesting. We do already have generic_struct_type_tagt, which is indeed a struct tag with arguments. It means that ns.follow actually discards information... but then I suppose so does following the pointer! An interesting possibility, I'll think about it further overnight...

smowton · 2019-02-07T17:50:30Z

Replaced by #4114

smowton requested review from kroening, peterschrammel, pkesseli and tautschnig as code owners February 1, 2019 16:07

tautschnig reviewed Feb 1, 2019

View reviewed changes

smowton mentioned this pull request Feb 1, 2019

Test for member type consistency [depends-on: #4021] #4023

Merged

smowton added 2 commits February 1, 2019 17:31

Java frontend: get qualified generic types of static fields

76328dc

Before it would unintentionally erase SomeClass.field from A<B> to just the unqualified A type.

smowton requested review from cesaro, cristina-david, NathanJPhillips and thk123 as code owners February 1, 2019 17:34

tautschnig changed the title ~~Base-type-eq: check pointer types really match~~ Base-type-eq: check pointer types really match [blocks: #4023] Feb 1, 2019

tautschnig assigned smowton Feb 1, 2019

tautschnig added the blocker label Feb 1, 2019

kroening requested changes Feb 3, 2019

View reviewed changes

smowton closed this Feb 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Base-type-eq: check pointer types really match [blocks: #4023] #4020

Base-type-eq: check pointer types really match [blocks: #4023] #4020

smowton commented Feb 1, 2019

tautschnig Feb 1, 2019

smowton commented Feb 1, 2019

thk123 commented Feb 1, 2019 •

edited

Loading

tautschnig commented Feb 2, 2019

kroening left a comment

smowton commented Feb 4, 2019 •

edited

Loading

tautschnig commented Feb 4, 2019

smowton commented Feb 4, 2019

smowton commented Feb 7, 2019

Base-type-eq: check pointer types really match [blocks: #4023] #4020

Base-type-eq: check pointer types really match [blocks: #4023] #4020

Conversation

smowton commented Feb 1, 2019

tautschnig Feb 1, 2019

Choose a reason for hiding this comment

smowton commented Feb 1, 2019

thk123 commented Feb 1, 2019 • edited Loading

tautschnig commented Feb 2, 2019

kroening left a comment

Choose a reason for hiding this comment

smowton commented Feb 4, 2019 • edited Loading

tautschnig commented Feb 4, 2019

smowton commented Feb 4, 2019

smowton commented Feb 7, 2019

thk123 commented Feb 1, 2019 •

edited

Loading

smowton commented Feb 4, 2019 •

edited

Loading