Skip to content

Base-type-eq: check pointer types really match [blocks: #4023] #4020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Feb 1, 2019

Generally base-type-eq's task is to ensure that types match up to substitution of a type's body for a symbol referring to that type. However now that derivatives of pointers exist with named subexpressions (e.g. java_generic_parametert), it is now possible for types not involving symbols to be base_type_eq but not ==. This resolves that particular case by checking that the names subexpressions match exactly.

Generally base-type-eq's task is to ensure that types match up to substitution of a type's body for a symbol referring to
that type. However now that derivatives of pointers exist with named subexpressions (e.g. java_generic_parametert), it is
now possible for types not involving symbols to be `base_type_eq` but not `==`. This resolves that particular case by
checking that the names subexpressions match exactly.
const auto &named_subs1 = type1.get_named_sub();
const auto &named_subs2 = type2.get_named_sub();

for(const auto &name_and_sub : named_subs1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use two iterators over the two collections, just like irept::operator== does. Using find is expensive and unnecessary as you can do this with a linear algorithm.

Before it would unintentionally erase SomeClass.field from A<B> to just the unqualified A type.
… just to compare

This makes the invariant pass, but this surely warrants more attention -- type consistency is
difficult to achieve when pointers are qualified and dereferencing and then taking the address
again doesn't retain that information (but I don't know the logic behind the design decision
in enough detail to change it)
@smowton
Copy link
Contributor Author

smowton commented Feb 1, 2019

Haven't done that yet, but did notice a problem with this check exposing two other inconsistency problems, which I've pushed commits to fix. One is easy (static fields in the Java frontend) and the other is hard (symex-dereference assuming that you can strip a pointer type and add it back with a plain pointer_type(t) and get the same thing back (in fact that discards generic qualifiers).

I have a feeling my "fix" is really just weakening the test and actual inconsistency will still occur, but I'm not sure what to do about it yet. It looks an awful lot like encoding generics on the pointer rather than the struct type is the root of the problem, but I don't understand enough about the original design decision to try to change that (yet). @thk123 @allredj do you have any comment?

@tautschnig tautschnig changed the title Base-type-eq: check pointer types really match Base-type-eq: check pointer types really match [blocks: #4023] Feb 1, 2019
@thk123
Copy link
Contributor

thk123 commented Feb 1, 2019

@smowton the problem with putting in the struct is the struct/struct_tag_type, most of the time we have the latter, but then you end up in the weird situation where:

type has generic info
ns.follow(type) doesn't have generic info (and can't - the symbol table contains an ArrayList not an ArrayList<Integer>, ArrayList<Float> etc).

Intuitively (to me at least), it is the pointer that has the generic info because you can do:

ArrayList<Foo> f = new ArrayList<>();
ArrayList b = f;

Let's discuss in person on Monday - my view remains that the pointer is the Right Place(tm) for this information, but perhaps not.

@tautschnig
Copy link
Collaborator

Having thought about this a bit more the following three comments may be worth considering:

  1. If the info is only needed in the language front-end, then an ID_frontend_pointer should be used. Maybe the Java front-end then just needs an extra pass to rewrite all these once sufficient information is available?
  2. A pointer should just remain a pointer, and all the details should be in its subtype(). Those subtypes may need to be decorated in a number of ways, but I think all pointers should remain equal. No Animal Farm, please.
  3. If Java does need/have a special kind-of-pointer-but-not-quite-a-pointer then let's just introduce a new type. There is hardly any runtime cost to introduce something like an ID_pointer_to_generic, it's just that additional code needs to be written where we currently deal with pointers.

Copy link
Member

@kroening kroening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really want to get rid of base_type_eq (and replace it by ==). Echoing @tautschnig , if this is something that happens in the Java-front end only, then add ID_frontend_reference and a local function?

@smowton
Copy link
Contributor Author

smowton commented Feb 4, 2019

@kroening @tautschnig @thk123 summary of an in-person discussion with @thk123 just now:

Definition: a decorated pointer is a pointer with additional non-comment attributes beyond its subtype. java_generic_typet is a decorated pointer type.

Desired invariants:

  1. (until base_type_eq dies): if t1 base_type_eq t2 and neither t1 nor t2 contains any symbols, then t1 == t2 (currently untrue, hence this PR)
  2. Anywhere a symbol_exprt is used, its type matches the type recorded in the symbol table
  3. &*e has the same as e (currently untrue; this may discard decorations from e)

Desired functionality: Java parameter and return-value types with generic signatures should be available both before and after running the goto-convert -> symex -> solving pipeline, to generate better method stubs / harnesses and to generate better tests respectively. Currently this is done by examining the types of symbols observed in a goto_tracet, which in turn come from the types in the GOTO program.

Current plan: ensure that any code dealing with decorated pointers preserves their type. This has proven easy in the Java frontend, but doing this to symex-dereference and the various ways of querying value_sett seems error-prone. This basically discards invariant (3); every transformation needs to be aware that decorated pointers exist and may need to introduce casts to preserve type. However it is at least incremental and doesn't involve any wholesale changes.

Proposed replacement plan: store generic information in the symbol table as comments, so that decorated types are == compatible with undecorated ones; erase decorations prior to calling goto_convert (they could also be harmlessly propagated, as they are only comments, but this risks people using them when they're not reliable).

Advantages:

  1. Invariant 3 is restored; core code does not need to know about decorated pointers at all.

Disadvantages:

  1. comments are fragile; we have previously seen problems with merge_ireps, as used by symex_target_equationt, erasing or conflating decorated types where those decorations are just comments.
  2. Even if we erase them prior to goto_convert, discouraging people from trying to use decorations on expressions (cf. looking a symbol up in the symbol table to find the definitive, fully decorated version of its type), there's a risk of e.g. ns.lookup(some_symbol.get_identifier()).symbol_expr() reintroducing a decorated type into the GOTO program.

Both disadvantages would go away if there was somewhere to store the decorated type that wasn't symbolt::type, whether that was an additional field of symbolt (the ever-dreamt-of symbolt::annotations perhaps) or some out-of-band data structure produced by java_bytecode_languaget and accessed and used by Java frontend programs). However this requires some awkward gymnastics to route that information around.

Thoughts?

@tautschnig
Copy link
Collaborator

Thanks a lot for the detailed write-up! IMHO we do already have decorated types: these are now tags (struct/union/enum tags). Essentially a named pointer to another symbol that holds a lot more information. With tags there is no risk of .symbol_expr() introducing decorated types. Introducing a new kind of tag is of course a lot more work than going via comments, but it may well be worth it?

@smowton
Copy link
Contributor Author

smowton commented Feb 4, 2019

Interesting. We do already have generic_struct_type_tagt, which is indeed a struct tag with arguments. It means that ns.follow actually discards information... but then I suppose so does following the pointer! An interesting possibility, I'll think about it further overnight...

@smowton
Copy link
Contributor Author

smowton commented Feb 7, 2019

Replaced by #4114

@smowton smowton closed this Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants