Skip to content

[TG-3694] Parse and store inner class information #2493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 5, 2018

Conversation

jeannielynnmoulton
Copy link
Contributor

@jeannielynnmoulton jeannielynnmoulton commented Jun 28, 2018

This parses the InnerClasses attribute of the java bytecode with the intention of determining

  • If an parsed class is an inner class
  • The accessibility information for inner classes

If the parsed class is an inner class, then the original accessibility is overwritten by the information from the InnerClasses attribute.

Whether the class is an inner class is then stored in java_class_typet.

...

@thk123 @thomasspriggs Currently there is one test failing in test gen with the changes that will require some investigation, but I am on AL Friday and wanted to get some preliminary reviews so with some luck, this might be finished the sprint. Thanks.

@thk123 Could you please add an in progress label or equivalent. I have no privileges.

Copy link
Contributor

@thk123 thk123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - couple of suggestions and I think your tests need tweaking.

"The number of bytes to be read for the InnerClasses attribute does not "
"match the attribute length.");

const std::function<pool_entryt &(u2)> pool_entry_lambda =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably add this exception to the coding standard, but I think auto is preferable for a lambda for two reasons:

  1. By putting it into a std::function you allow the value to be null whereas the auto allows the compiler to pick a precise lambda type
  2. Probably doesn't help readability as the inputs and outputs are repeated on the rhs ([this](u2 index) -> pool_entry &)


const std::function<pool_entryt &(u2)> pool_entry_lambda =
[this](u2 index) -> pool_entryt & { return pool_entry(index); };
std::function<std::string(std::string, char)> remove_separator_char =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const

UNUSED bool is_protected = inner_class_access_flags & ACC_PROTECTED;

// If the original parsed class name matches the inner class name
// the parsed class is an inner class, so overwrite the parsed class'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this work? Also possibly I think you might be able to check whether outer_class_info_index == 0?

If C is not a member of a class or an interface (that is, if C is a top-level class or interface (JLS §7.6) or a local class (JLS §14.3) or an anonymous class (JLS §15.9.5)), the value of the outer_class_info_index item must be zero.

Though that sounds like anonymous and local classes would be marked as not inner, which I guess is true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anonymous classes and local classes are both inner classes. If they are not being marked as such then we have a problem. I think the accessibility for these should always be private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, it may be advantageous to have a is_anonymous / is_local flag set at this point. We currently have an is_anonymous method in state_typet, which attempts to work this out based on regexing the name, which isn't the ideal approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problems on test-gen were related to treating anonymous/local classes the same as inner classes. This has been updated by checking that outer_class_info_index is 0 or not (it's 0 when it is outer, interface, local or anonymous). @thk123 The reason we can't just check outer_class_info_index is because the we only want to overwrite the accessibility information when the current parsed class (not the current parsed inner class information) is an inner class. As far as I can tell, there is no magic way to do this from the spec.

@@ -1640,6 +1640,48 @@ void java_bytecode_parsert::rclass_attribute(classt &parsed_class)
parsed_class.attribute_bootstrapmethods_read = true;
read_bootstrapmethods_entry(parsed_class);
}
else if(attribute_name == "InnerClasses")
{
u2 number_of_classes = read_u2();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider pulling this out into a read_innerclass_attribute which would allow you to document when you'd expect to see this and an overview of the info we're getting out of it, as I can't get my head around when this attribute appears and what we find in it, despite reading the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec is not great, agreed.

@@ -111,6 +111,16 @@ class java_class_typet:public class_typet
return set(ID_access, access);
}

const irep_idt &get_inner_class() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return a bool (using get_bool as done in get_final) as this hides the messiness that everything is strings from the user of this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these methods be called get_is_inner_class and set_is_inner_class instead? The calls to the current method names superficially appear to be a call to a method on an outer class to get an inner class belonging to it.

THEN("The class should be marked as public")
{
const symbolt &class_symbol =
*new_symbol_table.lookup("java::InnerClasses$PublicInnerClass");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use lookup_ref if you are going to immediately dereference the return from lookup as it will produce a nicer error if the symbol doesn't exist.

@@ -0,0 +1,26 @@
public class InnerClasses {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a non-public class to this file (in case that means there are two inner class attributes?) Unless that isn't true, in which case might not be worth it

Copy link
Contributor Author

@jeannielynnmoulton jeannielynnmoulton Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a non-public outer class 👍

*new_symbol_table.lookup("java::InnerClasses$PublicInnerClass");
const java_class_typet java_class =
to_java_class_type(class_symbol.type);
REQUIRE(java_class.get_inner_class() == ID_is_inner_class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong, I think the string will be something like "1" or "true" - which probably means if this passes CI these unit tests aren't being run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it is the missing from Makefile as the CMake build fails:
https://travis-ci.org/diffblue/cbmc/jobs/397934975#L7792
1 == is_inner_class

{
WHEN("Parsing the InnerClasses attribute for a public inner class")
{
const symbol_tablet &new_symbol_table =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you always load the same file, I'd probably do this at the top of scenario to reduce duplication

@@ -0,0 +1,84 @@
/*******************************************************************\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing from the Makefile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what this means or how to add it. I will ask you when you are done with your current conversation.

Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nit picking. Couple of requests for extensions to the unit test.

@@ -1640,6 +1640,48 @@ void java_bytecode_parsert::rclass_attribute(classt &parsed_class)
parsed_class.attribute_bootstrapmethods_read = true;
read_bootstrapmethods_entry(parsed_class);
}
else if(attribute_name == "InnerClasses")
{
Copy link
Contributor

@thomasspriggs thomasspriggs Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should be moved into a new function/method, to avoid rclass_attribute getting too long.

"The number of bytes to be read for the InnerClasses attribute does not "
"match the attribute length.");

const std::function<pool_entryt &(u2)> pool_entry_lambda =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: As this pool_entry_lambda is only used once, I would move it into the call to .get_name, rather than storing it in a variable.

UNUSED u2 inner_name_index = read_u2();
u2 inner_class_access_flags = read_u2();

if(inner_class_info_index != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bike shedding: I would prefer if(inner_class_info_index == 0) continue; over indenting the remainder of the loop. I think opinions on this one are likely to vary.

Copy link
Contributor Author

@jeannielynnmoulton jeannielynnmoulton Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been asked to do worse. I am not sure which is clearer, but I've changed it 👍 No I haven't because the bytes need to be read anyway. No I am being stupid, changed :) Can you do double strikethrough?

u2 number_of_classes = read_u2();
u4 number_of_bytes_to_be_read = number_of_classes * 8 + 2;
INVARIANT(
number_of_bytes_to_be_read == attribute_length,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that we have this check. However I think this code could be more robust, if we have the condition number_of_bytes_to_be_read <= attribute_length and a call to skip_bytes for the difference at the end of this block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying, but I am not sure that it's best. If the bytes don't match, I don't think we can guarantee that anything makes sense because that would mean that the bytecode spec has changed basically, and so should probably just fail hard.


// If the original parsed class name matches the inner class name
// the parsed class is an inner class, so overwrite the parsed class'
// access information and mark it as an inner class
UNUSED bool is_inner_class =
bool is_inner_class =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use parsed_class.is_inner_class = here instead of declaring an additional local variable and hence save a line of code inside the if block below.

private PrivateInnerClass(int i) {
this.i = i;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extend this test with some Inner Inner class examples, to show that things still work when the classes are nested an additional layer deep?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also extend this with an anonymous class and a local class?

this.i = i;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: The clion/intellij settings can be tweaked to automatically add a new line to the end of files when it is missing. Should save a bunch of time on comments around this in your PRs.

@@ -111,6 +111,16 @@ class java_class_typet:public class_typet
return set(ID_access, access);
}

const irep_idt &get_inner_class() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these methods be called get_is_inner_class and set_is_inner_class instead? The calls to the current method names superficially appear to be a call to a method on an outer class to get an inner class belonging to it.

WHEN("Parsing the InnerClasses attribute for a public inner class")
{
const symbol_tablet &new_symbol_table =
load_java_class("InnerClasses", "./java_bytecode/java_bytecode_parser");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have some tests where an inner class is loaded first, instead of an outer class containing inner classes?

.get_name(pool_entry_lambda);
bool is_private = inner_class_access_flags & ACC_PRIVATE;
bool is_public = inner_class_access_flags & ACC_PUBLIC;
bool is_protected = inner_class_access_flags & ACC_PROTECTED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: These three lines give a clang-tidy warning in clion, due to the implicit cast from integer to bool. This could be fixed as follows - bool is_protected = (inner_class_access_flags & ACC_PROTECTED) != 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a read_bitflag method but I wouldn't block this PR on it since this is a common approach used in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the rest of this file, all the other instances of this have already been fixed to avoid the implicit cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it to be an optional "NIT" comment. I guess it would have been helpful if @thk123 had commented with something along the lines of "This is not a NIT, fix it."

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: a129281).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77913660

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: bccf9cc).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77914476

In this commit, nothing is done with the data.
Though this data is stored, it will not be used in test-gen yet because test-gen is assuming all non-public inner classes are private.
@jeannielynnmoulton
Copy link
Contributor Author

jeannielynnmoulton commented Jul 3, 2018

@thk123 @thomasspriggs This is ready for re-review. Seems to have passed the compatibility check last night, which is what I would expect now that it is treating anonymous/local classes correctly and test-gen is still ignoring most of inner classes, so I'm reasonably confident that PR2014 will pass too. I doubt the test-gen changes one will make it into the sprint, but it's here anyway for reference.

Test-gen with changes: https://github.com/diffblue/test-gen/pull/2012

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: c0c1029).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77954988

Copy link
Contributor

@thk123 thk123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments look to be addressed, couple of suggestions for making the comment on rinner_classes_attribute even better. I also think the enum for accessibility of classt would be a big win.

UNUSED u2 inner_name_index = read_u2();
u2 inner_class_access_flags = read_u2();

if(inner_class_info_index == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for why would be good here? My reading of the spec is it can't be zero?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could still do with a comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

.get_name(pool_entry_lambda);
bool is_private = inner_class_access_flags & ACC_PRIVATE;
bool is_public = inner_class_access_flags & ACC_PUBLIC;
bool is_protected = inner_class_access_flags & ACC_PROTECTED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a read_bitflag method but I wouldn't block this PR on it since this is a common approach used in this file.

{
// This is a marker for an anonymous or local class
// which are treated as private
parsed_class.is_private = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if spotted this before, but having three booleans to represent a single value is error prone, I'd prefer an enum in the classt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common approach in this file and it effectively matches the java bytecode.

/// Corresponds to the element_value structure
/// Described in Java 8 specification 4.7.6
/// https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.6
/// Parses the bytes of the InnerClasses attribute for the current parsed class,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: fewer words "Parses the bytes of the InnerClass attribute"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarity: current parsed class or current parsed class file? (If the former, ignore me)

/// Described in Java 8 specification 4.7.6
/// https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.6
/// Parses the bytes of the InnerClasses attribute for the current parsed class,
/// which contains any array of information about inner classes. We are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarity: Inner classes of that class?

/// in its inner class array. If the parsed class is not an inner class, then it
/// is ignored. When a parsed class is an inner class, the accessibility
/// information for the parsed class is overwritten, and the parsed class is
/// marked as an inner class. Note that is outer_class_info_index == 0, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably stick this note by the code it relates to rather than here.

{
// This is a marker for an anonymous or local class
// which are treated as private
parsed_class.is_private = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common approach in this file and it effectively matches the java bytecode.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: 6ce7b13).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77989278

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: 9ba55e2).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77998632

@thk123
Copy link
Contributor

thk123 commented Jul 4, 2018

On a subsequent PR you said you're just waiting on CI, but I think these two comments are outstanding:
https://github.com/diffblue/cbmc/pull/2493/files#r199757047 (documenting the inner_class_index == 0 if)
#2493 (comment) (using != 0 rather than implicit conversion to bool

@jeannielynnmoulton
Copy link
Contributor Author

@thk123 I have removed the if(inner_class_index == 0) line, so that is no longer applicable. When someone says NIT: I (maybe wrongly) take that as optional, but I have changed it since I am pushing anyway.

@thk123
Copy link
Contributor

thk123 commented Jul 4, 2018

Agree NIT means optional, but since there was another change (the docs, or as it turns out, removing the if) thought I'd draw your attention to it 🙂. Will merge when passes CI.

@jeannielynnmoulton
Copy link
Contributor Author

Waiting for CI to pass on bump: https://github.com/diffblue/test-gen/pull/2012

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: 5350133).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78105793

@thk123 thk123 merged commit 819c683 into diffblue:develop Jul 5, 2018
NathanJPhillips pushed a commit to NathanJPhillips/cbmc that referenced this pull request Aug 22, 2018
6409eae Merge pull request diffblue#2449 from tautschnig/c++-template-cleanup
1134455 Merge pull request diffblue#2443 from tautschnig/vs-rdecl
6e9e655 Merge pull request diffblue#2520 from smowton/smowton/feature/constant-propagator-selectivity
9013779 Merge pull request diffblue#2383 from tautschnig/no-sed
56a487e Constant propagator: add callback to filter tracked values
a354513 Merge pull request diffblue#2510 from polgreen/hex_trace
3545be4 Merge pull request diffblue#2523 from peterschrammel/do-not-ignore-full-slice
819c683 Merge pull request diffblue#2493 from jeannielynnmoulton/jeannie/InnerClasses
a018652 allow unsigned long instead of unsigned in regression test for hex_trace
d5bbdd8 represent numerical values in CBMC trace in hex
19bddce Do not ignore --full-slice
5350133 Refactor inner class parsing.
6e554d9 Merge pull request diffblue#2500 from diffblue/git-version-speedup
9ba55e2 Marks anonymous classes as inner classes
b96c7ba move build commands for version.h from common to util/
6ce7b13 Clarifies language in documentation.
c0c1029 Fixes parsing for anonymous classes
1930aef Refactors parsing of inner classes attribute.
b28562b Adds unit test for parsing inner classes.
c336455 Stores inner class data in java class types.
457bac9 Parses InnerClasses attribute of java bytecode.
34bd58f Clean up unused template instantiation symbols
fea1f47 Remove unused parameter from rDeclarator
e00c6ee Set BUILD_ENV via make variable instead of patching via sed

git-subtree-dir: cbmc
git-subtree-split: 6409eae
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants