-
Notifications
You must be signed in to change notification settings - Fork 273
[TG-8293] Constant propagation for empty strings and string concatenation [blocks: #4941] #4885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TG-8293] Constant propagation for empty strings and string concatenation [blocks: #4941] #4885
Conversation
src/goto-symex/symex_assign.cpp
Outdated
{}); | ||
|
||
symbol_table_baset &st_mutable = | ||
const_cast<symbol_table_baset &>(ns.get_symbol_table()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review for the first commit
src/goto-symex/symex_assign.cpp
Outdated
#include <util/mathematical_expr.h> | ||
#include <util/string_expr.h> | ||
|
||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not necessary
src/goto-symex/symex_assign.cpp
Outdated
const ssa_exprt &char_array, | ||
const array_exprt &new_char_array) | ||
{ | ||
// assign length of string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ I find a comment like // length = new_length would be clearer
src/goto-symex/symex_assign.cpp
Outdated
{}); | ||
|
||
symbol_table_baset &st_mutable = | ||
const_cast<symbol_table_baset &>(ns.get_symbol_table()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫
src/goto-symex/symex_assign.h
Outdated
bool constant_propagate_empty_string( | ||
const function_application_exprt &f_l1); | ||
|
||
void assign_constants( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 documentation is missing for these two functions
src/goto-symex/symex_assign.h
Outdated
target(target) | ||
target(target), | ||
char_type(16), | ||
length_type(32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these fields shouldn't be necessary and this is not language independent like symex should be.
src/goto-symex/symex_assign.cpp
Outdated
} | ||
|
||
bool symex_assignt::constant_propagate_empty_string( | ||
const function_application_exprt &f_l1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ could we use a clearer name, like function_application_l1
?
src/goto-symex/symex_assign.cpp
Outdated
|
||
INVARIANT( | ||
f_l1.arguments().size() == 2, | ||
"empty string primitive does not take arguments"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not very clear because it asserts there are 2 arguments, but the comment says there are 0
src/goto-symex/symex_assign.cpp
Outdated
@@ -350,6 +358,92 @@ void symex_assignt::assign_from_struct( | |||
} | |||
} | |||
|
|||
void symex_assignt::assign_constants( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be made clear in the name that it is for string constants
src/goto-symex/symex_assign.cpp
Outdated
new_char_array, | ||
{}); | ||
|
||
address_of_exprt aoe( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐤
src/goto-symex/symex_assign.cpp
Outdated
f_l1.arguments().size() == 2, | ||
"empty string primitive does not take arguments"); | ||
|
||
array_exprt char_array({}, char_array_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐤
src/goto-symex/symex_assign.cpp
Outdated
const function_application_exprt &f_l1, | ||
const function_application_exprt &f_l2) | ||
{ | ||
const refined_string_exprt &s1 = to_string_expr(f_l2.arguments().at(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used only once, so can be inlined below. There are a lot of variables representing basically the same thing in this function (s1, s1_data_opt, s1_data) so it's bette to try to minimize that 🍋
src/goto-symex/symex_assign.cpp
Outdated
|
||
const array_exprt &s1_data = s1_data_opt->get(); | ||
|
||
const refined_string_exprt &s2 = to_string_expr(f_l2.arguments().at(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍋
src/goto-symex/symex_assign.cpp
Outdated
const constant_exprt new_char_array_length = | ||
from_integer(new_size, length_type); | ||
|
||
array_typet new_char_array_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍓
src/goto-symex/symex_assign.cpp
Outdated
operands.insert( | ||
operands.end(), | ||
s2_data.operands().begin(), | ||
s2_data.operands().end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should avoid duplication with the code from solver/strings/string_concatenation_builtin_function.h
, concatenation is pretty simple and unlikely to change, but for functions like format we definitely need to share the code
src/goto-symex/symex_assign.cpp
Outdated
|
||
array_exprt new_char_array( | ||
std::move(operands), | ||
new_char_array_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍓
src/util/simplify_expr.h
Outdated
@@ -27,4 +31,9 @@ bool simplify( | |||
// this is the preferred interface | |||
exprt simplify_expr(exprt src, const namespacet &ns); | |||
|
|||
optionalt<std::reference_wrapper<const array_exprt>> | |||
try_get_string_data_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 needs documentation
@@ -27,4 +31,9 @@ bool simplify( | |||
// this is the preferred interface | |||
exprt simplify_expr(exprt src, const namespacet &ns); | |||
|
|||
optionalt<std::reference_wrapper<const array_exprt>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ optional references can be replaced by pointers
8bd258e
to
19f8245
Compare
Codecov Report
@@ Coverage Diff @@
## develop #4885 +/- ##
===========================================
+ Coverage 69.27% 69.28% +0.01%
===========================================
Files 1309 1309
Lines 108460 108515 +55
===========================================
+ Hits 75133 75184 +51
- Misses 33327 33331 +4
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 19f8245).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/119510875
df38ed0
to
c9c6033
Compare
c9c6033
to
686dd8a
Compare
686dd8a
to
d0b5245
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 686dd8a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/120002129
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: d0b5245).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/120004102
…space Create auxiliary symbols with a different name than symbols in namespace [blocks: #4885]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 71d11d5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121745220
f363449
to
329e603
Compare
const auto &ibv_type = | ||
to_integer_bitvector_type(to_array_type(char_array.type()).subtype()); | ||
|
||
const std::size_t n_bits = ibv_type.get_width(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ ibv_type
is used only once, it can be inlined here since the variable name doesn't give any information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does provide information, and is more readable as is imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't know what ibv
meant if it wasn't followed by to_integer_bitvector_type
, so at least to me it gives no information
return new_aux_symbol; | ||
} | ||
|
||
void goto_symext::associate_array_to_pointer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ns
can be given as an argument
src/goto-symex/goto_symex.cpp
Outdated
ns, | ||
state.symbol_table); | ||
|
||
ssa_exprt ssa_expr(return_symbol.symbol_expr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
src/goto-symex/goto_symex.cpp
Outdated
function_application_exprt array_to_pointer_app( | ||
function_symbol.symbol_expr(), {new_char_array, string_data}); | ||
|
||
symbolt &return_symbol = get_fresh_aux_symbol( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
src/goto-symex/goto_symex.cpp
Outdated
const symbolt &function_symbol = | ||
ns.lookup(ID_cprover_associate_array_to_pointer_func); | ||
|
||
function_application_exprt array_to_pointer_app( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const and {
}
initialization is prefered
src/goto-symex/goto_symex.cpp
Outdated
aux_symbol.value == new_char_array, | ||
"symbol shall have value derived from char array content"); | ||
|
||
address_of_exprt string_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
return false; | ||
} | ||
|
||
void goto_symext::assign_string_constant( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function belongs in the goto_symext
class, it doesn't seem to use any of its field (apart from ns which could be given as argument). 🐙
} | ||
} | ||
|
||
const symbolt &goto_symext::get_new_string_data_symbol( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐙
return try_get_string_data_array(s_pointer_opt->get(), ns); | ||
} | ||
|
||
void goto_symext::constant_propagate_empty_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐙
329e603
to
e78d73a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: e78d73a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122123110
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
/// \param symex_assign: object handling symbol assignments | ||
/// \param f_l1: application of function ID_cprover_string_empty_string_func | ||
/// with l1 renaming applied | ||
void constant_propagate_empty_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to ask for unit tests for this function and the one below (ideally also the others that are added to goto_symext
)
I don't like the changes, but if a code owner thinks they are fine then I won't block it
Regarding @romainbrenguier's comment above, we have agreed that adding unit tests for the added methods in |
String s2 = "\\"; | ||
String s3 = s1 + s2; | ||
assert s3.length() == 2; | ||
assert s3.startsWith("\t"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add assert s3.endsWith("\\");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s1.equals(s2)
is implemented in the models as s1.length() == s2.length() && s1.startsWith(s2)
so here we're checking that the strings are equal. To keep the tests as simple as possible we're not using the models library. I don't think it's necessary to use endsWith()
here as this is a test for concatenation, endsWith()
is tested elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it should better be assert s3.startsWith("\t\\");
if you want to emulate equals?
StringBuilder sb = new StringBuilder(); | ||
String s = sb.toString(); | ||
assert s.isEmpty(); | ||
assert s.startsWith(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that always true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should be.
CORE | ||
Main.class | ||
--function Main.test --property "java::Main.test:()V.assertion.1" --property "java::Main.test:()V.assertion.2" | ||
^Generated [0-9]+ VCC\(s\), 0 remaining after simplification$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be exactly 1 VCC
if you use --property
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--property
is passed in twice here, so yes I think the value should be 2. I'll change it to use the precise value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it turns out that jbmc
translates those assertions to something like:
if(length_return_value != 6)
{
assert(false);
}
So if length_return_value
is 6, we don't symex the assertion and thus get 0 VCCs before simplification. This is probably not stable behaviour though so I'll leave it as is. The important thing is that after simplification we have 0 remaining VCCs.
lhs, expr_skeletont{}, rhs, lhs_if_then_else_conditions); | ||
} | ||
} | ||
|
||
bool goto_symext::constant_propagate_assignment_with_side_effects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but might be needed in the future
@danpoe, are there any operations where this is actually used?
} | ||
|
||
optionalt<std::reference_wrapper<const array_exprt>> | ||
goto_symext::try_evaluate_constant_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐙
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer those methods to live in goto_symext
for now as opposed to being static. The plan is to eventually create a goto_symex_constant_propagationt
member of goto_symext
which handles the constant propagation.
e78d73a
to
4fecf4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 4fecf4e).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122424687
4fecf4e
to
d9f75eb
Compare
The function is generally useful for obtaining strings that contain only alphanumeric characters.
This is so the function can be called outside of simplify_expr.cpp.
Take the content field of a refined string expression instead of taking the whole refined string expression. This will allow it to be used from the string mutators constant propagation code in goto_symex.cpp.
….cpp Previously the string solver returned an error when constant propagation of strings was used as `array_poolt::insert()` might be called for array expressions for which there is already a mapping in the `length_of_array` map.
This implements constant propagation of ID_cprover_string_concat_func and ID_cprover_string_empty_string_func which are the primitives used by StringBuilder.append(), which in turn is used to implement String concatenation using +.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: d9f75eb).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122439299
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
Uh oh!
There was an error while loading. Please reload this page.