-
Notifications
You must be signed in to change notification settings - Fork 273
Symex: ignore null dereferences when targeting Java #2125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symex: ignore null dereferences when targeting Java #2125
Conversation
So why do we have |
Flow insensitivity. A a = null;
if(some_param)
a = new A(5);
// VS for a = { dynamic_object1, NULL }
int val = a.value; Here propagating A a = null;
if(some_param)
a = new A(5);
if(a == null)
throw new NullPointerException();
int val = a.value; The result is a needlessly complex Generally symex could figure this out by using the guard at |
Similar to the other PR, if you don't want to see |
0040a7d
to
ce3f8d0
Compare
Discussed in person with @kroening: agreed solution is for the driver program to pass a parameter to |
ce3f8d0
to
72b6b60
Compare
72b6b60
to
178c8f6
Compare
@tautschnig @peterschrammel @kroening @mgudemann this has been updated with a first shot at the design described above and is ready for review. |
I believe we first need to converge on the discussion in #2031. No need to get the code in that PR merged, but the semantics of
and the proposal in this PR would only apply if the semantics of Edit: if the front-end directly tells the back-end what the semantics of the intermediate representation is then we're violating established compiler design principles. Maybe there are good reasons for doing so, but I'd rather see those reasons first. So let's please sort out the semantics of the above snippet first. |
Interesting. In that case jbmc should only pass this flag (at present) when running in --cover mode (when assertions-to-assumptions means we really can't pass the assertion). Reason for the ugly design: because the general problem "can this pointer possibly be null at this program point?" is quite hard to answer -- in principle the state guard tells us the answer, but extracting it might be difficult when e.g. trying to check whether |
@tautschnig commented at #2031; perhaps we can make assert sane rather than having to restrict the applicability of this patch. |
I guess after all of the ranting it's clear what my opinion / suggestion / preferred solution for this is... |
@tautschnig now the fatal-pointer-checks patch is in, are you happy for this to move forwards? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must not be merged as is, for two reasons:
- It is unsound: a concurrent update setting a reference to null would be missed.
- The clarified semantics of assert/assume now enable tweaking either symex or (preferred) the value-set analysis to take into account non-nullness constraints. (Most likely it will be a combination of both: symex should update the value-set analysis upon processing an assume.) This can be done soundly, even in presence of concurrency.
We could get reasonable power by adding a new data structure to symex to track expressions that are currently known not null (blanked on any assignment or control flow, added to by assume-not-null statements), but perhaps this is overpowered for the desired result? |
If that's the case, then the assert+assume+access should be placed in an atomic section. I think there is a fair amount of C code that does |
Benchmark: I generated a synthetic benchmark using this script: #!/usr/bin/env python
import random
import sys
if len(sys.argv) != 2 or sys.argv[1] not in ("good", "bad"):
print >>sys.stderr, "Usage: gen.py (good | bad)"
sys.exit(1)
spoil_assumption = sys.argv[1] == "bad"
print "int main(int argc, char **argv) {"
for i in range(100):
print " int local%d = %d;" % (i, i)
vars = ["&local%d" % i for i in range(100)]
choices = 0
def tchoice(l):
n = len(l)
global choices
out = ""
while len(l) > 1:
out += "(argc == %d ? %s : " % (choices, l[-1])
choices += 1
l.pop()
if len(l) == 1:
out += "(%s" % l[0]
out += ")" * n
return out
print " "
for i in range(10000):
naliases = random.randint(1, 5)
aliases = random.sample(vars, naliases) + ["0"]
print " int *ptr%d = %s;" % (i, tchoice(aliases))
print " __CPROVER_assume(ptr%d != 0);" % i
if spoil_assumption:
print " int incidental%d = 0;" % i
print " int deref%d = *ptr%d;" % (i, i)
print " "
print " assert(0);"
print "}" In the "good" case it generates something of the form int local1 = 1, local2 = 2, ...;
int *ptr1 = nondet-choice-of(&local1, &local2, ...., null);
__CPROVER_assume(ptr1 != 0);
int deref1 = *ptr1; and in the bad case it adds an extra Results (Old and New are cbmc before and after this patch; Good and Bad are the scenarios above where the optimisation works and doesn't work respectively):
Size of VCCs (printed, million characters):
In all cases I used the debug build (for conveinence) and |
@smowton Thanks a lot for the detailed benchmarking report! So it seems that in case of C it doesn't make a whole lot of a difference (some improvement, some cost). Hence I'm now curious about results on the benchmarks/problems you had originally looked at that actually lead to you starting this PR? |
39e1812
to
dc27988
Compare
To give a simple example, I added a new "best" case to the script, which always uses alias sets of size 1 + null, and includes some junk code only executed when we're uncertain -- for example, So the situation looks like int x = 0;
int *ptr = nondet(int) ? &x : null);
assume(ptr != null);
int x2 = *ptr;
if(x != x2) { // something expensive } I had to lower the number of ops to 1000 (from 10000) for this version to complete in a reasonable timeframe. Predictably this takes the old CBMC 42s / 1.4Mc of VCCs, and the new one 6s / 0.16Mc of VCCs. So very large gains are possible when improved constant prop results in certainty about a control-flow choice. The most common form of this in Java is a virtual function call (e.g. By itself I suspect this won't deliver a massive improvement on practical Java software, as without #2122 we become uncertain about |
Ok, thanks, so what is your preferred way to proceed? Should this wait until we have all the bits together to demonstrate big gains, or should we try to merge this soon/now? (I'd actually prefer to hold off on such a change until after 5.9 is released, which should be happening really soon -> @kroening). My suggestion: discussion (offline) between @smowton and @kroening. |
Personally I'd like to get this merged as big merges are very difficult. Re: #2122 we need to decide-- is your field-sensitive constant prop stuff going to get in any time soon, and will it be good enough to always resolve a clsid test? If it's too difficult and too performance sensitive to get it in, can we special-case it in a way that's less ugly than the rejected #2122 (which used magic knowledge that |
Discussed with @kroening just now: he is to confer with @tautschnig re: when this goes in; I will follow up with a separate PR to re-use the |
dc27988
to
aa1b249
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passed Diffblue compatibility checks (cbmc commit: aa1b249).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/76979426
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some nitpicks
|
||
// Shouldn't work due to suspicion *ptr6 might alter ptr7: | ||
int *ptr7 = &x; | ||
__CPROVER_assume(ptr7 != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird indentation (several occurrences)
#include <util/format_expr.h> | ||
|
||
/// If `expr` is of the form `x != nullptr`, return x. Otherwise return null | ||
static const exprt *get_null_checked_expr(const exprt &expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use optional
here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael argued for a pointer earlier, so I'm going to stick with that rather than go back and forth :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright
src/analyses/local_safe_pointers.h
Outdated
/// In the interests of a very cheap analysis we only search for very local | ||
/// guards -- at the moment only `if(x != null) { *x }` | ||
/// and `assume(x != null); *x` are handled. Control-flow convergence and | ||
/// possibly-alising operations are handled pessimistically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aliasing
public: | ||
void operator()(const goto_programt &goto_program); | ||
|
||
bool is_non_null_at_program_point( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sizeable method intentionally defined in the header file?
-- | ||
^warning: ignoring | ||
-- | ||
The test should fail, as it might dereference a null pointer, but as Java is a safe language we should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I guess JBMC should report FAILED (not the 'test should fail').
@smowton do you plan on doing a test-gen PR for that? |
ok |
aa1b249
to
ab42137
Compare
Applied all of Peter's comments apart from one response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passed Diffblue compatibility checks (cbmc commit: ab42137).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77037324
Bump is failing some regression tests. |
This approaches the safe-pointers problem from the opposite direction compared to local- bitvector-analysis -- local-bitvector tries to establish a pointer is safe because it definitely points to known object, whereas local-safe-pointers tries to establish a particular usage is certainly safe because it is trivially dominated by a not-null test, which at present means a conditional GOTO or an ASSUME.
This uses local_safe_pointerst to determine when symex dereferences a pointer that cannot be null. When it does the null result is excluded from the possible values, and therefore a $invalid_object reference may be excluded from the result of dereferencing such a pointer. This can improve constant propagation.
ab42137
to
9fd3434
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passed Diffblue compatibility checks (cbmc commit: 9fd3434).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/77151625
7c1de91 Merge pull request diffblue#2465 from tautschnig/vs-criteria 9480092 Merge pull request diffblue#2437 from tautschnig/vs-empty a025add Merge pull request diffblue#2463 from tautschnig/vs-xref 0f3d345 Remove unused parameter criteria cdb7e52 Merge pull request diffblue#2453 from tautschnig/vs-deprected-uint bb7607d Merge pull request diffblue#2451 from tautschnig/c++-parser a07639d Merge pull request diffblue#2447 from tautschnig/cpp-regression-tests bdac907 Merge pull request diffblue#2459 from tautschnig/cmake-cleanup 261e883 Merge pull request diffblue#2461 from tautschnig/vs-auto 8acbc6c Merge pull request diffblue#2457 from tautschnig/vs-goto-convert bd2547a Merge pull request diffblue#2458 from tautschnig/vs-long-long fbc56db Remove unnecessarily inlined implementations to otherwise empty cpp file 6adcebc Merge pull request diffblue#2456 from tautschnig/vs-reserve a905ac0 Replace lambda by member function reference to silence Visual Studio ce6a297 Use auto to avoid unnecessary signed/unsigned conversion 79ef045 Deprecate get_unsigned_int c528c25 Remove no-longer-existent-files from exclusion lists in CMake files 4c6fc61 Use long-long integer constant as the left-hand side is long long 7dbf60a Remove unused parameters in goto_convert 0482889 Use reserve instead of generating blank strings 5e5e264 Merge pull request diffblue#2441 from tautschnig/vs-concur 8b03ac1 Merge pull request diffblue#2404 from tautschnig/vs-zero-2 18a3070 Merge pull request diffblue#2454 from tautschnig/vs-code-typet 062a886 Merge pull request diffblue#2408 from tautschnig/vs-set-map ccfc4e0 Merge pull request diffblue#2417 from tautschnig/vs-missing-arg 2bf4097 Merge pull request diffblue#2419 from tautschnig/vs-loc-num 7308bf6 Avoid deprecated code_typet() constructor c2a8fb8 Do not use count() when returning a bool 62ec461 Merge pull request diffblue#2360 from smowton/smowton/fix/dont-deref-null-for-class-identifier-v2 5e1f365 C++ parser: actually use parameters dca5b76 Merge pull request diffblue#2413 from tautschnig/vs-cpp-fix ee2421a Test passes 0cf27c6 type_traits requires C++ 11 660ae90 Remove unused parameter from cpp_destructor and fix types 0784f77 Merge pull request diffblue#2125 from smowton/smowton/feature/symex-ignore-null-derefs 001fca0 Merge pull request diffblue#2280 from cesaro/java-concurrency-synchronization 2369df3 Add message handler to remove_instanceof and _exceptions 419bc1b Java instanceof: avoid dereferencing null pointer 9fd3434 Use local-safe-pointers analysis to improve Symex pointer resolution 8078569 Add local-safe-pointers analysis 0062a9c Merge pull request diffblue#2444 from tautschnig/vs-value-set 68ac566 JBMC: Regression tests for synchronized methods c0ee316 JBMC: Support for synchronized methods 7efa7bf JBMC: Regression tests for multi-threaded java programs 4d91aa5 JBMC: Modified the instrumentation of monitorexit/enter instructions 0691f03 JBMC: Zero-initialized 'cproverMonitorCount' component and removed '@lock' field (fixes diffblue#2307) 0b90c17 JBMC: Moved format_classpath.sh to scripts/format_classpath.sh 2a0d2f9 AppVeyor fix: remove existing clones of the java models library. 0c5a497 Merge pull request diffblue#2446 from tautschnig/vs-unlink f4cb6a9 Merge pull request diffblue#2442 from tautschnig/vs-side 5b12b25 Merge pull request diffblue#2445 from tautschnig/vs-simpl 53e35c2 Move side effect out of conditional aab593b Use C++ standard library function instead of POSIX function f5b465d Simplify code to avoid Visual Studio warnings 7621a86 Remove unused parameter in value_set_analysis d50cec0 Remove unused parameter from concurrency_instrumentationt::instrument 475fe20 Merge pull request diffblue#2393 from tautschnig/git-info-cmake-fixes 5cd87c1 Merge pull request diffblue#2436 from tautschnig/vs-zero b4c0e22 Merge pull request diffblue#2410 from tautschnig/vs-deref-type 662d256 Merge pull request diffblue#2431 from tautschnig/vs-with 1aceb6c Merge pull request diffblue#2391 from diffblue/compilation-instructions 7c894e7 Merge pull request diffblue#2402 from tautschnig/vs-no-dummy 822de57 Merge pull request diffblue#2432 from tautschnig/vs-slice a3fe43b Merge pull request diffblue#2427 from tautschnig/vs-read-bin 0ab38e5 Merge pull request diffblue#2433 from tautschnig/vs-is-zero 95396bc Merge pull request diffblue#2438 from tautschnig/vs-partial 551ab81 Merge pull request diffblue#2411 from tautschnig/vs-message-handler 7464904 Merge pull request diffblue#2424 from tautschnig/vs-template 1feaa94 Merge pull request diffblue#2407 from tautschnig/vs-linker 82c7e48 Remove dummy implementations from propt 209d459 Partial-order concurrency: Remove unused parameter 0e3958a Zero-length array warning is C4200 in Visual Studio ed2cf6a Remove unused parameter from float_bvt::is_zero cb54ae5 Remove unused, empty function a5ecfb4 Remove unused parameters in convert_with_* 373636e Remove parameters that read_bin_goto_object_v3 does not use d6bdee6 Merge pull request diffblue#2369 from polgreen/disconnect_unreachable 0498da9 Merge pull request diffblue#2400 from tautschnig/vs-return-type 2eb9156 Merge pull request diffblue#2406 from tautschnig/vs-sizet1 d5ea06f Merge pull request diffblue#2422 from tautschnig/vs-thread-id 609c934 Merge pull request diffblue#2414 from tautschnig/vs-want 9263e43 Merge pull request diffblue#2403 from tautschnig/vs-zero-1 e638f72 CBMC_VERSION: Use generated include files instead of command-line defines 1360f7f CMake refers to Clang on a Mac as AppleClang f27e724 Merge pull request diffblue#2421 from tautschnig/vs-bound ed98fd7 Merge pull request diffblue#2412 from tautschnig/vs-build 16fa26d Merge pull request diffblue#2405 from diffblue/aws-codebuild-jbmc d7dd598 build jbmc on AWS codebuild 001f684 fix compilation instructions 040fd91 Remove unused parameter dereference_type 6ee60cd Do not unnecessarily convert mp_integer to bounded type d605d56 goto-program instruction's location_number is unsigned 9305ebd Remove unused parameter message_handler 4361cc4 goto-instrument/wmm: add missing argument 6256290 Fix return type of nodes_empty 8c2f6e1 Use std::set instead of map as value is never used bd4faad Use std::size_t instead of int to match types at callsites d07d418 Templatize architecture_string to avoid type conversion 9c72f61 Fix type of thread_id to match goto-trace fb512f0 Remove unused paramter in cpp_typecheck_resolvet 2afa919 Explicitly compare int to zero to avoid Visual Studio warning a1be6c8 Remove unused parameter from local_bitvector_analysist::build 4b5677d Bound the number of attempts my_mkstemps tries to compute a file name b8743fa Merge pull request diffblue#2415 from tautschnig/fix-models d7ef0bc Fix coreModels test to match latest java models library 90c56b3 Merge pull request diffblue#2396 from tautschnig/vs-constructor a45e777 Merge pull request diffblue#2397 from tautschnig/vs-indent 38c7889 Merge pull request diffblue#2399 from tautschnig/vs-defined 8324386 Merge pull request diffblue#2401 from tautschnig/vs-make-constant-rec 1c1562e Merge pull request diffblue#2398 from tautschnig/vs-remove-adjust 2bf9de7 Remove empty function make_constant_rec cdb4075 Use defined(...) when testing for defined-ness of macros bef9866 Remove unused function adjust_lhs_object f0ceaf7 expr2c: indent output of code_deadt 576c0a6 Remove constructor declaration that has no implementation 8f6dab8 Merge pull request diffblue#2261 from thk123/bugfix/TG-3652/wrong-generic-type-two-params 262affb Merge pull request diffblue#2350 from thk123/feature/TG-3813/load-specified-methods 122108a Merge pull request diffblue#2387 from diffblue/fix-dist-scripts 1d461c4 version numbers are now followed by git tag 8f93163 increased version number in preparation for release 5.9 8bc9ecf Merge pull request diffblue#2386 from tautschnig/make-version 7b4b8fe Make's file function is only available from 4.2 onwards ca982a5 Merge pull request diffblue#2373 from tautschnig/git-version-output af880a4 Merge pull request diffblue#2382 from tautschnig/missing-dest e149397 Merge pull request diffblue#2384 from tautschnig/quiet-vs 4b124c8 add centered git version to CBMC banner 718e926 Build .release_info files containing the version string 6416372 windowsify compiler options 23455af Print git revision with all --version outputs 7ca835b Added CBMC_VERSION defines to CMake configuration 783fcea Extend lazy-methods-extra-entry-point to support regex methods 1e55404 CI lazy methods take vector of method loaders 2fb9e21 Add utility for checking two vectors are equal without caring about order e1398bc Add tests for generic type erasure 0c874bf Don't have generic type variables as a comment 1f9a5c4 Adding test about the type bd907a4 Diversified the test for multiple fields 6f98511 Merge pull request diffblue#2335 from thk123/bugfix/load-classes-from-opaque-calls ae1535f Do not print version info when linking using Visual Studio 7e7619c Add missing virtual destructor c0dd633 Merge pull request diffblue#2376 from diffblue/gcc_attributes5_KNOWNBUG 3423e44 gcc/clang treat __attribute__((aligned())) differently c509ece Merge pull request diffblue#2379 from tautschnig/fix-bswap e95f59d Don't instantiate abstract types when they are returned from stubs d7d5f63 Disable lazy loading opaque return for symex driven loading 409d892 Don't forcibly instantiate abstract classes a718893 Adding test to ensure methods on return type of opaque function 20645c9 Add classes to needed classes for parameters and returns for opaque calls 9d42bc8 Move method for gathering all instantiated types into ci_lazy_methods_needed b67ae26 Add can_cast_type for symbol_typet 9981fab Correct doxygen documentation 67c056b Unit test for disconnecting unreachable nodes eca3366 Disconnect unreachable nodes in a graph 3126dea Fix bswap_exprt constructor ac02bbc Merge pull request diffblue#2243 from diffblue/no-warning-ptr-array 0ef77c7 Merge pull request diffblue#2377 from diffblue/fix-tests3 b603a63 make test independent of index type 5ca7410 ignore size of arrays on ptr-to-array conversions git-subtree-dir: cbmc git-subtree-split: 7c1de91
In Java (and other safe languages) we know that any path that led to dereferencing
a null pointer would have thrown or asserted beforehand (i.e. all pointer dereferences
are checked). Therefore when dereferencing a pointer with value-set {NULL, someobj}
there is no need to generate (for example)
ptr == &someobj ? someobj.somefield : invalid_object.somefield
, but rather we can simplyproduce
someobj.somefield
.In principle this could also be extended to C programs that have
--pointer-check
enabled,to other safe languages, and to specific pointer accesses that are analysed never-null at a
particular program point.