Cache dereferences during symex to avoid repeatedly inserting large dereferences into SSA #5964
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           develop    #5964    +/-   ##
==========================================
- Coverage    75.12%   74.01%   -1.11%
==========================================
  Files         1435     1444       +9
  Lines       156301   157344    +1043
==========================================
- Hits        117416   116466     -950
- Misses       38885    40878    +1993
```
Force-pushed 361852a to d810eb8.
This is a very interesting patch and it would be great to have it in. I think it would be good to have some extensive performance evaluation.
@martin-cs I added some additional explanation and some performance data we've measured on awslabs/s2n.
LGTM
LGTM. Could do with a little clean-up, but is otherwise great.
@martin-cs Also follow up with examples from
Force-pushed 6f53774 to f5fd138.
@thomasspriggs I'm not that familiar with the jbmc regression tests. Would it be OK to add the changes there in a later PR? Manual checking shows the tests ought to still pass with the flag turned on; I just don't have the time to do the edits right now.
Yes, I am fine with that, which is why I approved.
Mostly looks good to me, but I really would like to understand whether it was a conscious decision not to use sharing_map; if it wasn't, then I'd ask for the experiments to be re-run.

Also, could the commit message please not use the "CSE" abbreviation without providing any kind of context?
Thank you for all the updates!
Please can you simplify the option handling / default in the tests.
```
@@ -251,7 +252,8 @@ void run_property_decider(
   "                              iteration are allowed to fail due to\n" \
   "                              complexity violations before the loop\n" \
   "                              gets blacklisted\n" \
-  " --graphml-witness filename   write the witness in GraphML format to filename\n" // NOLINT(*)
+  " --graphml-witness filename   write the witness in GraphML format to filename\n" /* NOLINT(*) */ \
+  " --symex-cache-dereferences   enable caching of repeated dereferences" \
```
It would be good if new options followed the pattern of having OPT_ and HELP_ macros.
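For reference, a minimal sketch of that pattern; the macro names below are an assumption for illustration, not the final code:

```
// Option declaration, composed into the parser's option string:
#define OPT_SYMEX_CACHE_DEREFERENCES "(symex-cache-dereferences)"

// Matching help text, composed into the help output:
#define HELP_SYMEX_CACHE_DEREFERENCES                                         \
  " --symex-cache-dereferences   enable caching of repeated dereferences\n"
```

The parse-options class would then append OPT_SYMEX_CACHE_DEREFERENCES to its option string and HELP_SYMEX_CACHE_DEREFERENCES to its help text, instead of the inline string literal shown in the diff above.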
I'm not sure I really understand why this patch has an effect. This is bothering me because I wonder if there is actually some kind of bug and this is just covering it rather than resolving it. As I understand it, this only affects cases where there is a non-trivial expression generated by a dereference on the RHS of two or more expressions, replacing the normal expansion with a cached one; see the sketch below.
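A minimal sketch of the transformation in question (the cached form below is quoted verbatim in the reply further down; the source and expanded forms are assumptions consistent with that quote):

```
// Source: the same dereference on the RHS of two expressions.
x = *p;
y = *p;

// Normally this would become, with the dereference expanded at each use:
x = (p == &a) ? a : b;
y = (p == &a) ? a : b;

// But with this patch the expansion is assigned to a cache symbol once:
cached = (p == &a) ? a : b;
x = cached;
y = cached;
```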
Am I on the right track? As far as I can see this could affect performance in a number of ways:

1. Save on rewriting / SSA. I can see that this could have an effect, but if it is having the effect that @hannes-steffenhagen-diffblue is reporting at the high end of the results then we should be doing this more widely, and possibly profiling the SSA in those cases because it shouldn't be that slow.
2. It could change the number of clauses / variables used. I'd love to think that it reduces them, but what I can't understand is how the code above doesn't result in a cache hit in the solver's expression cache. If it misses then we should fix that cache. If it hits then I don't see how this could reduce the number of clauses.
3. It could just change the formula, which rerolls the SAT solver "dice", so some will get faster (possibly a lot) and some will get slower.

I do wonder whether this is an improvement not because of 1 and 2 but because it covers a bug that results in a cache miss in the solver for two dereferences of the same thing. Maybe something is re-SSA'd and so comes out with a new name unnecessarily? Maybe the ordering of the pointer set changes so the expressions are no longer syntactically identical?

@hannes-steffenhagen-diffblue can you pull out from your experimental runs the numbers of clauses and variables? That should help narrow down why this works.
Deleting this due to an error in the runtime data.
Replying to @martin-cs here. I'm going to add some comments and details here, apologies if this is incomplete (and please let me know if this is not clear). I'll try to respond inline.
Yes, but one subtle point here: the dereference does not have to be in two or more expressions. The following is also possible:
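```
if(*p == 10 || *p == 11 || *p == 21)
```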
to become
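```
if(((p == &a) ? a : b) == 10 || ((p == &a) ? a : b) == 11 || ((p == &a) ? a : b) == 21)
```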
to be rewritten as
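```
cached = ((p == &a) ? a : b);
if(cached == 10 || cached == 11 || cached == 21)
```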
This may be significant to caching and optimisation in other places (see also responses below). Further, in the example raised in #5494 the performance is improved by (effectively) splitting expressions and caching the dereference, which is why caching also operates within a single expression.
I speculate this may be where the subtle difference above is most significant, since this allows for caching of sub-expressions (i.e. dereferences). This may have some interaction with the solver expression cache, but it seems like dereferences should be cached consistently and so this should not have too much impact. (The exception would be when expressions are not cached in the solver due to appearing in a write or under quantification scope, but it is not clear to me that these would be cachable anyway.)

As for the number of clauses/variables, I don't have detailed data (I don't think it was logged, and I don't have the actual run data for some of the runs since they were done by others). However, I can comment that the result of using the dereference cache is generally to increase both the number of clauses and the number of variables. (This seems entirely expected to me since, as in the examples above, there needs to be an extra variable for the dereference_cache variable and also a clause, but the generally improved runtime implies this is a simplification in most cases, or at least changes the form in a way that the SAT solver finds easier. Of course, most observations and guesses about the SAT solver are somewhat speculative.)
This does match quite well with observations. That said, as I mentioned above there seems to be a simplification of the expressions and so this would not appear to purely be a reroll of the SAT solver "dice".
I hope I've given some clarity above that helps here. That said, I don't think there is a clear answer either above or in theory - it's difficult to prove the absence of a potential bug. I can see reasons why even with a bug-free SAT solver expression cache this PR would improve performance, but this does not rule out there being a bug there either.
I don't think Hannes will be working on this further; I may be able to gather some of this.
On Mon, 2021-03-29 at 02:41 -0700, TGWDB wrote:
I'm going to add some comments and details here, apologies if this is
incomplete (and please let me know if this is not clear). I'll try to
respond inline.
Thanks for the fast and helpful responses. It hasn't resolved my
concerns but it is good to discuss them. I guess fundamentally I think
we shouldn't be merging something that improves performance if we don't
have evidence of why it works. Especially if it adds complexity to a
critical part of the code.
I'd like to try to help gain the understanding of why it works.
<snip>
> but with this it would become
>
> ```
> cached = (p == &a) ? a : b;
> x = cached;
> y = cached;
> ```
>
> Am I on the right track?
Yes, but one subtle point here: the dereference does not have to
be in two or more expressions. The following is also possible:
```
if(*p == 10 || *p == 11 || *p == 21)
```
to become
```
if(((p == &a) ? a : b) == 10 || ((p == &a) ? a : b) == 11 || ((p == &a)
? a : b) == 21)
```
to be rewritten as
```
cached = ((p == &a) ? a : b);
if(cached == 10 || cached == 11 || cached == 21)
```
Thanks for the example.
This may be significant to caching and optimisation in other places
(see also responses below).
Further, in the example raised [here](#5494) the performance is
improved by (effectively) splitting expressions and caching the
dereference, which is why caching also operates within a single
expression.
Maybe we should work with this example to try to explore why it helps.
> As far as I can see this could affect performance in a number of
> ways:
>
> 1. Save on rewriting / SSA. I can see that this could have an
> effect but if it is having the effect that @hannes-steffenhagen-
> diffblue is reporting at the high-end of the results then we should
> be doing this more widely and possibly profiling the SSA in those
> cases because it shouldn't be that slow.
> 2. It could change the number of clauses / variables used. I'd
> love to think that it reduces them but what I can't understand is
> how the code above doesn't result in a cache hit in the solver's
> expression cache. If it misses then we should fix that cache. If
> it hits then I don't see how this could reduce the number of
> clauses.
I *speculate* this may be where the subtle difference above is most
significant since this allows for caching of sub-expressions (i.e.
dereferences).
I can believe that this could be where it changes things but if that is
the case then it is a missed caching / optimisation in the solver and
should be fixed there.
This may have some interaction with the solver expression cache, but
it seems like dereferences should be cached consistently and so this
should not have too much impact. (The exception would be when
expressions are not cached in the solver due to appearing in a write
or under quantification scope, but it is not clear to me that these
would be cachable anyway.)
Yes, as far as I understand this patch, it does not cache more than
what the solver's expression cache handles.
As for the number of clauses/variables, I don't have detailed data (I
don't think it was logged, and I don't have the actual run data for
some of the runs since they were done by others). However, I can
comment that the result of using the dereference cache is to
generally increase both the number of clauses and number of
variables. (This seems entirely expected to me since as in the
examples above there needs to be an extra variable for the
dereference_cache variable and also a clause, but the generally
improved runtime implies this is a simplification in most cases, or
at least changing the form in a way that the SAT solver finds easier.
Of course, most observations and guesses about the SAT solver are
somewhat speculative.)
Each cached dereference should add sizeof() + 1 variables and
4*sizeof() + 1 clauses.
If there is a net decrease in the number of variables then it is clear
that the solver cache is not working correctly. If there is a slight
increase then it is not clear to me that there is simplification going
on.
> 3. It could just change the formula which rerolls the SAT solver
> "dice" so some will get faster (possibly a lot) and some will get
> slower.
This does match quite well with observations. That said, as I
mentioned above there seems to be a simplification of the expressions
and so this would not appear to purely be a reroll of the SAT solver
"dice".
I want there to be simplification but I am not sure I see the evidence
for it at the moment.
> @hannes-steffenhagen-diffblue can you pull out from your
> experimental runs the numbers of clauses and variables? That should
> help narrow down why this works.
I don't think Hannes will be working on this further, I may be able
to gather some of this.
I would appreciate that. I don't have time to do experiments myself
but some things that I would try:
*. Get @natasha-jeppu's example working.
*. See if there is an actual decrease in clauses or variables generated.
*. See how it affects the run-time of various SMT solvers.
*. Use --show-vcc and diff to establish what is actually changing.
*. Modify the solver cache to log hits and see what is happening there.
@martin-cs Continuing on this discussion. Note that I've cut parts that are not important to this point of the discussion and responded inline.
I'll make the counter-argument here for discussion; don't take this as a disagreement with the principle of your position. The counter-argument is that we should merge something that improves performance when we have evidence that it works and information on why it works. We have several benchmarks that show this improves performance on programs that have dereferences that are (re)used multiple times. We can observe that the SSA is rewritten to reduce repetition of larger expressions and replace them with single variables (the cache symbols). To me it appears fairly clear how and why it works. The main challenges I see to the above are the following:
Again, the above are to make some counter arguments for the sake of discussion. I can see validity in both sides.
See below.
See above.
Following this.
I have a modified version working now (modified since both cbmc and the repository have evolved since the original issue).
There is an increase in both. Note that an increase is expected, since the dereference caching adds new variables and clauses. The data is as follows:

Runtime data is as follows:

Here we can observe where more of the time is spent. Aside: note that this does NOT come close to the manual optimisation, since the manual optimisation changes the control flow of the function. To give a rough understanding of why: the original does (something like) the following, and the manual conversion does (something like) the following, which not only "caches" the dereferences but also changes the control flow.
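Since the concrete snippets are not reproduced here, a hypothetical illustration of the distinction (invented code, not the actual S2N functions):

```
// Dereference caching (this PR): the repeated dereference is hoisted into a
// cache variable, but the chain of comparisons stays as it is.
bool check_cached(const int *p)
{
  const int cached = *p;
  return cached == 10 || cached == 11 || cached == 21;
}

// Manual optimisation: the control flow itself is restructured as well,
// here by branching on the dereferenced value directly.
bool check_manual(const int *p)
{
  switch(*p)
  {
  case 10:
  case 11:
  case 21:
    return true;
  default:
    return false;
  }
}
```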
A quick look at the log shows that adding the
Not done (yet?).
@martin-cs I agree it would be good to explore some of this a bit more; however, practical realities are that we have a limited amount of time available to complete this work, and we think we probably have enough evidence to show that where it works, it works consistently well enough that it is worth adding as a (default disabled) feature. There's definitely more that could be done, but not the budget to do so :-) Given that most of the PR changes are actually in options handling and tests, I don't think this PR adds that much more complexity (albeit in a somewhat subtle area of code). If it's a help to resolve the conversation I'm happy to set up a three-way call?
I've done a LOT more experiments and checking of various details related to the runtime performance. There are two main results that are now very stable and in which I am confident.
src/goto-symex/goto_state.h (outdated):

```
@@ -38,6 +38,11 @@ class goto_statet
   symex_level2t level2;

+public:
+  /// This is used for eliminating repeated complicated dereferences.
+  /// This is pretty much just a simple wrapper around a map.
```
This sentence probably isn't needed anymore?
Thank you for doing further experiments. Would you mind sharing details about those that exhibit "big degradation"? Just so that someone can pick those up at a later point and investigate in more depth. In general, I think we might be solving the same problem that #4977 attempted to solve. That PR doesn't have nearly as much data, but has explanations of the expressions that make such a change desirable.
Trying to bring together two threads of discussion on this PR here: the example suggests that it's indeed double dereferences that are causing trouble, which is what #4977 tried to address. I therefore remain inclined to consider this OK to merge as-is (except for the minor comments I added earlier), and we can likely close #4977 (as @smowton had to leave it to others to address this problem). @martin-cs: with the background of #4977, would you be happy to go ahead with this?
I think there may have been some kind of problem with rebasing or merging, as there are now a bunch of Z3-specific changes in this PR. I'll have another look when things are a bit more normal.
If you can squash the fixes into the relevant commits it will make reviewing easier.
Force-pushed 6eb7c91 to ca973e2.
Force-pushed ca973e2 to 467b094.
@martin-cs Yes, I made a mess in a rebase. I've pushed now and also tidied up the commit history. The reviewing should be a lot easier and cleaner now.
@TGWDB thanks for tidying up the PR, I will have a read through everything and do another review ASAP.
This makes the dereference caching behaviour configurable
Force-pushed 467b094 to d554326.
See a set of runs here. Note that these should be considered a little noisy, so small variances are not likely to be significant. The runs are averages of a couple of invocations, all done on the same system with very limited potential for contention of CPU/memory/disk/etc. Also, these are not necessarily good samples to test with, just ones I had set up to be able to evaluate.
Interesting, thanks for the link.
Thanks for the performance data and your patience. Keeping in mind @chrisr-diffblue's constraints, I am not going to oppose this. All of the hard-blocking issues around option handling have been fixed. The OPT_ and HELP_ macros would be nice but are a more minor thing.

I remain unconvinced about exactly what is being improved, but if it is an option and is low-overhead when off, then why not. I think @tautschnig's point about @smowton's PR #4977 is very interesting and I think that is definitely worth a look if you are looking to address these problems. TBH I would rate #4977 as a higher priority.
@peterschrammel Any chance of a review from a code owner?
Another line of further investigation that was mentioned by @hannes-steffenhagen-diffblue: a significant difference between the manual optimisation in #5505 and the present PR is that the latter does not optimise away the pointer checks introduced by goto_check. Potentially, goto_check could be made smarter or goto_symex could skip over pointer checks relating to cached dereferences.
@peterschrammel I would love to get to the stage where the overhead of goto-analyzer --simplify was low enough that we could run it before symbolic execution to prune "obvious" checks.
Repeated large expressions within the SSA (and the resulting final formula) can be a problem both for symex (because we need to e.g. simplify the same or similar large expressions over and over again) and for SAT solvers. Especially for the latter, the performance impact of making changes to the formula is somewhat unpredictable, but in our testing it seemed to average out to an improvement. However, since it can lead to performance regressions in some cases and does change what traces look like a bit (due to the extra assignments), this is opt-in behaviour.
The idea here comes from an interesting example originally observed by Natasha Jeppu and raised as an issue here. The approach doesn't replicate exactly what she did in the issue, because we're changing things at the symex stage, but we still produce a noticeable (if not quite as dramatic) speedup on her example in our testing.
Here's a table with performance measured on one test machine with the default MiniSat (times in seconds). Exact numbers here are going to be hard to reproduce, but manual checking on a variety of machines showed that these are roughly consistent between different machines. Since a lot of the runtime difference for these is in the SAT solver, swapping that out is expected to produce different results, but preliminary testing suggests that the caching is still often beneficial there. The examples are the proofs from S2N, specifically on commit bb0f4c9b3f7e7ec0192dca9521a37e72190d0d30.
The runtimes are one-shots, but random manual testing doesn't show a lot of run-to-run variation, so that should be enough to evaluate a trend; however, this does mean that small changes (<5%) are mostly attributable to noise. As you can see, for most examples there isn't a particularly large difference in runtime, but a couple of examples show dramatic improvements (e.g. `s2n_hash_copy` or `s2n_hmac_free`), along with a couple of small regressions and a handful of very notable ones (e.g. `s2n_stuffer_write_base64` or `s2n_stuffer_read_expected_str`).

Unfortunately I still don't have a great working theory to explain or predict the large performance differences observed. One thing we've tried is limiting the caching to only larger dereference sets (based on expression size), but the results from that were somewhat inconclusive (it does seem to help in some cases but make things worse in others). IMHO this is still interesting enough to ship behind a flag for people to play around with. I imagine this could be useful for use cases where the same tests are run over and over again in CI (as is the case in a lot of projects in awslabs), where it ought to be possible to run once with and without the flag and just set it for subsequent runs if it turns out to help.