Cache dereferences during symex to avoid repeatedly inserting large dereferences into SSA #5964

Merged

Conversation

hannes-steffenhagen-diffblue
Contributor

@hannes-steffenhagen-diffblue commented Mar 22, 2021

Repeated large expressions within the SSA (and the resulting final formula) can be a problem both for symex (because we need to, e.g., simplify the same or similar large expressions over and over again) and for SAT solvers. Especially in the latter case, the performance impact of changing the formula is somewhat unpredictable, but in our testing it seemed to average out to an improvement. However, since it can lead to performance regressions in some cases, and does slightly change what traces look like (due to the extra assignments), this is opt-in behaviour.
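To illustrate the idea (a schematic sketch only, not CBMC's actual output; the cache symbol name deref_cache_1 is invented), suppose p may point to a or b:

int a, b, *p; /* assume p may point to a or b */

void two_uses(void)
{
  int x = *p; /* symex currently inlines the whole dereference:
                   x = (p == &a) ? a : b;                         */
  int y = *p; /* and again: y = (p == &a) ? a : b;                */

  /* with --symex-cache-dereferences, roughly:
       deref_cache_1 = (p == &a) ? a : b;
       x = deref_cache_1;
       y = deref_cache_1;
     so the large expression enters the SSA once, at the cost of
     the extra assignment that shows up in traces.                */
  (void)x;
  (void)y;
}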

The idea here comes from an interesting example originally observed by Natasha Jeppu and raised as an issue here. The approach doesn't replicate exactly what she did in the issue, because we're changing things at the symex stage, but we still get a noticeable (if not quite as dramatic) speedup on her example in our testing.

Here's a table with performance measured on one test machine with the default MiniSat (times in seconds). Exact numbers are going to be hard to reproduce, but manual checking on a variety of machines showed that they are roughly consistent across machines. Since a lot of the runtime difference is in the SAT solver, swapping that out is expected to produce different results, but preliminary testing suggests that the caching is still often beneficial there. The examples are the proofs from s2n, specifically at commit bb0f4c9b3f7e7ec0192dca9521a37e72190d0d30.

The runtimes are one-shot measurements, but random manual testing doesn't show a lot of run-to-run variation, so they should be enough to evaluate a trend; it does mean, however, that small changes (<5%) are mostly attributable to noise. As you can see, for most examples there isn't a particularly large difference in runtime, but a couple of examples show dramatic improvements (e.g. s2n_hash_copy or s2n_hmac_free), while alongside a couple of small regressions there are a handful of very notable ones (e.g. s2n_stuffer_write_base64 or s2n_stuffer_read_expected_str).

Unfortunately I still don't have a great working theory to explain or predict the large performance differences observed. One thing we've tried is limiting the caching to larger dereference sets only (based on expression size), but the results from that were somewhat inconclusive (it does seem to help in some cases but makes things worse in others). IMHO this is still interesting enough to ship behind a flag for people to play around with. I imagine this could be useful for use cases where the same tests are run over and over again in CI (as is the case in a lot of projects in awslabs), where it ought to be possible to run once with and without the flag and just set it for subsequent runs if it turns out to help.

Proof Name No Cache With Cache
s2n_add_overflow 3.58 3.4240239
s2n_align_to 3.58 3.4362915
s2n_alloc 5.70 5.628684
s2n_array_capacity 8.04 9.868183
s2n_array_free 7.90 7.5829077
s2n_array_free_p 7.76 7.617459
s2n_array_get 22.49 22.189644
s2n_array_init 5.96 5.824204
s2n_array_insert 30.55 32.71463
s2n_array_insert_and_copy 28.57 40.96395
s2n_array_new 18.21 18.220312
s2n_array_num_elements 7.76 7.8674545
s2n_array_pushback 20.26 18.236073
s2n_array_remove 11.92 9.943441
s2n_blob_char_to_lower 5.74 7.8930793
s2n_blob_init 5.71 5.829118
s2n_blob_is_growable 5.68 5.8832836
s2n_blob_slice 12.05 16.140965
s2n_blob_zero 5.63 5.8369093
s2n_blob_zeroize_free 7.75 7.9051423
s2n_constant_time_copy_or_dont 3.56 3.7424119
s2n_constant_time_equals 3.57 3.756031
s2n_constant_time_pkcs1_unpad_or_dont 5.65 5.8065968
s2n_dh_compute_shared_secret_as_client 100.77 105.31152
s2n_dh_compute_shared_secret_as_server 12.00 16.161072
s2n_dh_generate_ephemeral_key 5.75 5.7910213
s2n_dh_p_g_Ys_to_dh_params 9.83 9.926799
s2n_dh_params_check 5.71 5.7941995
s2n_dh_params_copy 12.03 9.931996
s2n_dh_params_free 5.67 5.8152165
s2n_dh_params_to_p_g_Ys 356.28 274.63785
s2n_digest_allow_md5_for_fips 5.80 3.4139042
s2n_digest_allow_md5_for_fips_boringssl_awslc 3.57 3.4306116
s2n_digest_is_md5_allowed_for_fips 3.56 3.4356272
s2n_dup 9.87 7.5362425
s2n_free 5.65 5.4805684
s2n_free_object 5.68 5.4907384
s2n_hash_allow_md5_for_fips 22.13 23.967434
s2n_hash_block_size 3.63 3.430097
s2n_hash_const_time_get_currently_in_hash_block 24.27 26.061907
s2n_hash_copy 187.04 75.45381
s2n_hash_digest 102.66 42.551033
s2n_hash_digest_size 3.58 3.4198506
s2n_hash_free 18.02 21.9476
s2n_hash_get_currently_in_hash_total 20.07 24.01466
s2n_hash_hmac_alg 3.56 3.4200382
s2n_hash_init 86.08 36.380283
s2n_hash_is_available 3.57 3.4205186
s2n_hash_is_ready_for_input 14.07 17.852907
s2n_hash_new 22.13 26.078522
s2n_hash_reset 90.29 40.531204
s2n_hash_update 67.50 34.369072
s2n_hex_string_to_bytes 11.86 11.844026
s2n_hmac_copy 503.37 522.30853
s2n_hmac_digest 86.02 110.41693
s2n_hmac_digest_size 3.51 3.4066567
s2n_hmac_digest_two_compression_rounds 86.09 106.258736
s2n_hmac_digest_verify 3.54 3.3816335
s2n_hmac_free 603.86 221.33414
s2n_hmac_hash_alg 3.57 3.5312474
s2n_hmac_hash_block_size 3.59 3.3929143
s2n_hmac_init 253.54 243.994
s2n_hmac_is_available 3.85 3.3850865
s2n_hmac_new 92.55 106.28307
s2n_hmac_reset 102.82 116.60039
s2n_hmac_restore_evp_hash_state 94.78 106.24553
s2n_hmac_save_evp_hash_state 99.04 106.218475
s2n_hmac_update 125.96 122.76804
s2n_hmac_xor_pad_size 3.74 3.4800537
s2n_is_base64_char 4.02 3.3855205
s2n_mem_cleanup 5.86 5.4746475
s2n_mem_init 3.73 3.388369
s2n_mul_overflow_harness 3.82 3.3773787
s2n_pkcs3_to_dh_params 7.90 5.515982
s2n_pkcs3_to_dh_params_openssl_1_1_0 7.96 5.486836
s2n_realloc 26.52 19.877207
s2n_set_add 126.44 65.286964
s2n_set_free 10.25 9.5650425
s2n_set_free_p 10.04 9.566902
s2n_set_get 9.98 7.5025673
s2n_set_len 9.97 23.969208
s2n_set_new 26.87 19.879662
s2n_set_remove 16.18 9.564915
s2n_strcpy 9.93 7.5194793
s2n_stuffer_alloc 7.97 7.5148582
s2n_stuffer_alloc_ro_from_fd 7.92 5.448575
s2n_stuffer_alloc_ro_from_file 7.91 7.507769
s2n_stuffer_alloc_ro_from_string 20.36 15.742065
s2n_stuffer_certificate_from_pem 72.44 61.07199
s2n_stuffer_copy 41.19 42.64672
s2n_stuffer_dhparams_from_pem 65.85 61.032284
s2n_stuffer_erase_and_read 61.69 42.533005
s2n_stuffer_erase_and_read_bytes 78.23 56.944366
s2n_stuffer_extract_blob 26.40 21.931807
s2n_stuffer_free 5.80 5.4527793
s2n_stuffer_growable_alloc 7.87 7.544679
s2n_stuffer_init 5.80 5.4475665
s2n_stuffer_is_consumed 5.80 5.452171
s2n_stuffer_peek_char 14.04 9.588717
s2n_stuffer_peek_check_for_str 92.65 67.30819
s2n_stuffer_private_key_from_pem 2,520.30 1598.5614
s2n_stuffer_raw_read 9.83 7.5156026
s2n_stuffer_raw_write 51.04 30.19543
s2n_stuffer_read 12.02 9.57049
s2n_stuffer_read_base64 121.78 108.66475
s2n_stuffer_read_bytes 22.40 13.671385
s2n_stuffer_read_expected_str 152.90 244.78969
s2n_stuffer_read_line 119.55 120.96212
s2n_stuffer_read_token 65.64 48.842674
s2n_stuffer_read_uint16 11.88 9.580877
s2n_stuffer_read_uint24 11.90 11.629028
s2n_stuffer_read_uint32 13.98 15.720942
s2n_stuffer_read_uint64 17.99 17.849932
s2n_stuffer_read_uint8 9.81 7.485984
s2n_stuffer_recv_from_fd 78.23 71.46028
s2n_stuffer_reread 5.68 5.4947977
s2n_stuffer_reserve 63.61 61.062935
s2n_stuffer_reserve_space 36.73 36.36068
s2n_stuffer_reserve_uint16 82.29 48.729816
s2n_stuffer_reserve_uint24 55.35 65.28461
s2n_stuffer_resize 26.34 19.869013
s2n_stuffer_resize_if_empty 11.88 9.546221
s2n_stuffer_rewind_read 7.76 5.491348
s2n_stuffer_rewrite 5.63 5.4409328
s2n_stuffer_send_to_fd 11.92 9.57044
s2n_stuffer_skip_expected_char 9.86 7.5069346
s2n_stuffer_skip_read 5.72 5.476155
s2n_stuffer_skip_read_until 59.66 63.41011
s2n_stuffer_skip_to_char 9.79 7.5813227
s2n_stuffer_skip_whitespace 1,181.78 786.0508
s2n_stuffer_skip_write 36.39 32.247444
s2n_stuffer_wipe 5.44 5.446298
s2n_stuffer_wipe_n 5.45 5.429211
s2n_stuffer_write 90.02 147.59996
s2n_stuffer_write_base64 455.16 801.07336
s2n_stuffer_write_bytes 161.16 87.93065
s2n_stuffer_write_network_order 117.87 112.78121
s2n_stuffer_write_reservation 68.00 65.242744
s2n_stuffer_write_uint16 51.40 51.21315
s2n_stuffer_write_uint24 74.31 61.687992
s2n_stuffer_write_uint32 59.79 65.83865
s2n_stuffer_write_uint64 63.92 76.24611
s2n_stuffer_write_uint8 51.32 57.53328
s2n_stuffer_write_vector_size 26.51 28.610888
s2n_stuffer_writev_bytes 296.10 272.83923
s2n_sub_overflow 3.73 3.7351184
Total 10,691.31 8847.576776
  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@hannes-steffenhagen-diffblue changed the title from "Cse dereference cleanup" to "Cache dereferences during symex to avoid repeatedly inserting large dereferences into SSA" on Mar 22, 2021
@codecov

codecov bot commented Mar 22, 2021

Codecov Report

Merging #5964 (6eb7c91) into develop (e76d811) will decrease coverage by 1.10%.
The diff coverage is 60.33%.

❗ Current head 6eb7c91 differs from pull request most recent head d554326. Consider uploading reports for the commit d554326 to get more accurate results

@@             Coverage Diff             @@
##           develop    #5964      +/-   ##
===========================================
- Coverage    75.12%   74.01%   -1.11%     
===========================================
  Files         1435     1444       +9     
  Lines       156301   157344    +1043     
===========================================
- Hits        117416   116466     -950     
- Misses       38885    40878    +1993     
Impacted Files Coverage Δ
src/goto-symex/goto_state.h 100.00% <ø> (ø)
src/goto-symex/goto_symex.h 92.30% <ø> (ø)
src/goto-symex/symex_config.h 100.00% <ø> (ø)
src/solvers/smt2/smt2_conv.h 100.00% <ø> (ø)
src/solvers/smt2/smt2_conv.cpp 59.95% <12.76%> (+0.56%) ⬆️
src/memory-analyzer/gdb_api.cpp 86.89% <50.00%> (ø)
src/goto-symex/symex_dereference.cpp 89.36% <93.02%> (+0.75%) ⬆️
unit/memory-analyzer/gdb_api.cpp 86.46% <94.73%> (ø)
jbmc/src/jbmc/jbmc_parse_options.cpp 72.38% <100.00%> (+0.11%) ⬆️
src/cbmc/cbmc_parse_options.cpp 76.19% <100.00%> (+0.10%) ⬆️
... and 59 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cee5acf...d554326.

Collaborator

@martin-cs left a comment

This is a very interesting patch and it would be great to have it in. I think it would be good to have some extensive performance evaluation.

@hannes-steffenhagen-diffblue
Contributor Author

@martin-cs I added some additional explanation and some performance data we've measured on awslabs/s2n.

Contributor

@NlightNFotis left a comment

LGTM

Contributor

@thomasspriggs left a comment

LGTM. Could do with a little clean-up, but is otherwise great.

@hannes-steffenhagen-diffblue
Contributor Author

@martin-cs Also following up with examples from aws-c-common. This is a pretty good example of why I don't want this to be enabled by default: here we get a measurably worse result on average, with lots of cases that are slightly slower and a couple where we're way slower with the flag enabled. There is also at least one case where we're 30x faster, though. Like I said, I think there's definitely something here worth exploring further, but this is what we've got so far.

name without_cache with_cache
aws_add_size_checked 5.34849 5.22711
aws_add_size_saturating 5.36896 5.25828
aws_array_eq 5.25811 5.2794
aws_array_eq_c_str 5.5579 5.61952
aws_array_eq_c_str_ignore_case 7.46955 5.32299
aws_array_eq_ignore_case 7.43995 7.48209
aws_array_list_back 11.6313 11.5017
aws_array_list_capacity 7.45949 7.43154
aws_array_list_clean_up 5.38279 5.30145
aws_array_list_clear 5.39931 5.34076
aws_array_list_comparator_string 7.42372 7.42794
aws_array_list_copy 13.5641 13.6337
aws_array_list_ensure_capacity 11.6082 9.372
aws_array_list_erase 189.822 174.903
aws_array_list_front 7.2506 7.26737
aws_array_list_get_at 9.29149 9.31296
aws_array_list_get_at_ptr 9.28033 11.3623
aws_array_list_init_dynamic 5.19168 5.21431
aws_array_list_init_static 7.21807 7.24981
aws_array_list_length 5.2082 5.21395
aws_array_list_pop_back 17.524 11.3021
aws_array_list_pop_front 21.5319 17.4823
aws_array_list_pop_front_n 21.529 17.582
aws_array_list_push_back 31.7644 25.7144
aws_array_list_set_at 19.4918 19.5355
aws_array_list_shrink_to_fit 25.7512 36.2447
aws_array_list_sort 15.5637 17.4592
aws_array_list_swap 50.2002 50.1996
aws_array_list_swap_contents 7.23739 9.29987
aws_byte_buf_advance 5.25917 5.23849
aws_byte_buf_append 5.22038 5.2529
aws_byte_buf_append_dynamic 94.9888 101.133
aws_byte_buf_append_with_lookup 7.24588 9.53086
aws_byte_buf_cat 46.0269 52.2111
aws_byte_buf_clean_up 5.19226 5.24663
aws_byte_buf_clean_up_secure 7.22614 7.2951
aws_byte_buf_eq 5.18508 7.49421
aws_byte_buf_eq_c_str 5.19344 7.62905
aws_byte_buf_eq_c_str_ignore_case 5.20487 7.58072
aws_byte_buf_eq_ignore_case 5.21332 7.44052
aws_byte_buf_from_array 5.1834 5.45461
aws_byte_buf_from_c_str 5.21701 5.34599
aws_byte_buf_from_empty_array 5.19224 5.35366
aws_byte_buf_init 5.19293 5.36175
aws_byte_buf_init_copy 5.19229 7.37957
aws_byte_buf_init_copy_from_cursor 5.23649 7.5109
aws_byte_buf_reserve 5.18283 5.48973
aws_byte_buf_reserve_relative 5.19317 5.71309
aws_byte_buf_reset 5.24179 7.47118
aws_byte_buf_secure_zero 9.30489 9.52983
aws_byte_buf_write 7.24654 9.5531
aws_byte_buf_write_be16 5.20418 5.78288
aws_byte_buf_write_be32 5.20466 7.56896
aws_byte_buf_write_be64 5.19817 7.44716
aws_byte_buf_write_from_whole_buffer 7.22842 7.36437
aws_byte_buf_write_from_whole_cursor 5.20251 5.69068
aws_byte_buf_write_from_whole_string 9.29137 11.6288
aws_byte_buf_write_u8 5.59758 5.4694
aws_byte_cursor_advance 5.18168 5.45356
aws_byte_cursor_advance_nospec 5.18392 7.4861
aws_byte_cursor_compare_lexical 5.20507 7.58059
aws_byte_cursor_compare_lookup 13.3818 15.8195
aws_byte_cursor_eq 5.1951 5.59166
aws_byte_cursor_eq_byte_buf 5.20728 5.34979
aws_byte_cursor_eq_byte_buf_ignore_case 5.21102 7.52197
aws_byte_cursor_eq_c_str 5.22069 7.89736
aws_byte_cursor_eq_c_str_ignore_case 5.20249 7.83735
aws_byte_cursor_eq_ignore_case 5.23482 7.60314
aws_byte_cursor_from_array 5.19079 5.54902
aws_byte_cursor_from_buf 5.18662 5.64883
aws_byte_cursor_from_c_str 5.18693 5.59193
aws_byte_cursor_from_string 5.17612 7.6852
aws_byte_cursor_left_trim_pred 5.20771 7.60853
aws_byte_cursor_read 7.23296 7.5981
aws_byte_cursor_read_and_fill_buffer 7.22846 9.74565
aws_byte_cursor_read_be16 5.21676 7.7092
aws_byte_cursor_read_be32 5.19823 7.55356
aws_byte_cursor_read_be64 5.30727 5.58793
aws_byte_cursor_read_u8 5.19451 5.44093
aws_byte_cursor_right_trim_pred 5.20742 5.46312
aws_byte_cursor_satisfies_pred 5.19044 5.61226
aws_byte_cursor_trim_pred 5.19462 7.72455
aws_hash_array_ignore_case 5.18306 5.46393
aws_hash_byte_cursor_ptr 7.27182 7.55493
aws_hash_byte_cursor_ptr_ignore_case 5.18218 7.53029
aws_hash_c_string 5.21895 7.6104
aws_hash_callback_c_str_eq 7.39346 11.6311
aws_hash_callback_string_destroy 5.2589 7.52058
aws_hash_callback_string_eq 13.4 21.9474
aws_hash_iter_begin 9.3259 17.9134
aws_hash_iter_delete 7.23526 9.66561
aws_hash_iter_done 7.23189 13.7037
aws_hash_iter_next 25.9305 48.564
aws_hash_ptr 5.2197 5.59584
aws_hash_string 7.28817 9.62245
aws_hash_table_clean_up 11.3085 25.9952
aws_hash_table_clear 31.6853 91.7943
aws_hash_table_create 76.6852 341.398
aws_hash_table_eq 31.7122 132.493
aws_hash_table_find 35.897 77.2255
aws_hash_table_foreach 9.31925 7.25433
aws_hash_table_get_entry_count 5.19806 5.22742
aws_hash_table_init_bounded 5.18347 5.18682
aws_hash_table_init_unbounded 5.20073 5.1831
aws_hash_table_move 5.18372 5.24502
aws_hash_table_put 146.314 1293.85
aws_hash_table_remove 33.8024 66.8177
aws_hash_table_swap 7.21202 7.33582
aws_is_power_of_two 3.12452 3.33873
aws_linked_list_back 9.28402 11.625
aws_linked_list_begin 11.307 13.5844
aws_linked_list_end 11.3261 13.598
aws_linked_list_front 9.30938 11.5395
aws_linked_list_init 5.15763 5.26946
aws_linked_list_insert_after 5.22838 5.28349
aws_linked_list_insert_before 5.16415 5.27562
aws_linked_list_next 5.21718 5.31384
aws_linked_list_node_reset 5.18991 5.22743
aws_linked_list_pop_back 680.422 23.6414
aws_linked_list_pop_front 15.3807 19.7264
aws_linked_list_prev 5.18483 5.24108
aws_linked_list_push_back 9.27033 13.5158
aws_linked_list_push_front 13.4603 17.8151
aws_linked_list_rbegin 11.3156 13.6048
aws_linked_list_remove 5.17802 5.48733
aws_linked_list_rend 9.25677 15.8307
aws_linked_list_swap_contents 19.4857 27.7346
aws_mul_size_checked 5.17988 5.41375
aws_mul_size_saturating 5.17873 5.40323
aws_nospec_mask 5.18333 5.41382
aws_priority_queue_capacity 7.2296 9.52387
aws_priority_queue_clean_up 5.2686 7.46737
aws_priority_queue_init_dynamic 5.2057 7.44086
aws_priority_queue_init_static 5.17449 7.98568
aws_priority_queue_pop 19.4814 24.1706
aws_priority_queue_push 371.641 318.6
aws_priority_queue_push_ref 285.578 310.15
aws_priority_queue_remove 40.0089 37.8816
aws_priority_queue_s_remove_node 1052.03 961.284
aws_priority_queue_s_sift_down 278.424 299.476
aws_priority_queue_s_sift_either 73.1357 81.5027
aws_priority_queue_s_sift_up 31.925 28.0322
aws_priority_queue_s_swap 261.285 315.904
aws_priority_queue_size 9.61211 9.61814
aws_priority_queue_top 11.3043 11.6467
aws_ptr_eq 3.31227 5.4615
aws_ring_buffer_acquire 43.8941 59.0003
aws_ring_buffer_acquire_up_to 41.8664 54.9278
aws_ring_buffer_buf_belongs_to_pool 15.7888 18.1992
aws_ring_buffer_clean_up 5.39169 7.65127
aws_ring_buffer_init 5.33614 5.73926
aws_ring_buffer_release 24.8987 31.5549
aws_round_up_to_power_of_two 3.1745 3.37743
aws_string_bytes 5.2202 5.45192
aws_string_compare 7.37373 10.0681
aws_string_destroy 5.18032 7.69152
aws_string_destroy_secure 5.1666 7.74118
aws_string_eq 5.35412 7.71596
aws_string_eq_byte_buf 7.20432 7.7515
aws_string_eq_byte_buf_ignore_case 7.24293 9.7447
aws_string_eq_byte_cursor 7.20028 7.7513
aws_string_eq_byte_cursor_ignore_case 7.26554 7.71494
aws_string_eq_c_str 5.17485 7.82536
aws_string_eq_c_str_ignore_case 7.40675 7.76278
aws_string_eq_ignore_case 7.23019 7.81457
aws_string_new_from_array 7.22716 9.81863
aws_string_new_from_c_str 5.27922 7.7615
aws_string_new_from_string 9.41183 11.8719
memcpy_using_uint64 21.5526 26.1901
memset_override_0 17.6202 22.1138
memset_using_uint64 27.9043 34.4435
Total 5115.8099081 6335.407116699999

@hannes-steffenhagen-diffblue force-pushed the cse-dereference-cleanup branch 3 times, most recently from 6f53774 to f5fd138 on March 24, 2021 at 19:42
@hannes-steffenhagen-diffblue
Contributor Author

@thomasspriggs I'm not that familiar with the jbmc regression tests. Would it be OK to add the changes there in a later PR? Manual checking shows the tests ought to still pass with the flag turned on; I just don't have the time to do the edits right now.

@thomasspriggs
Contributor

@thomasspriggs I'm not that familiar with the jbmc regression tests. Would it be OK to add the changes there in a later PR? Manual checking shows the tests ought to still pass with the flag turned on; I just don't have the time to do the edits right now.

Yes, I am fine with that, which is why I approved.

Collaborator

@tautschnig left a comment

Mostly looks good to me, but I really would like to understand whether it was a conscious decision not to use sharing_map; if it wasn't, then I'd ask for the experiments to be re-run.

Also, could the commit message please not use the "CSE" abbreviation without providing any kind of context?

Collaborator

@tautschnig left a comment

Thank you for all the updates!

Collaborator

@martin-cs left a comment

Please can you simplify the option handling / default in the tests.

@@ -251,7 +252,8 @@ void run_property_decider(
   " iteration are allowed to fail due to\n" \
   " complexity violations before the loop\n" \
   " gets blacklisted\n" \
-  " --graphml-witness filename write the witness in GraphML format to filename\n" // NOLINT(*)
+  " --graphml-witness filename write the witness in GraphML format to filename\n" /* NOLINT(*) */ \
+  " --symex-cache-dereferences enable caching of repeated dereferences" \
Collaborator

It would be good if new options followed the pattern of having OPT_ and HELP_ macros.
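For reference, a sketch of what that pattern might look like for this flag (the macro names and help wording here are my guesses modelled on the existing options, not necessarily what the PR contains):

/* Hypothetical OPT_/HELP_ pair for the new option: */
#define OPT_SYMEX_CACHE_DEREFERENCES "(symex-cache-dereferences)"

#define HELP_SYMEX_CACHE_DEREFERENCES \
  " --symex-cache-dereferences   enable caching of repeated dereferences\n"

The parse-options classes would then splice OPT_SYMEX_CACHE_DEREFERENCES into their option string and HELP_SYMEX_CACHE_DEREFERENCES into their help text, keeping the flag's spelling in one place.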

@martin-cs
Collaborator

I'm not sure I really understand why this patch has an effect. This is bothering me because I wonder if there is actually some kind of bug and this is just covering it rather than resolving it.

As I understand it, this only affects cases where there is a non-trivial expression generated by a dereference on the RHS of two or more expressions. For example if p could point to a or b then:

x = *p;
y = *p;

would normally become:

x = (p == &a) ? a : b;
y = (p == &a) ? a : b;

but with this it would become

cached = (p == &a) ? a : b;
x = cached;
y = cached;

Am I on the right track?

As far as I can see this could affect performance in a number of ways:

  1. Save on rewriting / SSA. I can see that this could have an effect, but if it is having the effect that @hannes-steffenhagen-diffblue is reporting at the high end of the results, then we should be doing this more widely and possibly profiling the SSA in those cases, because it shouldn't be that slow.
  2. It could change the number of clauses / variables used. I'd love to think that it reduces them, but what I can't understand is how the code above doesn't result in a cache hit in the solver's expression cache. If it misses, then we should fix that cache. If it hits, then I don't see how this could reduce the number of clauses.
  3. It could just change the formula which rerolls the SAT solver "dice" so some will get faster (possibly a lot) and some will get slower.

I do wonder whether this is an improvement for 1 and 2, but only because it covers a bug that results in a cache miss in the solver for two dereferences of the same thing. Maybe something is re-SSA'd and so comes out with a new name unnecessarily? Maybe the ordering of the pointer set changes, so it is (...) ? a : b in one place and (...) ? b : a in another? Maybe there is some bug in the rewriter so that rewrites aren't deterministic for this kind of expression?
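To make the pointer-set-ordering guess concrete, here is a schematic sketch (declarations invented) of two expansions that are semantically equal but structurally distinct, which a cache keyed on structural equality would treat as different entries:

int a, b, *p; /* p may point to a or b */

int distinct_cache_keys(void)
{
  int v1 = (p == &a) ? a : b; /* one ordering of the points-to set */
  int v2 = (p == &b) ? b : a; /* the other ordering                */
  return v1 == v2;            /* always true, yet the two
                                 right-hand sides are different
                                 syntax trees, so a structural
                                 cache would miss on the second    */
}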

@hannes-steffenhagen-diffblue can you pull out from your experimental runs the numbers of clauses and variables? That should help narrow down why this works.

@TGWDB
Contributor

TGWDB commented Mar 29, 2021

Deleting this due to an error in the runtime data.

@TGWDB
Contributor

TGWDB commented Mar 29, 2021

Replying to @martin-cs here.

I'm going to add some comments and details here, apologies if this is incomplete (and please let me know if this is not clear). I'll try to respond inline.

I'm not sure I really understand why this patch has an effect. This is bothering me because I wonder if there is actually some kind of bug and this is just covering it rather than resolving it.

As I understand it, this only affects cases where there is a non-trivial expression generated by a dereference on the RHS of two or more expressions. For example if p could point to a or b then:

x = *p;
y = *p;

would normally become:

x = (p == &a) ? a : b;
y = (p == &a) ? a : b;

but with this it would become

cached = (p == &a) ? a : b;
x = cached;
y = cached;

Am I on the right track?

Yes, but one subtle point: the dereference does not have to be in two or more expressions. It is also possible for:

if(*p == 10 || *p == 11 || *p == 21)

to become

if(((p == &a) ? a : b) == 10 || ((p == &a) ? a : b) == 11 || ((p == &a) ? a : b) == 21)

to be rewritten

cached = ((p == &a) ? a : b);
if(cached == 10 || cached == 11 || cached == 21)

This may be significant to caching and optimisation in other places (see also responses below).

Further, in the example raised here, the performance is improved by (effectively) splitting expressions and caching the dereference, which is why the caching also operates within a single expression.

As far as I can see this could affect performance in a number of ways:

  1. Save on rewriting / SSA. I can see that this could have an effect, but if it is having the effect that @hannes-steffenhagen-diffblue is reporting at the high end of the results, then we should be doing this more widely and possibly profiling the SSA in those cases, because it shouldn't be that slow.
  2. It could change the number of clauses / variables used. I'd love to think that it reduces them, but what I can't understand is how the code above doesn't result in a cache hit in the solver's expression cache. If it misses, then we should fix that cache. If it hits, then I don't see how this could reduce the number of clauses.

I speculate this may be where the subtle difference above is most significant, since this allows for caching of sub-expressions (i.e. dereferences). This may have some interaction with the solver expression cache, but it seems like dereferences should be cached consistently and so this should not have too much impact. (The exception would be when expressions are not cached in the solver due to appearing in a write or under quantification scope, but it is not clear to me that these would be cachable anyway.)

As for the number of clauses/variables, I don't have detailed data (I don't think it was logged, and I don't have the actual run data for some of the runs since they were done by others). However, I can say that using the dereference cache generally increases both the number of clauses and the number of variables. (This seems entirely expected, since, as in the examples above, there needs to be an extra variable for the dereference_cache variable and also a clause; but the generally improved runtime implies this is a simplification in most cases, or at least a change of form that the SAT solver finds easier. Of course, most observations and guesses about the SAT solver are somewhat speculative.)

  3. It could just change the formula which rerolls the SAT solver "dice" so some will get faster (possibly a lot) and some will get slower.

This does match quite well with observations. That said, as I mentioned above, there seems to be a simplification of the expressions, so this would not appear to be purely a reroll of the SAT solver "dice".

I do wonder whether this is an improvement for 1 and 2, but only because it covers a bug that results in a cache miss in the solver for two dereferences of the same thing. Maybe something is re-SSA'd and so comes out with a new name unnecessarily? Maybe the ordering of the pointer set changes, so it is (...) ? a : b in one place and (...) ? b : a in another? Maybe there is some bug in the rewriter so that rewrites aren't deterministic for this kind of expression?

I hope I've given some clarity above. That said, I don't think there is a clear answer either above or in theory; it's difficult to prove the absence of a potential bug. I can see reasons why this PR would improve performance even with a bug-free SAT solver expression cache, but this does not rule out there being a bug there either.

@hannes-steffenhagen-diffblue can you pull out from your experimental runs the numbers of clauses and variables? That should help narrow down why this works.

I don't think Hannes will be working on this further, but I may be able to gather some of this.

@martin-cs
Collaborator

martin-cs commented Mar 29, 2021 via email

@TGWDB
Contributor

TGWDB commented Mar 30, 2021

@martin-cs Continuing on this discussion. Note that I've cut parts that are not important to this point of the discussion and responded inline.

On Mon, 2021-03-29 at 02:41 -0700, TGWDB wrote: I'm going to add some comments and details here, apologies if this is incomplete (and please let me know if this is not clear). I'll try to respond inline.
Thanks for the fast and helpful responses. They haven't resolved my concerns, but it is good to discuss them. I guess fundamentally I think we shouldn't be merging something that improves performance if we don't have evidence of why it works, especially if it adds complexity to a critical part of the code. I'd like to try to help gain the understanding of why it works.

I'll make the counter-argument here for discussion; don't take this as a disagreement with the principle of your position.

The counter-argument is that we should merge something that improves performance when we have evidence that it works and information on why it works. We have several benchmarks that show this improves performance on programs that have dereferences that are (re)used multiple times. We can observe that the SSA is rewritten to reduce repetition of larger expressions and replace them with single variables (the cache symbols). To me it appears fairly clear how and why it works.

The main challenges I see to the above are the following:

  1. The main performance improvements are in the SAT solver, after the caching has been applied. Since we do not control or fully understand how the SAT solver operates, we cannot be sure this is a stable and guaranteed performance improvement in all cases. The response to this is that we have done benchmarking to evaluate it, and have made this an option that can be turned on when it is helpful.
  2. These kinds of optimisations should be caught somewhere else, and so we should first check that area for bugs before implementing a new performance optimisation. The response here is twofold. Firstly, we don't have good evidence of bugs/problems in the SAT cache, and exploring this would take significant resources that may yield nothing. Secondly, why should we only improve performance in one place (and quite late in the pipeline) when an earlier optimisation is effective?

Again, the above are counter-arguments made for the sake of discussion. I can see validity in both sides.

This may be significant to caching and optimisation in other places (see also responses below). Further, in the example raised here, the performance is improved by (effectively) splitting expressions and caching the dereference, which is why the caching also operates within a single expression.
Maybe we should work with this example to try to explore why it helps.

See below.

As far as I can see this could affect performance in a number of ways: 1. Save on rewriting / SSA. I can see that this could have an effect, but if it is having the effect that @hannes-steffenhagen-diffblue is reporting at the high end of the results, then we should be doing this more widely and possibly profiling the SSA in those cases, because it shouldn't be that slow. 2. It could change the number of clauses / variables used. I'd love to think that it reduces them, but what I can't understand is how the code above doesn't result in a cache hit in the solver's expression cache. If it misses, then we should fix that cache. If it hits, then I don't see how this could reduce the number of clauses.

I speculate this may be where the subtle difference above is most significant, since this allows for caching of sub-expressions (i.e. dereferences).
I can believe that this could be where it changes things, but if that is the case then it is a missed caching/optimisation in the solver and should be fixed there.

See above.

I would appreciate that. I don't have time to do experiments myself, but some things that I would try:

Following this.

*. Get @natasha-jeppu 's example working.

I have a modified version working now (modified since both cbmc and the repository have evolved since the original issue).

*. See if there is actual a decrease in clauses or variables generated.

There is an increase in both. Note that an increase is expected, since the dereference caching adds new variables and clauses. The data is as follows:

no-opt-develop.log:2908848 variables, 11455830 clauses
no-opt-unstable-no-CSE.log:2908848 variables, 11455830 clauses
no-opt-unstable-with-CSE.log:6095527 variables, 18458258 clauses

Observe that "unstable" is the PR version of cbmc, and "with-CSE" means the caching is turned on. The data for the no-CSE version is also here, to show that the variables and clauses are the same as on develop.

*. See how it affects the run-time of various SMT solvers.

Runtime data is as follows:

no-opt-develop.log:Runtime Symex: 40.2637s
no-opt-develop.log:Runtime Postprocess Equation: 0.0033315s
no-opt-develop.log:Runtime Convert SSA: 1.63727s
no-opt-develop.log:Runtime Post-process: 1.74453s
no-opt-develop.log:Runtime Solver: 8124.48s
no-opt-develop.log:Runtime decision procedure: 8126.15s
no-opt-unstable-with-CSE.log:Runtime Symex: 60.7123s
no-opt-unstable-with-CSE.log:Runtime Postprocess Equation: 0.188855s
no-opt-unstable-with-CSE.log:Runtime Convert SSA: 6.78803s
no-opt-unstable-with-CSE.log:Runtime Post-process: 1.59298s
no-opt-unstable-with-CSE.log:Runtime Solver: 5149.39s
no-opt-unstable-with-CSE.log:Runtime decision procedure: 5156.25s

Here we can observe that more time is spent in Symex and Convert SSA, but significantly less in the Solver, and thus in total.

Aside: note that this does NOT come close to the manual optimisation since the manual optimisation changes the control flow of the function. To give a rough understanding of why, the original (no-opt) program has the following structure:

return (*p != 0 && *p != x &&  *(p->a) != 0 && *(p->a) == q->a)

and the manual conversion does (something like) the following

ref_p = *p;
if(ref_p == 0) return false;
if(ref_p == x) return false;
ref_p_a = *(ref_p.a);
if(ref_p_a == 0) return false;
return ref_p_a == q->a;

that not only "caches" the dereferences, but also changes the control flow.

*. Use --show-vcc and diff to establish what is actually changing.

A quick look at the log shows that adding the --show-vcc option creates a log file of ~190GB, so this is not really feasible to explore in depth.

*. Modify the solver cache to log hits and see what is happening there.

Not done (yet?).

@chrisr-diffblue
Contributor

@martin-cs I agree it would be good to explore some of this a bit more; however, practical realities are that we have a limited amount of time available to complete this work, and we think we probably have enough evidence to show that, where it works, it works consistently well enough to be worth adding as a (default-disabled) feature. There's definitely more that could be done, but not the budget to do so :-) Given that most of the PR changes are actually in options handling and tests, I don't think this PR adds that much more complexity (albeit in a somewhat subtle area of code). If it's a help to resolve the conversation, I'm happy to set up a three-way call?

@TGWDB
Contributor

TGWDB commented Mar 31, 2021

I've done a LOT more experiments and checking of various details related to the runtime performance. There are two main results that are now very stable and in which I'm confident.

  1. Using CSE has little impact in many places, but can provide a big improvement or a big degradation. This is good evidence for making it available, albeit behind a flag to turn it on or off.
  2. There is no significantly interesting runtime performance difference between unordered_map and sharing_map. Although the unordered_map is a little faster, the difference is on the order of 1-2% of the total runtime, and so within the run-to-run variance and not worth making big choices over.

@@ -38,6 +38,11 @@ class goto_statet
   symex_level2t level2;

 public:
+  /// This is used for eliminating repeated complicated dereferences.
+  /// This is pretty much just a simple wrapper around a map.
Collaborator

This sentence probably isn't needed anymore?

@tautschnig
Collaborator

I've done a LOT more experiments and checking of various details related to the runtime performance. There are two main results that are now very stable and in which I'm confident.

1. Using CSE has little impact in many places, but can provide a big improvement or a big degradation. This is good evidence for making it available, albeit behind a flag to turn it on or off.

2. There is no significantly interesting runtime performance difference between unordered_map and sharing_map. Although the unordered_map is a little faster, the difference is on the order of 1-2% of the total runtime, and so within the run-to-run variance and not worth making big choices over.

Thank you for doing further experiments. Would you mind sharing details about those that exhibit "big degradation"? Just so that someone can pick those up at a later point and investigate in more depth.

In general, I think we might be solving the same problem that #4977 attempted to solve. That PR doesn't have nearly as much data, but has explanations on expressions that make such a change desirable.

@tautschnig
Collaborator

Trying to bring together two threads of discussion on this PR here: the example

return (*p != 0 && *p != x &&  *(p->a) != 0 && *(p->a) == q->a)

suggests that it's indeed double dereferences that are causing trouble, which is what #4977 tried to address. I therefore remain inclined to say this is OK to be merged as-is (except for the minor comments I added earlier), and we can likely close #4977 (as @smowton had to leave it to others to address this problem).
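To spell out the double-dereference blow-up (a schematic sketch with invented two-element points-to sets): each extra dereference level copies the inner expansion into every branch of the outer one, so expression size grows multiplicatively with nesting depth.

int a, b;
int *p, *q; /* p and q may each point to a or b */
int **pp;   /* pp may point to p or q           */

void nesting(void)
{
  int s = *p;   /* one level, one mux:
                     (p == &a) ? a : b                            */
  int d = **pp; /* two levels; the inner mux is duplicated in both
                   branches of the outer one:
                     (pp == &p) ? ((p == &a) ? a : b)
                                : ((q == &a) ? a : b)             */
  (void)s;
  (void)d;
}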

@martin-cs: with the background of #4977, would you be happy to go ahead with this?

@martin-cs
Collaborator

I think there may have been some kind of problem with rebasing or merging, as there are now a bunch of Z3-specific changes in this PR. I'll have another look when things are a bit more normal.

@martin-cs
Collaborator

If you can squash the fixes into the relevant commits it will make reviewing easier.

@TGWDB force-pushed the cse-dereference-cleanup branch from 6eb7c91 to ca973e2 on April 1, 2021 at 09:48
@TGWDB force-pushed the cse-dereference-cleanup branch from ca973e2 to 467b094 on April 1, 2021 at 10:24
@TGWDB
Contributor

TGWDB commented Apr 1, 2021

@martin-cs Yes, I made a mess in a rebase. I've pushed now and also tidied up the commit history. The reviewing should be a lot easier and cleaner now.

@martin-cs
Collaborator

@TGWDB thanks for tidying up the PR, I will have a read through everything and another review ASAP.

Hannes Steffenhagen and others added 2 commits on April 1, 2021 at 14:57
@TGWDB force-pushed the cse-dereference-cleanup branch from 467b094 to d554326 on April 1, 2021 at 13:57
@TGWDB
Contributor

TGWDB commented Apr 1, 2021

Thank you for doing further experiments. Would you mind sharing details about those that exhibit "big degradation"? Just so that someone can pick those up at a later point and investigate in more depth.

See a set of runs here. Note that these should be considered a little noisy, so small variances are not likely to be significant. The runs are averages of a couple of invocations, all done on the same system with very limited potential for contention of CPU/memory/disk/etc. Also, these are not necessarily good samples to test with, just ones I had set up to be able to evaluate.

Test Develop CSE with unordered_map CSE with sharing_map
aws_add_size_checked 5.85 5.66 5.71
aws_add_size_saturating 5.74 5.81 5.68
aws_array_eq 5.80 5.79 5.82
aws_array_eq_c_str 5.82 5.85 5.78
aws_array_eq_c_str_ignore_case 5.84 5.64 5.68
aws_array_eq_ignore_case 5.78 5.80 5.68
aws_array_list_back 20.21 18.12 20.13
aws_array_list_capacity 11.98 9.92 9.86
aws_array_list_clean_up 5.79 5.82 5.65
aws_array_list_clear 5.71 5.58 5.69
aws_array_list_comparator_string 7.65 7.77 7.85
aws_array_list_copy 20.29 36.70 38.78
aws_array_list_ensure_capacity 14.99 18.16 18.19
aws_array_list_erase 334.31 340.43 365.16
aws_array_list_front 9.94 7.87 9.86
aws_array_list_get_at 22.24 34.42 34.62
aws_array_list_get_at_ptr 17.21 15.98 16.14
aws_array_list_init_dynamic 5.69 5.81 5.72
aws_array_list_init_static 7.80 7.87 7.84
aws_array_list_length 6.73 5.76 5.73
aws_array_list_pop_back 13.89 26.45 27.37
aws_array_list_pop_front 30.54 103.53 104.72
aws_array_list_pop_front_n 26.41 24.31 26.26
aws_array_list_push_back 86.01 50.13 53.27
aws_array_list_set_at 40.84 34.67 36.77
aws_array_list_shrink_to_fit 36.79 53.18 55.18
aws_array_list_sort 33.57 24.28 24.35
aws_array_list_swap 78.92 80.94 86.16
aws_array_list_swap_contents 11.89 11.94 12.99
aws_byte_buf_advance 7.80 7.88 7.81
aws_byte_buf_append 5.80 5.73 5.89
aws_byte_buf_append_dynamic 75.88 103.66 108.85
aws_byte_buf_append_with_lookup 7.90 7.84 7.90
aws_byte_buf_cat 76.82 82.02 87.23
aws_byte_buf_clean_up 5.81 5.72 5.75
aws_byte_buf_clean_up_secure 7.84 7.86 7.90
aws_byte_buf_eq 6.71 5.85 5.86
aws_byte_buf_eq_c_str 5.82 5.79 5.74
aws_byte_buf_eq_c_str_ignore_case 5.87 5.77 5.81
aws_byte_buf_eq_ignore_case 5.86 5.84 6.79
aws_byte_buf_from_array 5.76 5.64 5.81
aws_byte_buf_from_c_str 5.79 5.86 5.71
aws_byte_buf_from_empty_array 5.76 5.78 5.79
aws_byte_buf_init 5.73 5.71 5.71
aws_byte_buf_init_copy 6.88 6.81 7.77
aws_byte_buf_init_copy_from_cursor 5.77 5.72 5.70
aws_byte_buf_reserve 5.77 5.73 5.74
aws_byte_buf_reserve_relative 5.79 5.79 5.77
aws_byte_buf_reset 5.85 5.73 5.83
aws_byte_buf_secure_zero 9.86 9.90 9.91
aws_byte_buf_write 11.98 11.86 11.92
aws_byte_buf_write_be16 5.79 5.75 5.71
aws_byte_buf_write_be32 5.64 5.86 5.73
aws_byte_buf_write_be64 5.77 5.74 5.83
aws_byte_buf_write_from_whole_buffer 7.89 7.76 7.84
aws_byte_buf_write_from_whole_cursor 5.85 5.75 5.78
aws_byte_buf_write_from_whole_string 11.81 11.93 12.01
aws_byte_buf_write_u8 5.81 5.83 5.75
aws_byte_cursor_advance 6.69 6.73 6.78
aws_byte_cursor_advance_nospec 7.85 7.92 7.85
aws_byte_cursor_compare_lexical 6.84 5.74 6.84
aws_byte_cursor_compare_lookup 19.21 20.17 20.15
aws_byte_cursor_eq 5.86 5.77 5.77
aws_byte_cursor_eq_byte_buf 5.77 5.79 5.79
aws_byte_cursor_eq_byte_buf_ignore_case 5.69 5.81 5.85
aws_byte_cursor_eq_c_str 5.75 5.86 5.84
aws_byte_cursor_eq_c_str_ignore_case 5.87 5.75 5.81
aws_byte_cursor_eq_ignore_case 5.84 5.75 5.74
aws_byte_cursor_from_array 5.89 5.66 5.67
aws_byte_cursor_from_buf 5.83 5.63 5.83
aws_byte_cursor_from_c_str 5.70 5.75 5.78
aws_byte_cursor_from_string 5.70 5.73 5.81
aws_byte_cursor_left_trim_pred 5.68 5.86 5.85
aws_byte_cursor_read 11.99 10.97 12.00
aws_byte_cursor_read_and_fill_buffer 14.05 14.08 13.92
aws_byte_cursor_read_be16 5.79 5.73 5.76
aws_byte_cursor_read_be32 5.78 5.87 5.89
aws_byte_cursor_read_be64 5.74 5.87 5.84
aws_byte_cursor_read_u8 5.79 5.69 5.75
aws_byte_cursor_right_trim_pred 5.68 5.66 5.79
aws_byte_cursor_satisfies_pred 5.82 5.81 5.75
aws_byte_cursor_trim_pred 7.96 6.84 7.84
aws_hash_array_ignore_case 5.88 5.74 5.80
aws_hash_byte_cursor_ptr 5.86 5.69 5.78
aws_hash_byte_cursor_ptr_ignore_case 5.79 5.74 5.78
aws_hash_c_string 5.71 5.88 5.90
aws_hash_callback_c_str_eq 7.91 8.91 7.69
aws_hash_callback_string_destroy 5.67 5.86 5.69
aws_hash_callback_string_eq 17.18 20.13 20.13
aws_hash_iter_begin 9.95 14.02 14.06
aws_hash_iter_delete 5.82 7.76 7.85
aws_hash_iter_done 9.86 14.99 14.98
aws_hash_iter_next 26.28 55.29 58.40
aws_hash_ptr 5.82 5.80 5.70
aws_hash_string 7.68 7.90 7.78
aws_hash_table_clean_up 11.98 20.23 20.18
aws_hash_table_clear 32.68 67.49 70.71
aws_hash_table_create 86.35 202.75 207.98
aws_hash_table_eq 40.86 120.25 123.29
aws_hash_table_find 42.84 73.87 75.82
aws_hash_table_foreach 9.91 9.86 9.81
aws_hash_table_get_entry_count 5.76 5.89 5.70
aws_hash_table_init_bounded 5.74 5.72 5.76
aws_hash_table_init_unbounded 5.76 5.75 5.80
aws_hash_table_move 5.82 5.71 5.69
aws_hash_table_put 166.76 296.59 303.85
aws_hash_table_remove 34.73 59.53 60.32
aws_hash_table_swap 7.77 7.72 7.89
aws_is_power_of_two 3.54 3.72 3.64
aws_linked_list_back 9.95 9.98 9.86
aws_linked_list_begin 9.86 11.97 11.90
aws_linked_list_end 9.91 12.03 12.07
aws_linked_list_front 9.88 9.77 10.03
aws_linked_list_init 5.76 5.77 5.76
aws_linked_list_insert_after 5.82 5.85 5.82
aws_linked_list_insert_before 5.82 5.79 5.72
aws_linked_list_next 5.71 5.77 5.73
aws_linked_list_node_reset 5.89 5.73 5.76
aws_linked_list_pop_back 896.49 22.30 23.34
aws_linked_list_pop_front 16.05 18.14 18.12
aws_linked_list_prev 5.69 5.79 5.85
aws_linked_list_push_back 9.96 11.96 12.05
aws_linked_list_push_front 14.01 14.07 16.09
aws_linked_list_rbegin 9.88 11.96 11.94
aws_linked_list_remove 5.82 5.71 5.79
aws_linked_list_rend 9.89 11.93 11.88
aws_linked_list_swap_contents 22.22 22.21 22.23
aws_mul_size_checked 5.80 5.68 5.63
aws_mul_size_saturating 5.80 5.68 5.72
aws_nospec_mask 5.77 5.70 5.87
aws_priority_queue_capacity 11.93 7.81 7.86
aws_priority_queue_clean_up 5.73 5.78 5.80
aws_priority_queue_init_dynamic 5.87 5.83 5.88
aws_priority_queue_init_static 7.86 5.82 5.74
aws_priority_queue_pop 61.45 52.19 53.30
aws_priority_queue_push 688.75 5,852.57 5,910.88
aws_priority_queue_push_ref 745.58 803.21 825.88
aws_priority_queue_remove 60.40 73.62 73.51
aws_priority_queue_s_remove_node 1,945.39 1,868.87 1,911.14
aws_priority_queue_s_sift_down 397.25 468.33 475.45
aws_priority_queue_s_sift_either 242.71 255.17 258.20
aws_priority_queue_s_sift_up 40.61 36.63 37.86
aws_priority_queue_s_swap 395.18 431.36 442.50
aws_priority_queue_size 7.92 9.82 9.93
aws_priority_queue_top 30.48 16.10 16.04
aws_ptr_eq 3.74 3.79 3.61
aws_ring_buffer_acquire 60.54 58.38 58.42
aws_ring_buffer_acquire_up_to 72.83 75.23 74.91
aws_ring_buffer_buf_belongs_to_pool 18.50 18.58 18.61
aws_ring_buffer_clean_up 5.82 5.89 5.85
aws_ring_buffer_init 5.71 5.84 5.66
aws_ring_buffer_release 29.04 27.01 28.98
aws_round_up_to_power_of_two 3.71 3.76 3.64
aws_string_bytes 5.74 5.66 5.71
aws_string_compare 7.78 6.76 7.81
aws_string_destroy 5.68 5.70 5.71
aws_string_destroy_secure 5.72 5.69 5.74
aws_string_eq 7.85 5.77 5.72
aws_string_eq_byte_buf 7.80 7.88 7.97
aws_string_eq_byte_buf_ignore_case 6.71 7.85 7.77
aws_string_eq_byte_cursor 7.80 6.76 7.81
aws_string_eq_byte_cursor_ignore_case 7.78 7.84 6.82
aws_string_eq_c_str 5.89 5.77 5.86
aws_string_eq_c_str_ignore_case 5.80 5.79 5.72
aws_string_eq_ignore_case 7.92 6.72 6.78
aws_string_new_from_array 7.87 7.77 7.83
aws_string_new_from_c_str 5.89 5.79 5.77
aws_string_new_from_string 11.89 9.96 9.98
memcpy_using_uint64 27.35 28.40 28.40
memset_override_0 25.36 26.37 26.25
memset_using_uint64 35.63 35.47 36.62
Total 8,092.01 13,047.90 13,289.43

In general, I think we might be solving the same problem that #4977 attempted to solve. That PR doesn't have nearly as much data, but has explanations on expressions that make such a change desirable.

Interesting, thanks for the link.

Collaborator

@martin-cs left a comment

Thanks for the performance data and your patience. Keeping in mind @chrisr-diffblue's constraints, I am not going to oppose this. All of the hard-blocking issues around option handling have been fixed. The OPT_ and HELP_ macros would be nice, but are a more minor thing.

I remain unconvinced about exactly what is being improved, but if it is an option and is low-overhead when off, then why not. I think @tautschnig's point about @smowton's PR #4977 is very interesting and definitely worth a look if you are looking to address these problems. TBH I would rate #4977 as a higher priority.

@TGWDB
Contributor

TGWDB commented Apr 6, 2021

@peterschrammel Any chance of a review from a code owner?

Member

@peterschrammel left a comment

Another line of further investigation that was mentioned by @hannes-steffenhagen-diffblue: a significant difference between the manual optimisation in #5505 and the present PR is that the latter does not optimise away the pointer checks introduced by goto_check. Potentially, goto_check could be made smarter or goto_symex could skip over pointer checks relating to cached dereferences.
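Schematically (the assertion below is an invented stand-in for the real instrumentation, not goto_check's actual form), the point is that even when the dereferenced value is cached, the per-use checks remain:

#include <assert.h>

void checked_uses(int *p)
{
  assert(p != 0); /* stand-in for the pointer checks goto_check
                     emits before a dereference                   */
  int x = *p;     /* the value itself may come from the cache...  */
  assert(p != 0); /* ...but the check is emitted again per use    */
  int y = *p;
  (void)x;
  (void)y;
}

The manual rewrite in #5505 avoids these repeated checks as well, which is one plausible reason it outperforms the symex-level cache.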

@TGWDB merged commit 4ecf01d into diffblue:develop Apr 7, 2021
@martin-cs
Collaborator

martin-cs commented Apr 8, 2021 via email
