Skip to content

Enable IPO / LTO and -O3 #6011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Enable IPO / LTO and -O3 #6011

wants to merge 4 commits into from

Conversation

rurban
Copy link
Contributor

@rurban rurban commented Apr 4, 2021

LTO and -O3 give a good performance improvement.
I would not recommend -O3 with bad compiler versions though, such
as gcc-9

@rurban rurban requested a review from a team as a code owner April 4, 2021 19:25
@rurban
Copy link
Contributor Author

rurban commented Apr 4, 2021

ubuntu 18 failed:
old compilers with -O3 complain with bogus -Wmaybe-uninitialized for *begin_range (g++-7.4)
maybe disable -O3 for this function, or silence this warning.

I went with silencing this wrong warning

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. If we are going to make this change then config.inc will also need to change in sync.

@codecov
Copy link

codecov bot commented Apr 4, 2021

Codecov Report

Merging #6011 (6ca4ab2) into develop (ca6f109) will increase coverage by 0.13%.
The diff coverage is n/a.

❗ Current head 6ca4ab2 differs from pull request most recent head e0a69da. Consider uploading reports for the commit e0a69da to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6011      +/-   ##
===========================================
+ Coverage    74.13%   74.26%   +0.13%     
===========================================
  Files         1444     1444              
  Lines       157322   157322              
===========================================
+ Hits        116637   116843     +206     
+ Misses       40685    40479     -206     
Impacted Files Coverage Δ
src/ansi-c/gcc_types.cpp 41.86% <0.00%> (-46.52%) ⬇️
src/util/cmdline.cpp 69.94% <0.00%> (-25.44%) ⬇️
src/ansi-c/scanner.l 41.15% <0.00%> (-20.58%) ⬇️
src/ansi-c/c_nondet_symbol_factory.cpp 86.20% <0.00%> (-13.80%) ⬇️
src/util/config.h 50.00% <0.00%> (-12.50%) ⬇️
src/ansi-c/ansi_c_parser.cpp 76.82% <0.00%> (-9.76%) ⬇️
src/langapi/language_util.cpp 45.45% <0.00%> (-9.10%) ⬇️
src/util/pointer_offset_size.cpp 85.79% <0.00%> (-6.44%) ⬇️
src/util/file_util.cpp 71.42% <0.00%> (-5.36%) ⬇️
src/util/config.cpp 52.38% <0.00%> (-3.04%) ⬇️
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab404f6...e0a69da. Read the comment docs.

@tautschnig
Copy link
Collaborator

@rurban Would you be able to provide numbers to back up "good performance improvement?" I once tried this myself, and didn't see much of a difference, but perhaps I didn't use a good-enough compiler version or did not try all the options.

@tautschnig
Copy link
Collaborator

FWIW, we already have #3407, which should likely be closed, but equally needed more data.

@rurban
Copy link
Contributor Author

rurban commented Apr 6, 2021

Performance numbers: Typically 10-20% faster, which aligns to my other projects where -O3 -flto brought about the same improvements.
I usually verify only very small files. Big files are usually not solvable, or only with satabs.

For my project rurban/ctl (the C++ STL in C): make verify

  • /usr/bin/cbmc -O3 -flto gcc-10
    CBMC version 5.26.1 (cbmc-5.26.1-exp-89-g214216ef7) 64-bit x86_64 linux

  • /usr/bin/cbmc -O2 gcc-10
    CBMC version 5.25.0 (n/a) 64-bit x86_64 linux

$ time for f in tests/verify/*.c; do /usr/bin/cbmc --bounds-check --pointer-check --memory-leak-check --div-by-zero-check --signed-overflow-check --unsigned-overflow-check --pointer-overflow-check --conversion-check --undefined-shift-check --enum-range-check --pointer-primitive-check --compact-trace --depth 6 --unwind 6 -I. $f >/dev/null; done

real	0m3.673s
user	0m2.245s
sys	0m0.337s

$ time for f in tests/verify/*.c; do /usr/local/bin/cbmc --bounds-check --pointer-check --memory-leak-check --div-by-zero-check --signed-overflow-check --unsigned-overflow-check --pointer-overflow-check --conversion-check --undefined-shift-check --enum-range-check --pointer-primitive-check --compact-trace --depth 6 --unwind 6 -I. $f >/dev/null; done

real	0m2.819s
user	0m2.196s
sys	0m0.335s

when I add -march=native I get real 0m1.942s. This is valid for all gcc-compatible compilers on Intel.

diff --git CMakeLists.txt CMakeLists.txt
index fe1bd4ca7..dc0602cf5 100644
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -202,6 +202,21 @@ if(ipo_supported)
   message(STATUS "IPO / LTO enabled")
   set_property(GLOBAL PROPERTY INTERPROCEDURAL_OPTIMIZATION True)
   add_definitions(-DLTO)
+  if ((CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64")
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "i386")
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "i586")
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "i686")
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "amd64") # freebsd
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "AMD64") # windows
+       OR (CMAKE_SYSTEM_PROCESSOR STREQUAL "x86"))
+     set(PROCESSOR_FAMILY "Intel")
+     if(CMAKE_COMPILER_IS_GNUCC
+         OR (CMAKE_C_COMPILER_ID STREQUAL AppleClang)
+         OR (CMAKE_C_COMPILER_ID STREQUAL Clang)
+         OR (CMAKE_C_COMPILER_ID STREQUAL Intel))
+       add_definitions(-march=native)
+     endif()
+  endif()
 else()
   message(STATUS "IPO / LTO not supported: <${error}>")
 endif()

src/config.inc Outdated
@@ -76,7 +79,7 @@ endif
OSX_IDENTITY="Developer ID Application: Daniel Kroening"

# Detailed version information
CBMC_VERSION = 5.26.0
CBMC_VERSION = 5.26.1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may have rebased on to a branch other than develop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On top of the version PR #6014

@martin-cs
Copy link
Collaborator

@rurban Thank you for the timing results. Normally our focus tends to be on run-times in the 10s--1200s run-times. If we get a 10-20% performance benefit on those then that would be a very compelling argument.

Am I reading your comments correctly that you are using SatAbs? If so, which version?

@rurban
Copy link
Contributor Author

rurban commented Apr 6, 2021

I'm using satabs 3.2 with boom 1.8.
I haven't checked for updated versions there. I still run into WIP array stuff there.

Refining set of predicates according to counterexample (WP)
*** CEGAR Loop Iteration 2
Computing Predicate Abstraction for Program
sorry, wp(array_copy...) not implemented

@@ -14,7 +14,7 @@ if(GIT_FOUND)
OUTPUT_STRIP_TRAILING_WHITESPACE
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
)
configure_file(\${CUR}/version.cpp.in version.cpp)
configure_file(\${CUR}/version.cpp.in \${CBMC_SOURCE_DIR}/util/version.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 Please drop this commit as this "fix" is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without it version.cpp gets generated in the build dir. and the new version is ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version.cpp in the build dir is compiled and linked into the build. Ignoring this file would result in a failed build as it would not exist elsewhere. The latest Ubuntu build on our release page shows the correct version number when I download and install it and it was built without this proposed change. I also end up with CBMC builds showing the correct version number for builds I produce locally, with altered version numbers in config.inc. Under what conditions have you seen incorrect version numbers?

If you have also built cbmc using make and the makefiles, then you can end up with a version.cpp in the cbmc/src/util directory. that would be a stale build artefact, from the build in place nature of the make build. It should not be committed to git. To my knowledge it is correctly ignored and neither overwritten or linked by the cmake build, if it exists.

Copy link
Contributor Author

@rurban rurban Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. You are right. My mistake with the wrong prefix.

In the general case build vs src mixup, you need to build out of src to get wrong version.cpp files

@tautschnig
Copy link
Collaborator

Would you mind including the following patch (or something to that effect):

--- a/src/solvers/CMakeLists.txt
+++ b/src/solvers/CMakeLists.txt
@@ -109,7 +109,13 @@ elseif("${sat_impl}" STREQUAL "cadical")
     )

     message(STATUS "Building CaDiCaL")
-    execute_process(COMMAND make WORKING_DIRECTORY ${cadical_SOURCE_DIR})
+    if("${CMAKE_BUILD_TYPE}" STREQUAL "Release")
+      execute_process(COMMAND make CXXFLAGS=${CMAKE_CXX_FLAGS_RELEASE} WORKING_DIRECTORY ${cadical_SOURCE_DIR})
+    elseif("${CMAKE_BUILD_TYPE}" STREQUAL "RelWithDebInfo")
+      execute_process(COMMAND make CXXFLAGS=${CMAKE_CXX_FLAGS_RELWITHDEBINFO} WORKING_DIRECTORY ${cadical_SOURCE_DIR})
+    else()
+      execute_process(COMMAND make WORKING_DIRECTORY ${cadical_SOURCE_DIR})
+    endif()

     target_compile_definitions(solvers PUBLIC
         SATCHECK_CADICAL HAVE_CADICAL

to ensure the CaDiCaL solver is also built with the new options?

@rurban rurban force-pushed the develop branch 2 times, most recently from 7b54c81 to 6ca4ab2 Compare April 7, 2021 13:48
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reliable way in CMake to obtain the actual compile flags? As a CMake noob the best I could find is not using add_definitions and instead going for set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} ...") so that CMAKE_CXX_FLAGS_RELEASE can be used. I'm asking to make sure the CaDiCaL build actually benefits.

if(ipo_supported)
message(STATUS "IPO / LTO enabled")
set_property(GLOBAL PROPERTY INTERPROCEDURAL_OPTIMIZATION True)
add_definitions(-DLTO)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake noob here: could we use set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DLTO") instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is easier. It only adds it to the Release flags anyway, because only there ipo_supported will be true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but then I wasn't able to find a way to surface this to the CaDiCaL build configuration. You might know how to make this work?

CMakeLists.txt Outdated
Comment on lines 213 to 221
if(CMAKE_COMPILER_IS_GNUCC
OR (CMAKE_C_COMPILER_ID STREQUAL AppleClang)
OR (CMAKE_C_COMPILER_ID STREQUAL Clang)
OR (CMAKE_C_COMPILER_ID STREQUAL Intel))
add_definitions(-march=native)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(CMAKE_COMPILER_IS_GNUCC
OR (CMAKE_C_COMPILER_ID STREQUAL AppleClang)
OR (CMAKE_C_COMPILER_ID STREQUAL Clang)
OR (CMAKE_C_COMPILER_ID STREQUAL Intel))
add_definitions(-march=native)
if(CMAKE_COMPILER_IS_GNUCC
OR (CMAKE_CXX_COMPILER_ID STREQUAL AppleClang)
OR (CMAKE_CXX_COMPILER_ID STREQUAL Clang)
OR (CMAKE_CXX_COMPILER_ID STREQUAL Intel))
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -march=native")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CXX compiler! good catch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might actually need both the C and CXX variants?

@martin-cs
Copy link
Collaborator

martin-cs commented Apr 7, 2021 via email

@rurban
Copy link
Contributor Author

rurban commented Apr 7, 2021

Cannot repro the ubuntu20 cmake-gcc crash, but looks bad. Was introduced by --march=native, so it looks like a bad alignment bug. Either a function, but more like an argument, which wanted to be vectorized but was not 16-byte aligned.

I also notized that the GLOBAL target did not enable all targets to be compiled with -flto. Got that working, but am now in different, more expected fields of bugs. Will need some time.

@Schrammel
Copy link

Schrammel commented Apr 7, 2021

Hello @martin-cs!

I never had the source code of this project, sorry,I believe that you have marked the wrong person...

OR (CMAKE_C_COMPILER_ID STREQUAL AppleClang)
OR (CMAKE_C_COMPILER_ID STREQUAL Clang)
OR (CMAKE_C_COMPILER_ID STREQUAL Intel))
add_definitions(-march=native)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit nervous of the -march=native - if I'm reading the CMake correctly (a big if :-) ) then this is enabled by default on release builds, but thats quite likely to cause problems building CBMC release distributions (or at least, be a trap waiting to cause problems). native means "use any instructions available on the build machine" I believe - but we've got no real guarantee that our build machines are not higher spec/newer CPUs/whatever than any of the downstream users of our binary packages, meaning they run the risk of getting illegal instruction crashes/traps from our released binaries. While we could override default options for our binary packaging, I'd feel much more comfortable if we didn't have to...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Packagers need to get rid of that somehow. Maybe only add it to CFLAGS and CXXFLAGS via env. The idea was to find sloppy visibility and alignment, and test it in develop. Also have locally fast binaries

@chrisr-diffblue
Copy link
Contributor

While I'm definitely supportive of "free speed" form -O3/LTO - I must admit I'm a bit weary of enabling both of these by default. In all honesty, I may just be being paranoid for the decade I spent working as a compiler engineer, :-) but I've seen an awful lot of codegen bug reports where '-O3' was a common factor - and codegen bugs are insidious to discover. Given what CBMC is, and that this PR enables -O3 by default, I think it's worth asking an explicit question: How much do we want to trade potential speed for potential increase risk of silent incorrect results? Not against this going in, but just think it is worth an explicit decision on the risk aspect.

@rurban
Copy link
Contributor Author

rurban commented Apr 10, 2021

The -O3 idea is to catch compiler or code bugs earlier in the develop branch. Inlining and vectorization catches much more than without, E.g. wrong visibility, wrong alignment, bad compiler versions, ...

Same for -flto and -march-native. I'm already blocked by some bugs caught by improved lto builds.
-march-native of course needs to be turned off by packagers.

@tautschnig
Copy link
Collaborator

The -O3 idea is to catch compiler or code bugs earlier in the develop branch. Inlining and vectorization catches much more than without, E.g. wrong visibility, wrong alignment, bad compiler versions, ...

Same for -flto and -march-native. I'm already blocked by some bugs caught by improved lto builds.
-march-native of course needs to be turned off by packagers.

That suggests 1) adding code to permit building with that configuration and 2) adding a further GitHub action using that configuration rather than making it the default?

Also, can you provide some hints on how to best debug this? I wasn't able to reproduce the failure on a 20.04 system myself.

rurban added 4 commits April 12, 2021 09:54
LTO and -O3 give a good performance improvement.
I would not recommend -O3 with bad compiler versions though, such
as gcc-9.

WIP: GLOBAL properties are not enough, add the IPO property to all
targets. Which leads to various problems still.
not valid on arm or other archs. Only for gcc-compatible compilers.
30% faster on small samples than -flto
Create version.cpp in the src dir, where it is picked up.
And not in the build dir.
Update only if different to avoid needless recompilations.

Repro: check bin/cbmc --version for the correct version.
@rurban
Copy link
Contributor Author

rurban commented Apr 12, 2021

I'll redo in a new branch, because of the blockers

@rurban rurban closed this Apr 12, 2021
@thomasspriggs
Copy link
Contributor

It might be worth considering -O3 and the IPO / LTO changes as separate PRs. If we switch on one before the other, then we might be in a better place to work out which of the changes caused any issues which we run into later. Do you think IPO / LTO is less risky than -O3 optimisation?

@rurban
Copy link
Contributor Author

rurban commented Apr 12, 2021

LTO is harder to achieve.

O3 is riskier to fall into compiler bugs. I found several.

@TGWDB TGWDB mentioned this pull request May 19, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants