Skip to content

Make global string_container more resilient to different static init orders #1213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 37 commits into from

Conversation

reuk
Copy link
Contributor

@reuk reuk commented Aug 7, 2017

At the moment, if we initialise a static object which depends on the global string_container, there's no guarantee which object will be initialised first, which means the string_container may be uninitialised when first used. This problem was spotted by @jasigal recently.

This PR wraps the global object in a function which ensures it is initialised before it is first used.

tautschnig and others added 30 commits July 19, 2017 10:19
clang -Wall says:

Struct_Initialization1/main.c:16:38: warning: suggest braces around initialization of subobject [-Wmissing-braces]
struct _classinfo nullclass1 = { 42, 1, 2, 3, 4 };
                                     ^~~~
                                     {   }
and then constructs a suitable object.

Our implementation now attempts to build an initializer list if a
variable-length array is encountered; if this succeeds, the same suitable object
is constructed. Variable-length arrays in the middle of a struct are not
permitted (neither by GCC nor Clang or our C front-end).
Previously the code would rely on type sizes being constant when building a
subtraction; this is unnecessary as the expression is sent to the solver for
evaluation anyway.
fgets may (and certainly does when successful) change the contents of the buffer
passed in. Building a regression test showed further deficiences in the stdio
library model: 1) MacOS uses _fdopen; 2) fclose uses free from stdlib.h.
Licensing: I had contributed these myself to the SV-COMP benchmarks.
Array initialization with a non-array is expected to fail
…apply

miniBDD: added a non-recursive variant of APPLY
vectors don't need to be handled, they have been removed via remove_vector.
Unlike CBMC, all output is enabled in debug mode only.
All symex limits use value -1 as marking "not set". Use size_t for counting
statistics.
The solver back end cannot necessarily handle unbounded integers. Do not use
them unless explicitly requested/supported.
fix for operator< on reverse_keyt in miniBDD, fixing segfault in ebmc
C library modelling: fgets, getopt, vasprintf
Avoid using integer_typet where unsuitable
Follow-up to 260e03d: instead of using negative numbers to represent
"limit not set", use the maximum value as default.
Avoid spurious signed/unsigned comparison warnings
Support out-of-bounds checks on arrays of dynamic size
Daniel Kroening added 2 commits August 4, 2017 22:29
Enable extractbits over arbitrary types, fix byte extract omission
@reuk reuk force-pushed the reuk/static-string-container branch from fe41573 to ee8ea7d Compare August 7, 2017 10:16
/// Get a reference to the global string container.
string_containert &get_string_container()
{
static string_containert ret{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those {} required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're not, but they make it clear that default-initialisation is intentional.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be done if done consistently across the code base. Else please remove them for we don't (yet) do it elsewhere.

@tautschnig
Copy link
Collaborator

What is the performance difference?

@reuk
Copy link
Contributor Author

reuk commented Aug 7, 2017

I'm not sure how to measure the performance difference. This PR shouldn't affect performance at all, because it only changes the order in which static objects are initialised. The total number of static initialisations is unchanged.

@reuk reuk force-pushed the reuk/static-string-container branch from ee8ea7d to e6dcaa7 Compare August 7, 2017 10:37
@tautschnig
Copy link
Collaborator

I'm not sure how to measure the performance difference.

I believe this is a problem that needs to be addressed before this PR can be merged.

This PR shouldn't affect performance at all, because it only changes the order in which static objects are initialised.

How about the non-inlined function calls that will now be necessary each time dstringt or as_string is invoked?

@reuk reuk force-pushed the reuk/static-string-container branch from e6dcaa7 to eb89bc0 Compare August 7, 2017 11:06
@smowton
Copy link
Contributor

smowton commented Aug 7, 2017

Possible alternative: instead of a static irep_idt (if that's what was causing this to begin with), I've been using a static function local, which gets initialised on first call instead of during library init. For example,

const irep_idt &get_some_id() {
  static const irep_idt some_id("some_id");
  return some_id;
}

The cost should be minimal for most sensible implementations per http://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables

@LAJW
Copy link
Contributor

LAJW commented Aug 7, 2017

Apart from the cost of function call there's going to be a performance hit similar to reading atomic ("magic static" - C++11 static variable initialisation is thread safe, so likely implemented using atomics). It's not implemented in VC2013 which we're supporting. If irep reference counting is not thread safe either, then that is a problem that can be safely ignored. (VC Support)

Moving get_string_declaration to header file and adding inline might alleviate @tautschnig's concern.

@reuk
Copy link
Contributor Author

reuk commented Aug 7, 2017

If we explicitly inline the function, wouldn't we end up with one copy of the static local variable for each translation unit?

@LAJW
Copy link
Contributor

LAJW commented Aug 7, 2017

@reuk Apparently not.

A static local variable in an extern inline function always refers to the same object.
7.1.2/4 - C++98/C++14 (n3797)

@reuk
Copy link
Contributor Author

reuk commented Aug 7, 2017

Some of the answers on StackOverflow say you may still end up with one instance per TU but the linker will pick a specific one to be the "canonical" instance, although I guess this may be implementation-dependent and may depend on LTO... I haven't been able to find hard evidence one way or the other.

Copy link
Contributor

@jasigal jasigal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the basis that my original problem is fixed, but I don't know enough to weigh in on the other topics. Obviously a senior developer needs to approve this anyways.

@tautschnig
Copy link
Collaborator

Don't get me wrong: I'm all for getting rid of the undefinedness that initialisation of static global objects induces; but this must not be merged until a performance evaluation is performed.

@reuk
Copy link
Contributor Author

reuk commented Aug 7, 2017

@tautschnig How would you propose to benchmark this PR?

@tautschnig
Copy link
Collaborator

This PR should not be benchmarked differently from any other PR, even though it is slightly easier here. Doing a summary profile across the regression tests might suffice for this PR if specific attention is paid to where in the profile the affected functions (dstringt and dstringt::as_string) show up.

Proper benchmarking would entail analysis of the effect on overall performance across a large set of benchmark inputs.

@reuk
Copy link
Contributor Author

reuk commented Aug 7, 2017

The following are times to run make from the regression directory.

With patch:

make  177.83s user 11.76s system 92% cpu 3:25.03 total

Master:

make  177.36s user 11.84s system 93% cpu 3:21.84 total

@tautschnig
Copy link
Collaborator

That experiment won't provide more than the most basic sanity check. The bare minimum is actual profiling so that you can get information on what fraction of the total time is spent on the mentioned functions.

@martin-cs
Copy link
Collaborator

If it would help, I should be int he office tomorrow and have done quite a bit of profiling of CBMC and we could chat about it.

Also at one point @Degiorgio had an interest in this code...

@tautschnig tautschnig changed the base branch from master to develop August 22, 2017 12:16
@reuk
Copy link
Contributor Author

reuk commented Aug 22, 2017

Closing as duplicate of #1218 seeing as both are targeting develop now.

@reuk reuk closed this Aug 22, 2017
@reuk reuk deleted the reuk/static-string-container branch September 14, 2017 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants