Skip to content

Commit 2940331

Browse files
committed
Rewrite unions such that all members are of the same size
In the C front-end, rewrite union components with types smaller than the union's size to anonymous structs. Each such struct contains the original union component plus padding. Assignments to union members thus always assign all bytes that make up the object representation of a union. The use of an anonymous struct ensures that member accesses can still be resolved. As this is a change in the semantics of goto programs, the goto binary version is incremented. rewrite_union is no longer necessary, but bugs (hidden by rewrite_union) in handling endianness in simplify_expr_member and convert_member surfaced and had to be fixed.
1 parent ea09c2a commit 2940331

File tree

23 files changed

+342
-67
lines changed

23 files changed

+342
-67
lines changed

jbmc/src/jdiff/jdiff_parse_options.cpp

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@ Author: Peter Schrammel
4040
#include <goto-programs/remove_unused_functions.h>
4141
#include <goto-programs/remove_vector.h>
4242
#include <goto-programs/remove_virtual_functions.h>
43-
#include <goto-programs/rewrite_union.h>
4443
#include <goto-programs/set_properties.h>
4544
#include <goto-programs/show_properties.h>
4645
#include <goto-programs/string_abstraction.h>
@@ -296,7 +295,6 @@ bool jdiff_parse_optionst::process_goto_program(
296295
remove_returns(goto_model);
297296
remove_vector(goto_model);
298297
remove_complex(goto_model);
299-
rewrite_union(goto_model);
300298

301299
// add generic checks
302300
log.status() << "Generic Property Instrumentation" << messaget::eom;
693 Bytes
Binary file not shown.

regression/ansi-c/arch_flags_mcpu_bad/test.desc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ This tests the -mcpu=cortex=a15 flag that should activate ARM-32 mode.
1111
The object file 'object.intel' was compiled from 'source.c' with goto-cc
1212
on a 64-bit platform:
1313

14-
goto-cc -c source.c
14+
goto-cc -c source.c -o object.intel
1515

1616
preproc.i is already pre-processed so that it can be linked in without
1717
needing to invoke a pre-processor from a cross-compile toolchain on your
687 Bytes
Binary file not shown.
Binary file not shown.

regression/ansi-c/arch_flags_mthumb_bad/test.desc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ This tests the -mthumb flag that should activate ARM-32 mode. The object
1111
file 'object.intel' was compiled from 'source.c' with goto-cc on a
1212
64-bit platform:
1313

14-
goto-cc -c source.c
14+
goto-cc -c source.c -o object.intel
1515

1616
preproc.i is already pre-processed so that it can be linked in without
1717
needing to invoke a pre-processor from a cross-compile toolchain on your
686 Bytes
Binary file not shown.

regression/cbmc-library/float-nan-check/test.desc

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,16 @@
11
CORE
22
main.c
33
--nan-check
4-
\[main.NaN.1\] line \d+ NaN on \+ in byte_extract_little_endian\(c, (0|0l|0ll), float\) \+ myzero: SUCCESS
4+
\[main.NaN.1\] line \d+ NaN on \+ in c\.f \+ myzero: SUCCESS
55
\[main.NaN.2\] line \d+ NaN on / in myzero / myzero: FAILURE
66
\[main.NaN.3\] line \d+ NaN on / in \(float\)n / myinf: FAILURE
77
\[main.NaN.4\] line \d+ NaN on \* in myinf \* myzero: FAILURE
88
\[main.NaN.5\] line \d+ NaN on \+ in -myinf \+ myinf: FAILURE
99
\[main.NaN.6\] line \d+ NaN on \+ in myinf \+ -myinf: FAILURE
1010
\[main.NaN.7\] line \d+ NaN on - in myinf - myinf: FAILURE
1111
\[main.NaN.8\] line \d+ NaN on - in -myinf - -myinf: FAILURE
12-
\[main.NaN.9\] line \d+ NaN on \+ in byte_extract_little_endian\(c, (0|0l|0ll), float\) \+ byte_extract_little_endian\(c, (0|0l|0ll), float\): SUCCESS
13-
\[main.NaN.10\] line \d+ NaN on / in byte_extract_little_endian\(c, (0|0l|0ll), float\) / myzero: SUCCESS
12+
\[main.NaN.9\] line \d+ NaN on \+ in c\.f \+ c\.f: SUCCESS
13+
\[main.NaN.10\] line \d+ NaN on / in c\.f / myzero: SUCCESS
1414
\[main.NaN.11\] line \d+ NaN on \* in mynan \* mynan: SUCCESS
1515
\[main.NaN.12\] line \d+ NaN on \+ in mynan \+ mynan: SUCCESS
1616
\[main.NaN.13\] line \d+ NaN on - in mynan - mynan: SUCCESS
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
CORE broken-smt-backend
2+
main.c
3+
--big-endian --no-simplify
4+
^EXIT=0$
5+
^SIGNAL=0$
6+
^VERIFICATION SUCCESSFUL$
7+
--
8+
^warning: ignoring
9+
--
10+
Although this test can now be fully solved via constant propagation and
11+
simplification, it should also succeed without simplification.

regression/cbmc/union7/test.desc

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
1-
CORE broken-smt-backend
1+
CORE
22
main.c
33
--big-endian
4+
Generated 3 VCC\(s\), 0 remaining after simplification
45
^EXIT=0$
56
^SIGNAL=0$
67
^VERIFICATION SUCCESSFUL$
78
--
89
^warning: ignoring
910
--
10-
This test reports a VERIFICATION ERROR when running with the SMT2 solver on
11-
Windows only.
11+
This test can now be fully solved via constant propagation and simplification.

scripts/expected_doxygen_warnings.txt

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -35,13 +35,13 @@
3535
/home/runner/work/cbmc/cbmc/src/analyses/variable-sensitivity/variable_sensitivity_object_factory.cpp:188: warning: The following parameter of initialize_abstract_object(const typet type, bool top, bool bottom, const exprt &e, const abstract_environmentt &environment, const namespacet &ns, const vsd_configt &configuration) is not documented:
3636
parameter 'configuration'
3737

38-
warning: Include graph for 'cbmc_parse_options.cpp' not generated, too many nodes (61), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
38+
warning: Include graph for 'cbmc_parse_options.cpp' not generated, too many nodes (60), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
3939
warning: Include graph for 'goto_instrument_parse_options.cpp' not generated, too many nodes (97), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
4040
warning: Included by graph for 'goto_functions.h' not generated, too many nodes (66), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
4141
warning: Included by graph for 'goto_model.h' not generated, too many nodes (109), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
42-
warning: Included by graph for 'arith_tools.h' not generated, too many nodes (182), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
43-
warning: Included by graph for 'c_types.h' not generated, too many nodes (111), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
44-
warning: Included by graph for 'config.h' not generated, too many nodes (85), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
42+
warning: Included by graph for 'arith_tools.h' not generated, too many nodes (184), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
43+
warning: Included by graph for 'c_types.h' not generated, too many nodes (112), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
44+
warning: Included by graph for 'config.h' not generated, too many nodes (86), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
4545
warning: Included by graph for 'exception_utils.h' not generated, too many nodes (61), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
4646
warning: Included by graph for 'expr.h' not generated, too many nodes (87), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.
4747
warning: Included by graph for 'expr_util.h' not generated, too many nodes (61), threshold is 60. Consider increasing DOT_GRAPH_MAX_NODES.

src/ansi-c/c_typecheck_expr.cpp

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ Author: Daniel Kroening, [email protected]
1717
#include <util/c_types.h>
1818
#include <util/config.h>
1919
#include <util/cprover_prefix.h>
20+
#include <util/expr_initializer.h>
2021
#include <util/expr_util.h>
2122
#include <util/floatbv_expr.h>
2223
#include <util/fresh_symbol.h>
@@ -1121,6 +1122,36 @@ void c_typecheck_baset::typecheck_expr_typecast(exprt &expr)
11211122
expr.set(ID_C_lvalue, true);
11221123
return;
11231124
}
1125+
else if(c.get_anonymous() && c.type().id() == ID_struct_tag)
1126+
{
1127+
// we also look into anonymous members as we might have wrapped a
1128+
// non-maximal member in such a struct
1129+
const struct_typet &struct_type =
1130+
follow_tag(to_struct_tag_type(c.type()));
1131+
if(
1132+
!struct_type.components().empty() &&
1133+
struct_type.components().front().type() == op.type())
1134+
{
1135+
auto nondet =
1136+
nondet_initializer(c.type(), expr.source_location(), *this);
1137+
if(!nondet.has_value())
1138+
{
1139+
error().source_location = expr.source_location();
1140+
error() << "cannot nondet-initialize union component of type '"
1141+
<< to_string(c.type()) << "'" << eom;
1142+
throw 0;
1143+
}
1144+
CHECK_RETURN(nondet->id() == ID_struct);
1145+
CHECK_RETURN(!nondet->operands().empty());
1146+
1147+
nondet->operands().front() = op;
1148+
union_exprt union_expr(c.get_name(), *nondet, expr.type());
1149+
union_expr.add_source_location() = expr.source_location();
1150+
expr = union_expr;
1151+
expr.set(ID_C_lvalue, true);
1152+
return;
1153+
}
1154+
}
11241155
}
11251156

11261157
// not found, complain

src/ansi-c/c_typecheck_initializer.cpp

Lines changed: 25 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -532,27 +532,36 @@ exprt::operandst::const_iterator c_typecheck_baset::do_designated_initializer(
532532
// initializer; other bytes have an unspecified value (C Standard
533533
// 6.2.6.1(7)). In practice, objects of static lifetime are fully zero
534534
// initialized.
535-
const auto zero =
536-
zero_initializer(component.type(), value.source_location(), *this);
537-
if(!zero.has_value())
538-
{
539-
error().source_location = value.source_location();
540-
error() << "cannot zero-initialize union component of type '"
541-
<< to_string(component.type()) << "'" << eom;
542-
throw 0;
543-
}
544-
545535
if(current_symbol.is_static_lifetime)
546536
{
547-
byte_update_exprt byte_update{
548-
byte_update_id(), *dest, from_integer(0, index_type()), *zero};
549-
byte_update.add_source_location() = value.source_location();
550-
*dest = std::move(byte_update);
551-
dest = &(to_byte_update_expr(*dest).op2());
537+
const auto zero =
538+
zero_initializer(component.type(), value.source_location(), *this);
539+
if(!zero.has_value())
540+
{
541+
error().source_location = value.source_location();
542+
error() << "cannot zero-initialize union component of type '"
543+
<< to_string(component.type()) << "'" << eom;
544+
throw 0;
545+
}
546+
547+
union_exprt union_expr(component.get_name(), *zero, type);
548+
union_expr.add_source_location() = value.source_location();
549+
*dest = std::move(union_expr);
550+
dest = &(to_union_expr(*dest).op());
552551
}
553552
else
554553
{
555-
union_exprt union_expr(component.get_name(), *zero, type);
554+
const auto nondet = nondet_initializer(
555+
component.type(), value.source_location(), *this);
556+
if(!nondet.has_value())
557+
{
558+
error().source_location = value.source_location();
559+
error() << "cannot nondet-initialize union component of type '"
560+
<< to_string(component.type()) << "'" << eom;
561+
throw 0;
562+
}
563+
564+
union_exprt union_expr(component.get_name(), *nondet, type);
556565
union_expr.add_source_location() = value.source_location();
557566
*dest = std::move(union_expr);
558567
dest = &(to_union_expr(*dest).op());

src/ansi-c/c_typecheck_type.cpp

Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,9 @@ Author: Daniel Kroening, [email protected]
2020
#include <util/mathematical_types.h>
2121
#include <util/pointer_expr.h>
2222
#include <util/pointer_offset_size.h>
23+
#include <util/prefix.h>
2324
#include <util/simplify_expr.h>
25+
#include <util/string2int.h>
2426

2527
#include "ansi_c_convert_type.h"
2628
#include "c_qualifiers.h"
@@ -1106,6 +1108,148 @@ void c_typecheck_baset::typecheck_compound_body(
11061108
<< eom;
11071109
throw 0;
11081110
}
1111+
1112+
if(type.id() == ID_union && !components.empty())
1113+
{
1114+
const auto widest_member =
1115+
to_union_type(type).find_widest_union_component(*this);
1116+
optionalt<exprt> size_expr_opt;
1117+
1118+
if(!widest_member.has_value())
1119+
{
1120+
// GCC extension: arrays of non-constant size
1121+
size_expr_opt = size_of_expr(type, *this);
1122+
1123+
if(!size_expr_opt.has_value())
1124+
{
1125+
error().source_location = type.source_location();
1126+
error() << "cannot determine size of union" << eom;
1127+
throw 0;
1128+
}
1129+
}
1130+
1131+
// ensure non-conflicting member names
1132+
for(auto &component : components)
1133+
{
1134+
if(
1135+
id2string(component.get_name()).size() > 5 &&
1136+
has_prefix(id2string(component.get_name()), "$anon"))
1137+
{
1138+
const std::string suffix{id2string(component.get_name()), 5};
1139+
auto int_suffix = string2optional_unsigned(suffix);
1140+
if(int_suffix.has_value())
1141+
anon_member_counter = std::max(anon_member_counter, *int_suffix + 1);
1142+
}
1143+
}
1144+
1145+
for(auto &component : components)
1146+
{
1147+
std::vector<typet> padding_types;
1148+
1149+
const auto &component_bits = pointer_offset_bits(component.type(), *this);
1150+
if(
1151+
component_bits.has_value() &&
1152+
(!widest_member.has_value() ||
1153+
*component_bits < widest_member->second) &&
1154+
*component_bits % 8 != 0)
1155+
{
1156+
std::size_t bit_field_padding_width =
1157+
8 - numeric_cast_v<std::size_t>(*component_bits % 8);
1158+
padding_types.push_back(c_bit_field_typet{
1159+
unsignedbv_typet{bit_field_padding_width}, bit_field_padding_width});
1160+
}
1161+
1162+
if(size_expr_opt.has_value())
1163+
{
1164+
const auto component_size_expr_opt =
1165+
size_of_expr(component.type(), *this);
1166+
CHECK_RETURN(component_size_expr_opt.has_value());
1167+
// size_of_expr will round up to full bytes, and also works for bit
1168+
// fields (which C sizeof does not support). Thus, we don't need to
1169+
// special-case bit_field_padding_opt being set.
1170+
if_exprt padding_bytes{
1171+
binary_relation_exprt{
1172+
*component_size_expr_opt, ID_lt, *size_expr_opt},
1173+
minus_exprt{*size_expr_opt, *component_size_expr_opt},
1174+
from_integer(0, size_expr_opt->type())};
1175+
simplify(padding_bytes, *this);
1176+
1177+
if(!padding_bytes.is_zero())
1178+
{
1179+
make_index_type(padding_bytes);
1180+
1181+
padding_types.push_back(
1182+
array_typet{unsigned_char_type(), padding_bytes});
1183+
}
1184+
}
1185+
else
1186+
{
1187+
CHECK_RETURN(component_bits.has_value());
1188+
// Truncating division ensures that we don't need to special-case
1189+
// bit_field_padding_opt being set.
1190+
std::size_t padding_bytes =
1191+
numeric_cast_v<std::size_t>(widest_member->second - *component_bits) /
1192+
8;
1193+
1194+
if(padding_bytes != 0)
1195+
{
1196+
padding_types.push_back(array_typet{
1197+
unsigned_char_type(), from_integer(padding_bytes, index_type())});
1198+
}
1199+
}
1200+
1201+
if(padding_types.empty())
1202+
continue;
1203+
1204+
struct_typet::componentst component_with_padding_type;
1205+
component_with_padding_type.push_back(component);
1206+
for(const auto &t : padding_types)
1207+
{
1208+
// TODO: perhaps use the same naming as padding.cpp uses
1209+
component_with_padding_type.push_back(struct_union_typet::componentt{
1210+
"$anon" + std::to_string(anon_member_counter++), t});
1211+
// TODO: not sure whether this should be anonymous
1212+
component_with_padding_type.back().set_anonymous(true);
1213+
component_with_padding_type.back().set_is_padding(true);
1214+
component_with_padding_type.back().add_source_location() =
1215+
component.source_location();
1216+
}
1217+
1218+
struct_union_typet::componentt component_with_padding{
1219+
"$anon" + std::to_string(anon_member_counter++),
1220+
struct_typet{component_with_padding_type}};
1221+
component_with_padding.set_anonymous(true);
1222+
component_with_padding.add_source_location() =
1223+
component.source_location();
1224+
1225+
// based on typecheck_compound_type, branch
1226+
// "if(type.find(ID_tag).is_nil())"
1227+
// produce type symbol
1228+
type_symbolt compound_symbol{component_with_padding.type()};
1229+
compound_symbol.location = component_with_padding.source_location();
1230+
1231+
std::string typestr = type2name(compound_symbol.type, *this);
1232+
compound_symbol.base_name = "#anon#" + typestr;
1233+
compound_symbol.name = "tag-#anon#" + typestr;
1234+
1235+
// We might already have the same anonymous struct, and this is simply
1236+
// ok as those are exactly the same types, just introduced at a
1237+
// different source location.
1238+
symbolt *new_symbol = &compound_symbol;
1239+
if(
1240+
symbol_table.symbols.find(compound_symbol.name) ==
1241+
symbol_table.symbols.end())
1242+
{
1243+
move_symbol(compound_symbol, new_symbol);
1244+
}
1245+
1246+
struct_tag_typet tag_type{new_symbol->name};
1247+
tag_type.add_source_location() = component_with_padding.source_location();
1248+
component_with_padding.type().swap(tag_type);
1249+
1250+
component.swap(component_with_padding);
1251+
}
1252+
}
11091253
}
11101254

11111255
typet c_typecheck_baset::enum_constant_type(

src/cbmc/cbmc_parse_options.cpp

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,6 @@ Author: Daniel Kroening, [email protected]
6262
#include <goto-programs/remove_skip.h>
6363
#include <goto-programs/remove_unused_functions.h>
6464
#include <goto-programs/remove_vector.h>
65-
#include <goto-programs/rewrite_union.h>
6665
#include <goto-programs/set_properties.h>
6766
#include <goto-programs/show_goto_functions.h>
6867
#include <goto-programs/show_properties.h>
@@ -946,7 +945,6 @@ bool cbmc_parse_optionst::process_goto_program(
946945
remove_returns(goto_model);
947946
remove_vector(goto_model);
948947
remove_complex(goto_model);
949-
rewrite_union(goto_model);
950948

951949
// add generic checks
952950
log.status() << "Generic Property Instrumentation" << messaget::eom;

src/goto-diff/goto_diff_parse_options.cpp

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@ Author: Peter Schrammel
4040
#include <goto-programs/remove_unused_functions.h>
4141
#include <goto-programs/remove_vector.h>
4242
#include <goto-programs/remove_virtual_functions.h>
43-
#include <goto-programs/rewrite_union.h>
4443
#include <goto-programs/set_properties.h>
4544
#include <goto-programs/show_properties.h>
4645
#include <goto-programs/string_abstraction.h>
@@ -341,7 +340,6 @@ bool goto_diff_parse_optionst::process_goto_program(
341340
remove_returns(goto_model);
342341
remove_vector(goto_model);
343342
remove_complex(goto_model);
344-
rewrite_union(goto_model);
345343

346344
// add generic checks
347345
log.status() << "Generic Property Instrumentation" << messaget::eom;

src/goto-programs/write_goto_binary.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Author: CM Wintersteiger
1212
#ifndef CPROVER_GOTO_PROGRAMS_WRITE_GOTO_BINARY_H
1313
#define CPROVER_GOTO_PROGRAMS_WRITE_GOTO_BINARY_H
1414

15-
#define GOTO_BINARY_VERSION 5
15+
#define GOTO_BINARY_VERSION 6
1616

1717
#include <iosfwd>
1818
#include <string>

0 commit comments

Comments
 (0)