Skip to content

Do not hardcode char/bytes having 8 bits #917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 18, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 3 additions & 5 deletions src/cprover/cprover_parse_options.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -308,11 +308,9 @@ void cprover_parse_optionst::help()
{
std::cout << '\n';

std::cout
<< u8"* * CPROVER " << CBMC_VERSION << " (" << (sizeof(void *) * 8)
<< "-bit)"
<< " * *\n"
<< "* * Copyright 2022 * *\n";
std::cout << '\n'
<< banner_string("CPROVER", CBMC_VERSION) << '\n'
<< align_center_with_border("Copyright 2022") << '\n';

// clang-format off
std::cout << help_formatter(
Expand Down
5 changes: 3 additions & 2 deletions src/goto-instrument/contracts/memory_predicates.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ void is_fresh_baset::update_ensures(goto_programt &ensures)

array_typet is_fresh_baset::get_memmap_type()
{
return array_typet(c_bool_typet(8), infinity_exprt(size_type()));
Copy link
Collaborator

@remi-delmas-3000 remi-delmas-3000 Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to propagate this to instrumentation code ? This code does not represent anything running on an actual processor. Should c_bool_typet(8) be deprecated completely ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with an arbitrary type, but we must not have "8" as a magic number. I assumed this was "8" in the first place so as to have byte granularity? Was I wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My usual assumption is that architecture should be well defined at the point of running the front end. This is because the C front end runs an external C pre-processor which can have architecture specific macro expansions. As goto-instrument runs after the front end, it is operating on an architecture specific program form. Note for example that the c_bool_type() construction function in src/util/c_types.cpp references the same config field.

return array_typet(unsigned_char_type(), infinity_exprt(size_type()));
}

void is_fresh_baset::add_memory_map_decl(goto_programt &program)
Expand All @@ -149,7 +149,8 @@ void is_fresh_baset::add_memory_map_decl(goto_programt &program)
goto_programt::make_decl(memmap_symbol.symbol_expr(), source_location));
program.add(goto_programt::make_assignment(
memmap_symbol.symbol_expr(),
array_of_exprt(from_integer(0, c_bool_typet(8)), get_memmap_type()),
array_of_exprt(
from_integer(0, get_memmap_type().element_type()), get_memmap_type()),
source_location));
}

Expand Down
3 changes: 2 additions & 1 deletion src/goto-instrument/stack_depth.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ Date: November 2011

#include <util/arith_tools.h>
#include <util/bitvector_types.h>
#include <util/c_types.h>

#include <goto-programs/goto_convert_functions.h>
#include <goto-programs/goto_model.h>
Expand All @@ -26,7 +27,7 @@ static symbol_exprt add_stack_depth_symbol(
message_handlert &message_handler)
{
const irep_idt identifier="$stack_depth";
unsignedbv_typet type(sizeof(std::size_t)*8);
typet type = size_type();

symbolt new_symbol;
new_symbol.name=identifier;
Expand Down
6 changes: 4 additions & 2 deletions src/goto-instrument/synthesizer/expr_enumerator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ Author: Qinheping Hu
#include <util/format_expr.h>
#include <util/simplify_expr.h>

#include <climits>

expr_sett leaf_enumeratort::enumerate(const std::size_t size) const
{
// Size of leaf expressions must be 1.
Expand Down Expand Up @@ -185,7 +187,7 @@ get_partitions_long(const std::size_t n, const std::size_t k)
/// Compute all positions of ones in the bit vector `v` (1-indexed).
std::vector<std::size_t> get_ones_pos(std::size_t v)
{
const std::size_t length = sizeof(std::size_t) * 8;
const std::size_t length = sizeof(std::size_t) * CHAR_BIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states that you are using the width from the configuration. But this line is using the width for the machine for which cbmc is being compiled. Shouldn't config.ansi_c.char_width be used instead? The distinction could be important if cbmc is to be used for program written for a machine such as a DSP like you suggested in the PR desription. Wouldn't these architectures be more likely to be targeted using a cross compiler rather than porting cbmc to the target hardware itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right in that care is required where CHAR_BIT is the right choice and where config.ansi_c.char_width is to be used instead. I hope to have made the right choice in all places, but of course may have made mistakes. For this particular case: as far as I understand the code, this is about the width of bytes that CBMC is being compiled for, and not whatever the verification target might be. So I claim that CHAR_BIT is correct. Now you are also right in that, in all likelihood, one would use a cross compiler for such a DSP, and it may well be that no platform that CBMC ever runs on has CHAR_BIT != 8 - but I'd also like to avoid magic numbers, making it very difficult to tell whether the use of "8" is one that should be config.ansi_c.char_width instead, or is actually fine for it's the compile-time byte width on all relevant platforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are convinced that we are depending on the right thing in this case, then I am happy to merge this as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific case it is to enumerate various bit sequences, and this enumeration is done by the compiled CBMC binary.

std::vector<std::size_t> result;

// Start from the lowest bit at position `length`
Expand Down Expand Up @@ -236,7 +238,7 @@ std::list<partitiont> non_leaf_enumeratort::get_partitions(
return {};

// Number of bits at all.
const std::size_t length = sizeof(std::size_t) * 8;
const std::size_t length = sizeof(std::size_t) * CHAR_BIT;

// This bithack-based implementation works only for `n` no larger than
// `length`. Use the vector-based implementation `n` is too large.
Expand Down
7 changes: 4 additions & 3 deletions src/goto-programs/goto_trace.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,10 @@ Author: Daniel Kroening

#include "goto_trace.h"

#include <ostream>

#include <util/arith_tools.h>
#include <util/byte_operators.h>
#include <util/c_types.h>
#include <util/config.h>
#include <util/format_expr.h>
#include <util/merge_irep.h>
#include <util/range.h>
Expand All @@ -28,6 +27,8 @@ Author: Daniel Kroening

#include "printf_formatter.h"

#include <ostream>

static optionalt<symbol_exprt> get_object_rec(const exprt &src)
{
if(src.id()==ID_symbol)
Expand Down Expand Up @@ -195,7 +196,7 @@ static std::string numeric_representation(
for(const auto c : result)
{
oss << c;
if(++i % 8 == 0 && result.size() != i)
if(++i % config.ansi_c.char_width == 0 && result.size() != i)
oss << ' ';
}
if(options.base_prefix)
Expand Down
81 changes: 46 additions & 35 deletions src/util/pointer_offset_size.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Author: Daniel Kroening, [email protected]
#include "arith_tools.h"
#include "byte_operators.h"
#include "c_types.h"
#include "config.h"
#include "invariant.h"
#include "namespace.h"
#include "pointer_expr.h"
Expand All @@ -38,14 +39,14 @@ optionalt<mp_integer> member_offset(
{
const std::size_t w = to_c_bit_field_type(comp.type()).get_width();
bit_field_bits += w;
result += bit_field_bits / 8;
bit_field_bits %= 8;
result += bit_field_bits / config.ansi_c.char_width;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming we guarantee that char_width is always different than zero in all these cases, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though we might one day want to have some architecture sanity check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be loaded from file through configt::set_from_symbol_table and symtab2gb. Therefore it could theoretically be set to 0 in the file and I am not sure we have appropriate validation that the value from file is not 0.

bit_field_bits %= config.ansi_c.char_width;
}
else if(comp.type().id() == ID_bool)
{
++bit_field_bits;
result += bit_field_bits / 8;
bit_field_bits %= 8;
result += bit_field_bits / config.ansi_c.char_width;
bit_field_bits %= config.ansi_c.char_width;
}
else
{
Expand Down Expand Up @@ -92,7 +93,7 @@ pointer_offset_size(const typet &type, const namespacet &ns)
auto bits = pointer_offset_bits(type, ns);

if(bits.has_value())
return (*bits + 7) / 8;
return (*bits + config.ansi_c.char_width - 1) / config.ansi_c.char_width;
else
return {};
}
Expand Down Expand Up @@ -249,16 +250,16 @@ optionalt<exprt> member_offset_expr(
{
std::size_t w = to_c_bit_field_type(c.type()).get_width();
bit_field_bits += w;
const std::size_t bytes = bit_field_bits / 8;
bit_field_bits %= 8;
const std::size_t bytes = bit_field_bits / config.ansi_c.char_width;
bit_field_bits %= config.ansi_c.char_width;
if(bytes > 0)
result = plus_exprt(result, from_integer(bytes, result.type()));
}
else if(c.type().id() == ID_bool)
{
++bit_field_bits;
const std::size_t bytes = bit_field_bits / 8;
bit_field_bits %= 8;
const std::size_t bytes = bit_field_bits / config.ansi_c.char_width;
bit_field_bits %= config.ansi_c.char_width;
if(bytes > 0)
result = plus_exprt(result, from_integer(bytes, result.type()));
}
Expand Down Expand Up @@ -289,7 +290,9 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
auto bits = pointer_offset_bits(array_type, ns);

if(bits.has_value())
return from_integer((*bits + 7) / 8, size_type());
return from_integer(
(*bits + config.ansi_c.char_width - 1) / config.ansi_c.char_width,
size_type());
}

auto sub = size_of_expr(array_type.element_type(), ns);
Expand All @@ -316,7 +319,9 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
auto bits = pointer_offset_bits(vector_type, ns);

if(bits.has_value())
return from_integer((*bits + 7) / 8, size_type());
return from_integer(
(*bits + config.ansi_c.char_width - 1) / config.ansi_c.char_width,
size_type());
}

auto sub = size_of_expr(vector_type.element_type(), ns);
Expand Down Expand Up @@ -355,16 +360,16 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
{
std::size_t w = to_c_bit_field_type(c.type()).get_width();
bit_field_bits += w;
const std::size_t bytes = bit_field_bits / 8;
bit_field_bits %= 8;
const std::size_t bytes = bit_field_bits / config.ansi_c.char_width;
bit_field_bits %= config.ansi_c.char_width;
if(bytes > 0)
result = plus_exprt(result, from_integer(bytes, result.type()));
}
else if(c.type().id() == ID_bool)
{
++bit_field_bits;
const std::size_t bytes = bit_field_bits / 8;
bit_field_bits %= 8;
const std::size_t bytes = bit_field_bits / config.ansi_c.char_width;
bit_field_bits %= config.ansi_c.char_width;
if(bytes > 0)
result = plus_exprt(result, from_integer(bytes, result.type()));
}
Expand Down Expand Up @@ -415,7 +420,8 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
}
else
{
mp_integer sub_bytes = (*sub_bits + 7) / 8;
mp_integer sub_bytes =
(*sub_bits + config.ansi_c.char_width - 1) / config.ansi_c.char_width;

if(max_bytes>=0)
{
Expand Down Expand Up @@ -449,19 +455,14 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
type.id()==ID_c_bit_field)
{
std::size_t width=to_bitvector_type(type).get_width();
std::size_t bytes=width/8;
if(bytes*8!=width)
std::size_t bytes = width / config.ansi_c.char_width;
if(bytes * config.ansi_c.char_width != width)
bytes++;
return from_integer(bytes, size_type());
}
else if(type.id()==ID_c_enum)
{
std::size_t width =
to_bitvector_type(to_c_enum_type(type).underlying_type()).get_width();
std::size_t bytes=width/8;
if(bytes*8!=width)
bytes++;
return from_integer(bytes, size_type());
return size_of_expr(to_c_enum_type(type).underlying_type(), ns);
}
else if(type.id()==ID_c_enum_tag)
{
Expand All @@ -475,11 +476,11 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
{
// the following is an MS extension
if(type.get_bool(ID_C_ptr32))
return from_integer(4, size_type());
return from_integer(32 / config.ansi_c.char_width, size_type());

std::size_t width=to_bitvector_type(type).get_width();
std::size_t bytes=width/8;
if(bytes*8!=width)
std::size_t bytes = width / config.ansi_c.char_width;
if(bytes * config.ansi_c.char_width != width)
bytes++;
return from_integer(bytes, size_type());
}
Expand All @@ -497,7 +498,7 @@ optionalt<exprt> size_of_expr(const typet &type, const namespacet &ns)
}
else if(type.id()==ID_string)
{
return from_integer(32/8, size_type());
return from_integer(32 / config.ansi_c.char_width, size_type());
}
else
return {};
Expand Down Expand Up @@ -607,12 +608,17 @@ optionalt<exprt> get_subexpression_at_offset(
// if this member completely contains the target, and this member is
// byte-aligned, recurse into it
if(
offset_bytes * 8 >= m_offset_bits && m_offset_bits % 8 == 0 &&
offset_bytes * 8 + *target_size_bits <= m_offset_bits + *m_size_bits)
offset_bytes * config.ansi_c.char_width >= m_offset_bits &&
m_offset_bits % config.ansi_c.char_width == 0 &&
offset_bytes * config.ansi_c.char_width + *target_size_bits <=
m_offset_bits + *m_size_bits)
{
const member_exprt member(expr, component.get_name(), component.type());
return get_subexpression_at_offset(
member, offset_bytes - m_offset_bits / 8, target_type_raw, ns);
member,
offset_bytes - m_offset_bits / config.ansi_c.char_width,
target_type_raw,
ns);
}

m_offset_bits += *m_size_bits;
Expand All @@ -628,11 +634,14 @@ optionalt<exprt> get_subexpression_at_offset(
// no arrays of non-byte-aligned, zero-, or unknown-sized objects
if(
elem_size_bits.has_value() && *elem_size_bits > 0 &&
*elem_size_bits % 8 == 0 && *target_size_bits <= *elem_size_bits)
*elem_size_bits % config.ansi_c.char_width == 0 &&
*target_size_bits <= *elem_size_bits)
{
const mp_integer elem_size_bytes = *elem_size_bits / 8;
const mp_integer elem_size_bytes =
*elem_size_bits / config.ansi_c.char_width;
const auto offset_inside_elem = offset_bytes % elem_size_bytes;
const auto target_size_bytes = *target_size_bits / 8;
const auto target_size_bytes =
*target_size_bits / config.ansi_c.char_width;
// only recurse if the cell completely contains the target
if(offset_inside_elem + target_size_bytes <= elem_size_bytes)
{
Expand Down Expand Up @@ -660,7 +669,9 @@ optionalt<exprt> get_subexpression_at_offset(
continue;

// if this member completely contains the target, recurse into it
if(offset_bytes * 8 + *target_size_bits <= *m_size_bits)
if(
offset_bytes * config.ansi_c.char_width + *target_size_bits <=
*m_size_bits)
{
const member_exprt member(expr, component.get_name(), component.type());
return get_subexpression_at_offset(
Expand Down
14 changes: 9 additions & 5 deletions src/util/simplify_expr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -821,6 +821,8 @@ simplify_exprt::simplify_typecast(const typecast_exprt &expr)
}

// circular casts through types shorter than `int`
// we use fixed bit widths as this is specifically for the Java bytecode
// front-end
if(op_type == signedbv_typet(32) && expr.op().id() == ID_typecast)
{
if(expr_type==c_bool_typet(8) ||
Expand Down Expand Up @@ -1817,7 +1819,7 @@ simplify_exprt::simplify_byte_extract(const byte_extract_exprt &expr)
// the next member would be misaligned, abort
if(
!component_bits.has_value() || *component_bits == 0 ||
*component_bits % 8 != 0)
*component_bits % expr.get_bits_per_byte() != 0)
{
failed = true;
break;
Expand Down Expand Up @@ -1868,10 +1870,12 @@ simplify_exprt::simplify_byte_extract(const byte_extract_exprt &expr)
pointer_offset_bits(array_type.element_type(), ns);
if(element_bit_width.has_value() && *element_bit_width > 0)
{
if(*offset > 0 && *offset * 8 % *element_bit_width == 0)
if(
*offset > 0 &&
*offset * expr.get_bits_per_byte() % *element_bit_width == 0)
{
const auto elements_to_erase =
numeric_cast_v<std::size_t>((*offset * 8) / *element_bit_width);
const auto elements_to_erase = numeric_cast_v<std::size_t>(
(*offset * expr.get_bits_per_byte()) / *element_bit_width);
array_exprt slice = to_array_expr(expr.op());
slice.operands().erase(
slice.operands().begin(),
Expand Down Expand Up @@ -1953,7 +1957,7 @@ simplify_exprt::simplify_byte_update(const byte_update_exprt &expr)
if(
root_size.has_value() && *root_size >= 0 && val_size.has_value() &&
*val_size >= 0 && offset_int.has_value() && *offset_int >= 0 &&
*offset_int * 8 + *val_size <= *root_size)
*offset_int * expr.get_bits_per_byte() + *val_size <= *root_size)
{
auto root_bits =
expr2bits(root, expr.id() == ID_byte_update_little_endian, ns);
Expand Down
3 changes: 2 additions & 1 deletion src/util/simplify_expr_struct.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ Author: Daniel Kroening, [email protected]
#include "arith_tools.h"
#include "byte_operators.h"
#include "c_types.h"
#include "config.h"
#include "invariant.h"
#include "namespace.h"
#include "pointer_offset_size.h"
Expand Down Expand Up @@ -185,7 +186,7 @@ simplify_exprt::simplify_member(const member_exprt &expr)

if(target_size.has_value())
{
mp_integer target_bits = target_size.value() * 8;
mp_integer target_bits = target_size.value() * config.ansi_c.char_width;
const auto bits = expr2bits(op, true, ns);

if(bits.has_value() &&
Expand Down