Skip to content

make pointer_offset_exprt unsigned #6893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 20, 2023
Merged

Conversation

kroening
Copy link
Member

@kroening kroening commented May 27, 2022

This commit changes the type of __CPROVER_POINTER_OFFSET from ssize_t to
size_t. To match, relational operators on pointers are now unsigned, to
enable p+PTRDIFF_MAX+1 > p.

The rationale is subtle. The ISO C 11 standard 6.5.8 par 5 suggests that a
pair of pointers can be compared if either the pointers point into the
object, or one beyond the end of the sequence. The behavior in the case of
a pointer that points before the sequence is undefined. There is a similar
narrative for pointer arithmetic. A pointer with an offset that has a set
most significant bit should thus compare "less than" a pointer with an
offset that has a zero MSB. This suggests an unsigned encoding.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unconvinced the required change is as small as the current PR suggests: there are several uses of bv_utils.sign_extension in bv_pointers.cpp, and I believe these need to be changed.

@kroening kroening force-pushed the unsigned_pointer_offset branch 3 times, most recently from c6ae1ab to cdcba8f Compare May 31, 2022 10:40
@codecov
Copy link

codecov bot commented May 31, 2022

Codecov Report

Base: 78.48% // Head: 78.48% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (7cb83c1) compared to base (fb65095).
Patch coverage: 93.18% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6893      +/-   ##
===========================================
- Coverage    78.48%   78.48%   -0.01%     
===========================================
  Files         1663     1663              
  Lines       191189   191232      +43     
===========================================
+ Hits        150055   150082      +27     
- Misses       41134    41150      +16     
Impacted Files Coverage Δ
src/util/pointer_predicates.cpp 87.23% <76.47%> (-7.89%) ⬇️
src/util/lower_byte_operators.cpp 92.85% <95.91%> (+0.08%) ⬆️
src/ansi-c/goto_check_c.cpp 91.60% <100.00%> (ø)
src/cprover/bv_pointers_wide.cpp 58.28% <100.00%> (ø)
src/pointer-analysis/value_set_dereference.cpp 99.38% <100.00%> (+0.01%) ⬆️
src/solvers/flattening/bv_pointers.cpp 86.28% <100.00%> (ø)
src/util/replace_expr.cpp 100.00% <100.00%> (ø)
unit/util/pointer_expr.cpp 100.00% <100.00%> (ø)
src/solvers/smt2/smt2_format.cpp 85.39% <0.00%> (-1.13%) ⬇️
src/goto-programs/string_abstraction.cpp 90.93% <0.00%> (-0.78%) ⬇️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kroening kroening marked this pull request as ready for review May 31, 2022 12:05
@tautschnig
Copy link
Collaborator

I'm unconvinced the required change is as small as the current PR suggests: there are several uses of bv_utils.sign_extension in bv_pointers.cpp, and I believe these need to be changed.

Slightly amending my concern as discussed out of band: we likely make a mess at the moment when converting the result of pointer (offset) arithmetic back to the bit-limited offset representation where we may or may not end up producing a negative number in case of overflow/underflow. This needs looking into, or perhaps we should just invest our energy in the alternative pointer encodings.

@tautschnig
Copy link
Collaborator

I believe the only missing piece for this one is adjusting pointer_offset in pointer_predicates.cpp.

@tautschnig tautschnig removed their assignment Oct 9, 2022
@kroening kroening force-pushed the unsigned_pointer_offset branch from cdcba8f to d107e31 Compare October 9, 2022 15:15
@tautschnig
Copy link
Collaborator

I believe the only missing piece for this one is adjusting pointer_offset in pointer_predicates.cpp.

We now have a bunch of failing regression tests with warning: ignoring >= as that comparison now mixes signed/unsigned types. Looks like we are hardcoding the signed offset in a bunch of places?

@kroening kroening force-pushed the unsigned_pointer_offset branch 7 times, most recently from ebb0827 to 2a46b9f Compare October 10, 2022 17:14
@kroening kroening assigned peterschrammel and unassigned kroening Oct 10, 2022
@kroening kroening force-pushed the unsigned_pointer_offset branch from 2a46b9f to 1183c4a Compare January 12, 2023 09:42
@kroening kroening force-pushed the unsigned_pointer_offset branch 2 times, most recently from 6aa4277 to b200467 Compare January 12, 2023 12:23
@kroening kroening force-pushed the unsigned_pointer_offset branch from b200467 to 268b6c3 Compare January 12, 2023 13:42
@tautschnig tautschnig self-assigned this Jan 12, 2023
@tautschnig tautschnig force-pushed the unsigned_pointer_offset branch from 268b6c3 to 8ef892c Compare January 16, 2023 13:30
@tautschnig tautschnig force-pushed the unsigned_pointer_offset branch from 8ef892c to 1cfb9cf Compare January 17, 2023 13:00
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a temporary blocker: I am now expecting all tests to pass, but @remi-delmas-3000 is taking a look at all pointer arithmetic done contracts code to make sure no further updates are required.

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Jan 17, 2023

I wrote a test to check that I can perform inclusion checks using pointers that are one past the end of the object, it seems to work except for one thing I don’t understand:

#include <stdio.h>
#include <assert.h>
#include <stdbool.h>

int main(void)
{
    size_t size;
    size_t max_malloc_size = __CPROVER_max_malloc_size;
    __CPROVER_assume(0 <= size && size <= max_malloc_size);
    char *lb = malloc(size);
    size_t offset_lb = __CPROVER_POINTER_OFFSET(lb);
    size_t object_lb = __CPROVER_POINTER_OBJECT(lb);

    void *ub = lb + size;
    size_t offset_ub = __CPROVER_POINTER_OFFSET(ub);
    size_t object_ub = __CPROVER_POINTER_OBJECT(ub);

    void *ubp1 = lb + size + 1;
    size_t offset_ubp1 = __CPROVER_POINTER_OFFSET(ubp1);
    size_t object_ubp1 = __CPROVER_POINTER_OBJECT(ubp1);

    assert(object_ub == object_lb); // proved
    assert(object_ubp1 == object_lb); // proved
    assert(ubp1 > ub); // proved
    assert(offset_ubp1 > offset_ub); // proved
    assert(offset_ubp1 == offset_ub + 1); // falsified

    size_t idx;
    if (idx < size) {
      void *lb_idx = lb + idx;
      void *dummy_ub = ub;
      assert(lb <= lb_idx); // proved
      assert(lb_idx <= ub); // proved
    }
}

the third assertion is falsified with

max_malloc_size=36028797018963968ul (00000000 10000000 00000000 00000000 00000000 00000000 00000000 00000000)
size=36028797018963967ul (00000000 01111111 11111111 11111111 11111111 11111111 11111111 11111111)
ub=(const void *)((char *)&dynamic_object + 36028797018963967l) (00000010 01111111 11111111 11111111 11111111 11111111 11111111 11111111)
offset_ub=36028797018963967ul (00000000 01111111 11111111 11111111 11111111 11111111 11111111 11111111)
ubp1=(const void *)((char *)&dynamic_object + -36028797018963968l) (00000010 10000000 00000000 00000000 00000000 00000000 00000000 00000000)
offset_ubp1=18410715276690587648ul (11111111 10000000 00000000 00000000 00000000 00000000 00000000 00000000)

In the trace size is taken to be max_malloc_size-1 so that the offset of ubp1 = lb + size + 1 lands exactly at max_malloc_size in the pointer representation but becomes something else when extracted using __CPROVER_POINTER_OFFSET, with leading ones added offset_ubp1. The trace prints the offset as a negative number when printing the pointer.

The counter example goes away if we restrict the allocation size to be strictly less than __CPROVER_max_malloc_size
So it seems like offsets strictly below max_malloc_size get translated without leading ones, offsets greater or equal to max_malloc_size get leading ones. Seems like a remnant of signed conversion for overflowing pointers.

I wasn't expecting unsigned offsets to use the full width, rather N-object-bits at most.

max_malloc_size is still defined as 2^(offset_bits-1) so technically we could still add (max_malloc_size-1) to max_malloc_size before having to overflow into object bits in the pointer representation.

There already is an assertion in place to guard against undefined
behaviour. We shouldn't have to match on particular outcomes of
undefined behaviour that occurs after that assertion.
We must not assume that byte-operator offsets and arrays indices use the
same type (without at least checking that they do). Always use types
available from the context, and type cast as needed.
Make sure that replace_expr cannot introduce type-inconsistent
expressions. This wouldn't be a fault of replace_expr, but rather
requires fixing the caller. Add preconditions to check this.
@tautschnig tautschnig force-pushed the unsigned_pointer_offset branch from 723ea09 to 32fd591 Compare January 19, 2023 16:17
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now included the test posted by @remi-delmas-3000 together with further fixes (ensuring that we always interpret the offset as unsigned).

kroening and others added 2 commits January 19, 2023 21:40
This commit changes the type of __CPROVER_POINTER_OFFSET from ssize_t to
size_t.  To match, relational operators on pointers are now unsigned, to
enable p+PTRDIFF_MAX+1 > p.

The rationale is subtle.  The ISO C 11 standard 6.5.8 par 5 suggests that a
pair of pointers can be compared if either the pointers point into the
object, or one beyond the end of the sequence.  The behavior in the case of
a pointer that points before the sequence is undefined.  There is a similar
narrative for pointer arithmetic.  A pointer with an offset that has a set
most significant bit should thus compare "less than" a pointer with an
offset that has a zero MSB.  This suggests an unsigned encoding.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants