Skip to content

Feature request: Add support to Poison value #7014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
celinval opened this issue Jul 18, 2022 · 8 comments
Open

Feature request: Add support to Poison value #7014

celinval opened this issue Jul 18, 2022 · 8 comments
Labels
aws Bugs or features of importance to AWS CBMC users feature request Kani Bugs or features of importance to Kani Rust Verifier

Comments

@celinval
Copy link
Collaborator

celinval commented Jul 18, 2022

As a user, I would like to be able to verify that my program only reads initialized memory. For example, I should be able to verify that the following testcase reads uninitialized memory:

// undef.c
struct Two {
    int first;
    int second;
};

int equal(struct Two* obj) {
    return obj->first == obj->second;
}

int main() {
    struct Two object;
    return equal(&object);
}
CBMC's verification in the example above succeeds.
$ cbmc --bounds-check --pointer-check --memory-leak-check --div-by-zero-check --pointer-overflow-check --conversion-check --undefined-shift-check --nan-check --enum-range-check --pointer-primitive-check --signed-overflow-check --unsigned-overflow-check  undef.c 
CBMC version 5.61.0 (cbmc-5.61.0) 64-bit x86_64 linux
Parsing undef.c
Converting
Type-checking undef
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Runtime Symex: 0.00119599s
size of program expression: 48 steps
simple slicing removed 0 assignments
Generated 13 VCC(s), 0 remaining after simplification
Runtime Postprocess Equation: 1.631e-05s

** Results:
function __CPROVER__start
[__CPROVER__start.memory-leak.1] dynamically allocated memory never freed in __CPROVER_memory_leak == NULL: SUCCESS

undef.c function equal
[equal.pointer_dereference.1] line 7 dereference failure: pointer NULL in obj->first: SUCCESS
[equal.pointer_dereference.2] line 7 dereference failure: pointer invalid in obj->first: SUCCESS
[equal.pointer_dereference.3] line 7 dereference failure: deallocated dynamic object in obj->first: SUCCESS
[equal.pointer_dereference.4] line 7 dereference failure: dead object in obj->first: SUCCESS
[equal.pointer_dereference.5] line 7 dereference failure: pointer outside object bounds in obj->first: SUCCESS
[equal.pointer_dereference.6] line 7 dereference failure: invalid integer address in obj->first: SUCCESS
[equal.pointer_dereference.7] line 7 dereference failure: pointer NULL in obj->second: SUCCESS
[equal.pointer_dereference.8] line 7 dereference failure: pointer invalid in obj->second: SUCCESS
[equal.pointer_dereference.9] line 7 dereference failure: deallocated dynamic object in obj->second: SUCCESS
[equal.pointer_dereference.10] line 7 dereference failure: dead object in obj->second: SUCCESS
[equal.pointer_dereference.11] line 7 dereference failure: pointer outside object bounds in obj->second: SUCCESS
[equal.pointer_dereference.12] line 7 dereference failure: invalid integer address in obj->second: SUCCESS

** 0 of 13 failed (1 iterations)
VERIFICATION SUCCESSFUL
While `valgrind` is able to detect an error.
$ clang undef.c -o undef && valgrind --track-origins=yes ./undef 
==5651== Memcheck, a memory error detector
==5651== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5651== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5651== Command: ./undef
==5651== 
==5651== Syscall param exit_group(status) contains uninitialised byte(s)
==5651==    at 0x4F22AB6: _Exit (_exit.c:31)
==5651==    by 0x4E81101: __run_exit_handlers (exit.c:132)
==5651==    by 0x4E81129: exit (exit.c:139)
==5651==    by 0x4E5FC8D: (below main) (libc-start.c:344)
==5651==  Uninitialised value was created by a stack allocation
==5651==    at 0x4004A0: main (in /tmp/undef)
==5651== 
==5651== 
==5651== HEAP SUMMARY:
==5651==     in use at exit: 0 bytes in 0 blocks
==5651==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==5651== 
==5651== All heap blocks were freed -- no leaks are possible
==5651== 
==5651== For counts of detected and suppressed errors, rerun with: -v
==5651== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

For that, it would be great if CBMC could:

  1. Allow users to model uninitialized memory with poison value.
  2. Allow users explicitly assign poison to a variable in their proof harnesses.
  3. Explicitly check for poison.
@celinval celinval added aws Bugs or features of importance to AWS CBMC users feature request Kani Bugs or features of importance to Kani Rust Verifier labels Jul 18, 2022
@martin-cs
Copy link
Collaborator

Thanks for the request. Double^WTriple thanks for it coming with a clear minimal test case. A few questions to try to understand the need / functionality / implementation split...

  1. Would a flag, say --uninitialized-check that added assertions for every potential uninitialized read be sufficient?
  2. Do you want to be able to explicitly marks things as uninitialized? If it could be done automatically would that be enough?
  3. Do you envisage uses of poison beyond uninitialized variables?
  4. Would you want to use this in the same way as --bounds-check or is it OK / more sensible to be another tool / pass?

I ask because implementing poison as you describe would require additional state and tracking in the symbolic execution step as it would need to track the poisoned or not status of things. @tautschnig can probably correct me but this feels like a non-trivial addition as it would touch many places.

But, there might be some easier alternatives:

A. It might be possible to make symex_decl a little bit more lazy and only create names on assignment, not declaration. This would catch potentially uninitialized things but doesn't really fit with the "assertions as checks" architecture.

B. goto-instrument has a --uninitialized-check option. At the moment it only works for local variables but it could be extended by using some of the work on the abstract interpreter. That would find potentially uninitialized things. I guess we would then instrument in explicit checks as assertions? That is sounding rather like a goto-analyzer feature though. This feels a lot like that data tainting code.

C. We could use Ali's trick of "non-deterministically save one thing to check set membership" like use after free.

Does any of that make sense? Thoughts on 1--4 most appreciated.

@tautschnig
Copy link
Collaborator

Just to say that I'd also view this as 1) fix/extend --uninitialized-check or 2) fully implement something like shadow memory, which likely is expensive. @peterschrammel might have some ideas, and anyone else who has completely different ideas please do weigh in!

@celinval
Copy link
Collaborator Author

I do want to say that I understand this is a big ask and I think there are a lot of subtleties that would need to be well thought out. I just wanted to kick off the idea and I'm happy to brainstorm more on this.

My use case comes from implementing Kani; hence, having a more fine grain control would be great; for example, rust doesn't allow users to read uninitialized local variables. So checking for that is a bit redundant. On the other hand, types like MaybeUninit are great candidates for those checks.

You are correct that allowing users to explicitly assign poison feels a lot like tainting analysis (which is a valid use case).

Here are a few open questions that I have on the semantics of these checks:

  • Does something like assume(uninit > 0) trigger a failure or mark uninit as nondet that is greater than 0?
  • Should poison be bit precise, e.g.:
union Data {
   int i;
   char c;
};

int convert(char c) {
    union Data data;
    data.c = c;
    return data.i;
}

@kroening
Copy link
Member

Take a look at Pasquale Malacaria's papers -- he has been checking properties like these with CBMC for many years.

@martin-cs
Copy link
Collaborator

@celinval Am happy to discuss these either here or in a call.

In my opinion checking for uninitialised variables and having poison values are rather different things. You could use the poison values to instrument in a check for uninitialised variables in a front-end like Kani but there are A. other ways of checking variable initialised and B. other uses of poison. Which is more important to you, having checks for uninitialised values or having poison values?

@celinval
Copy link
Collaborator Author

celinval commented Aug 2, 2022

That's a great question. I am leaning towards poison, which give us fine control over the analysis and it enables other use cases. That said, it would require more work on our side.

What are the trade offs on the cbmc side? Any idea which one would perform better?

Thanks!

Btw, Sorry for the late response.

@martin-cs
Copy link
Collaborator

@celinval Thanks for the thoughts and no worries about the time.

Uninitialised variables : I would probably make this a two phase analysis. I'd build on the existing uninitialized_domaint or implement something similar with the variable sensitivity domain. You would have to explicitly run goto-instrument or goto-analyzer for the first part. This would find all of the things that /might/ be read uninitialized. For each one it creates a shadow "var_is_initialized" variable, sets it when written and asserts it when read. The second phase would be running CBMC as normal. I wouldn't expect the overhead to be too bad in terms of run time and space or increasing the load on CBMC as reads from potentially uninitialised variables would hopefully be quite rare. It would potentially make your workflow a little more complex but if you were doing #7042 (comment) then it would hopefully be only a handful of lines of code.

Poison : More complex as presumably you would want a __CPROVER_set_poison function, maybe separate read, write and both poison? Maybe a "don't even takes this variable's address" poison? Would you want to be able to remove the poison (__CPROVER_antidote?). If it was a function it would need to be stateful, so you could:

char *tmp = malloc(some);
if (whatever_option_in_the_harness) {
  for (int i = 0; i < some; ++i) {
    __CPROVER_set_poison(some[i]);
  }
} else {
  // It is fine
}

The best way I can think of to do this is to given an (internal, library model) implementation of __CPROVER_set_poison(char *) like this:

char *__CPROVER_saved_poison_value;

void __CPROVER_set_poison (char *spicy_byte) {
  _Bool nondet;
  if (nondet) {
    __CPROVER_saved_poison_value = spicy_byte;
  }
}

Then /for every/ read or write (or address-of, or dereference, ...) that /might/ touch a poisoned value you would then need an instrumentation pass to add an assertion:

assert(&thing_I_was_reading != __CPROVER_saved_poison_value);

That is going to be a lot of assertions and will add to the load on the pointer analysis. Maybe we could have a stateful, abstract interpretation pass to reduce the number of "/might/ touch" locations but I fear it will still be quite heavy.

I am hoping that @tautschnig has a better idea for how to implement poison at lower cost.

@peterschrammel
Copy link
Member

implement poison at lower cost

@martin-cs, coincidentally, something that may help here already exists on a branch (and will hopefully get merged soon as time allows).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users feature request Kani Bugs or features of importance to Kani Rust Verifier
Projects
None yet
Development

No branches or pull requests

5 participants