Fuzz bindgen with C-Smith #969

Open
fitzgen opened this issue Sep 8, 2017 · 13 comments

@fitzgen
Member

fitzgen commented Sep 8, 2017

C-Smith generates C programs for fuzzing C compilers. We should create infrastructure (scripts, etc.) for running it against bindgen and check that into the tree. And we should run it against bindgen.

We should look into whether C-Smith supports things like controlling the kinds of programs it generates. We generally don't care about function bodies, and we care a whole lot about different kinds of type definitions.

@fitzgen
Member Author

fitzgen commented Sep 25, 2017

Note: #1033 added some initial infrastructure for fuzzing bindgen with C-Smith, but it appears not to be super useful yet.

Additional help tweaking knobs to get C-Smith to exercise bindgen a little better would be very welcome!

It may also be worth seeing whether running C-Smith for longer periods uncovers anything.

Finally, it might make sense to file issues upstream with C-Smith if we can't tweak knobs to our satisfaction.

bors-servo pushed a commit that referenced this issue Sep 25, 2017
csmith fuzzing

ref #969

An initial version of a script that fuzzes bindgen with csmith. I ran it for maybe 1000 iterations and it did not find anything wrong. The programs generated by csmith are probably too simple type-wise.

Here is an example output of what csmith generates:
``` C
/* --- Struct/Union Declarations --- */
union U2 {
   uint64_t  f0;
   const signed f1 : 18;
};

union U4 {
   const volatile signed f0 : 1;
   volatile int16_t  f1;
   int32_t  f2;
   int8_t * const  f3;
   volatile int64_t  f4;
};

union U5 {
   const int8_t * f0;
   volatile int8_t  f1;
   uint16_t  f2;
   unsigned f3 : 22;
};

/* --- GLOBAL VARIABLES --- */
static int8_t g_3[8] = {0x47L,0xE8L,0x47L,0x47L,0xE8L,0x47L,0x47L,0xE8L};
static int32_t g_25 = 0x3421AD7BL;
static union U5 g_40 = {0};/* VOLATILE GLOBAL g_40 */
static int32_t g_43[4][3] = {{(-10L),(-10L),(-10L)},{(-10L),(-10L),(-10L)},{(-10L),(-10L),(-10L)},{(-10L),( /* ... line truncated in the original output */
static int32_t * volatile g_42 = &g_43[0][0];/* VOLATILE GLOBAL g_42 */
static int32_t * volatile g_50 = &g_43[2][0];/* VOLATILE GLOBAL g_50 */
static int32_t g_53 = (-9L);
static union U4 g_57 = {0x9C113E7BL};/* VOLATILE GLOBAL g_57 */

/* --- FORWARD DECLARATIONS --- */
static union U4  func_1(void);
static int16_t  func_4(int32_t  p_5);
static int32_t  func_6(int32_t  p_7, union U2  p_8, int8_t * p_9);
static int32_t  func_10(uint32_t  p_11, int8_t * p_12, int8_t * p_13);
static int8_t * func_14(int32_t  p_15, union U2  p_16);
static union U2  func_28(const uint64_t  p_29, int8_t * p_30, uint32_t  p_31);
static union U5  func_34(uint32_t  p_35);
```
@pepyakin
Contributor

I wonder whether it would be useful to run rustc in check-only mode (as in cargo check) on the bindings generated by bindgen.

@fitzgen
Member Author

fitzgen commented Sep 25, 2017

Yes, definitely. In fact, we should run the tests, so that we can assert size and alignment.

@e00E
Contributor

e00E commented Sep 25, 2017

Are there any knobs we can tweak to make the types more complex?

In what kind of way would types get more complex anyway? In plain C they usually stay fairly simple.
The following command-line switches of csmith seemed relevant:

--max-expr-complexity <num>: limit expression complexities to <num> (default 10).
--max-funcs <num>: limit the number of functions (besides main) to <num>  (default 10).
--max-struct-fields <num>: limit the number of struct fields to <num> (default 10).
--max-union-fields <num>: limit the number of union fields to <num> (default 5).

but I did not set them because I would not expect that having more "things" makes the input any more complex from bindgen's perspective.
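
For anyone who does want to experiment with these limits, here is a minimal sketch of how a driver script might pass them to csmith. The function names `csmith_command` and `generate_program` are hypothetical; only the four switches quoted above come from csmith's help text, and the sketch assumes csmith is on PATH and writes the generated program to stdout.

```python
import subprocess


def csmith_command(max_expr_complexity=10, max_funcs=10,
                   max_struct_fields=10, max_union_fields=5):
    """Build an argv list for csmith from the four switches quoted above.

    The default values mirror the defaults listed in csmith's help output.
    """
    return [
        "csmith",
        "--max-expr-complexity", str(max_expr_complexity),
        "--max-funcs", str(max_funcs),
        "--max-struct-fields", str(max_struct_fields),
        "--max-union-fields", str(max_union_fields),
    ]


def generate_program(**limits):
    """Run csmith and return the generated C source.

    Assumption: csmith is installed and prints the program to stdout.
    """
    result = subprocess.run(csmith_command(**limits), check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout
```

Calling `generate_program(max_struct_fields=20, max_funcs=1)` would be one way to bias the output toward larger structs and fewer functions, though whether that exercises bindgen any better is an open question.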

@e00E
Contributor

e00E commented Sep 25, 2017

I just found a file that makes bindgen say
ERROR:bindgen::codegen::struct_layout: Calculated wrong layout for S5, too more 2 bytes
but it does create a full output Rust file and exits with code 0 (I tested other failures earlier, which had exit code 1, so currently the script only looks for nonzero exit codes). If that means that bindgen will not always exit with a nonzero code on failure, then we will need to parse stderr for lines starting with ERROR, right?
input:

#pragma pack(push)
#pragma pack(1)
struct S5 {
   signed f0 : 11;
   unsigned f1 : 12;
   unsigned f2 : 23;
};
#pragma pack(pop)

output:

#[repr(C)]
#[derive(Debug, Copy)]
pub struct S5 {
    pub _bitfield_1: [u32; 2usize],
    pub __bindgen_align: [u8; 0usize],
}
... (tests and impls)
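
A rough check of the arithmetic behind the "2 bytes" in that message, as a sketch. It assumes that under `#pragma pack(1)` the compiler lays the three bitfields out back to back with no padding, which is what GCC and Clang do on common targets; that assumption is mine and is not stated in the error output. The helper name `packed_bitfield_bytes` is hypothetical.

```python
def packed_bitfield_bytes(widths):
    """Bytes needed when bitfields are laid out back to back with no padding."""
    total_bits = sum(widths)
    return (total_bits + 7) // 8  # round up to whole bytes


s5_widths = [11, 12, 23]                     # f0, f1, f2 from struct S5
expected = packed_bitfield_bytes(s5_widths)  # 46 bits round up to 6 bytes
generated = 2 * 4                            # [u32; 2usize] in the bindgen output
print(expected, generated, generated - expected)  # prints "6 8 2"
```

Under that assumption the generated struct is 2 bytes larger than the C layout, which matches the number in the error message.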

@e00E
Contributor

e00E commented Sep 25, 2017

What is the best way to run the tests in the bindgen output? In a full project I would run cargo test but is there a better way?

@pepyakin
Contributor

I'm not sure whether this is the best way, but you can try the following command:

bindgen [header.h] | rustc --test - && ./rust_out
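
The same idea could be scripted from a Python driver roughly as follows. This is only a sketch: the function names and the file names `bindings.rs` and `layout_tests` are hypothetical, and it assumes bindgen and rustc are on PATH. It writes the bindings to a file with bindgen's `-o` option instead of piping them.

```python
import os
import shutil
import subprocess
import tempfile


def layout_test_commands(header, workdir):
    """Return three argv lists: generate bindings, compile the tests, run them."""
    bindings = os.path.join(workdir, "bindings.rs")
    test_bin = os.path.join(workdir, "layout_tests")
    return [
        ["bindgen", header, "-o", bindings],
        ["rustc", "--test", bindings, "-o", test_bin],
        [test_bin],
    ]


def run_layout_tests(header):
    """Run each step in turn; return the first failing argv, or None on success."""
    workdir = tempfile.mkdtemp()
    try:
        for argv in layout_test_commands(header, workdir):
            if subprocess.run(argv).returncode != 0:
                return argv
        return None
    finally:
        # Clean up generated files whether or not a step failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

Returning the failing argv makes it easy to tell whether bindgen, the compile step, or the layout tests themselves went wrong.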

@e00E
Contributor

e00E commented Sep 25, 2017

Looks like a test fails in the above example as well, so the new logic for checking if bindgen worked correctly would be:

  • check for a nonzero exit code from bindgen
  • check for any line of stderr starting with ERROR (or maybe even anything output to stderr, period)
  • check for a nonzero exit code when executing the tests
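
The three checks above could be sketched as a single function. The name `classify_run` and the returned strings are hypothetical; the inputs are assumed to come from whatever the driver captured when running bindgen and the generated tests.

```python
def classify_run(bindgen_returncode, bindgen_stderr, tests_returncode):
    """Return a short failure reason, or None if all three checks pass."""
    if bindgen_returncode != 0:
        return "bindgen exited with a nonzero code"
    if any(line.startswith("ERROR") for line in bindgen_stderr.splitlines()):
        return "bindgen logged an ERROR line"
    if tests_returncode != 0:
        return "generated tests failed"
    return None


stderr = ("ERROR:bindgen::codegen::struct_layout: "
          "Calculated wrong layout for S5, too more 2 bytes\n")
print(classify_run(0, stderr, 0))  # prints "bindgen logged an ERROR line"
```

The stricter variant mentioned above (treating any stderr output as a failure) would replace the second check with a test for a non-empty `bindgen_stderr`.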

@fitzgen
Member Author

fitzgen commented Sep 25, 2017

Just found: #1034

Made a label for all issues we find with C-Smith: A-csmith.

Also cleaned up the driver.py, will send a PR in a second.

@e00E
Contributor

e00E commented Sep 25, 2017

Ah, I'm currently changing it to follow the procedure outlined in my previous post; I will see what you changed.

@fitzgen
Member Author

fitzgen commented Sep 25, 2017

Also, an idea: #1036

@pepyakin
Contributor

pepyakin commented Oct 2, 2017

I'm wondering if there is a way to collect all the known problems found by C-Smith and not stop fuzzing when we hit one of them again? 🤔

@fitzgen
Member Author

fitzgen commented Oct 2, 2017

> I'm wondering if there is a way to collect all the known problems found by C-Smith and not stop fuzzing when we hit one of them again? 🤔

Not sure... maybe we just need to fix all the issues more quickly ;)
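
For what it's worth, one possible approach is sketched below. It is purely hypothetical and not something the existing script does: reduce each failure to a coarse signature and only stop (or report) when the signature has not been seen before. The masking heuristic, which replaces csmith-style type names such as S5 and all numbers, is an assumption about what varies between runs.

```python
import re


def failure_signature(stderr):
    """Reduce stderr to a coarse signature so repeats of one bug compare equal.

    Takes the first ERROR line and masks type names like S5 or U2 and any
    numbers, since those differ between generated programs.
    """
    for line in stderr.splitlines():
        if line.startswith("ERROR"):
            line = re.sub(r"\b[SU]\d+\b", "<type>", line)
            return re.sub(r"\d+", "<n>", line)
    return None


def is_new_failure(stderr, known):
    """Record the signature in `known`; return True only the first time it appears."""
    sig = failure_signature(stderr)
    if sig is None or sig in known:
        return False
    known.add(sig)
    return True
```

A driver could keep the `known` set across iterations, and perhaps persist it to a file, so that fuzzing continues past failures that already have an issue filed.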

3 participants