Commit 4089b3a

Add an architectural overview of bindgen to CONTRIBUTING.md
This should help new contributors who are coming to the code base for the first time get up and running.
1 parent 37af44d commit 4089b3a

1 file changed: +66 -0 lines changed

CONTRIBUTING.md

Lines changed: 66 additions & 0 deletions
@@ -19,6 +19,7 @@ out to us in a GitHub issue, or stop by
- [Testing a Single Header's Bindings Generation and Compiling its Bindings](#testing-a-single-headers-bindings-generation-and-compiling-its-bindings)
- [Authoring New Tests](#authoring-new-tests)
- [Test Expectations and `libclang` Versions](#test-expectations-and-libclang-versions)
- [Code Overview](#code-overview)
- [Pull Requests and Code Reviews](#pull-requests-and-code-reviews)
- [Generating Graphviz Dot Files](#generating-graphviz-dot-files)
- [Debug Logging](#debug-logging)
@@ -192,6 +193,71 @@ Where `$VERSION` is one of:

depending on which version of `libclang` you have installed.

## Code Overview

`bindgen` takes C and C++ header files as input and generates corresponding Rust
`#[repr(C)]` type definitions and `extern` foreign function declarations.
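
As a rough illustration of that input/output relationship, here is a minimal
sketch. The C header in the comments, the `Point` struct, and the `point_sum`
function are all hypothetical, and the exact text `bindgen` emits may differ
between versions. Because no C library is linked here, a Rust function with the
same C ABI signature stands in for the foreign definition.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_int;

// Hypothetical C input:
//
//     struct Point { int x; int y; };
//     int point_sum(const struct Point *p);

// Roughly the shape of a type definition generated for `struct Point`.
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point {
    pub x: c_int,
    pub y: c_int,
}

// The matching foreign function declaration would look roughly like:
//
//     extern "C" {
//         pub fn point_sum(p: *const Point) -> c_int;
//     }
//
// Stand-in definition with the same C ABI signature, so that this sketch
// compiles and runs without linking any C code.
pub unsafe extern "C" fn point_sum(p: *const Point) -> c_int {
    // The caller must pass a valid, non-null pointer.
    let point = unsafe { &*p };
    point.x + point.y
}

/// Returns `(size, alignment)` of `Point`; `#[repr(C)]` makes these match
/// what a C compiler would compute for `struct Point` on the same target.
pub fn point_layout() -> (usize, usize) {
    (size_of::<Point>(), align_of::<Point>())
}
```

Calling `point_sum(&p)` requires an `unsafe` block, just as calling a real
foreign function would.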

First, we use `libclang` to parse the input headers. See `src/clang.rs` for our
Rust-y wrappers over the raw C `libclang` API that the `clang-sys` crate
exposes. We walk over `libclang`'s AST and construct our own internal
representation (IR). The `ir` module and submodules (`src/ir/*`) contain the IR
type definitions and the code that parses the `libclang` AST into IR.

The umbrella IR type is the `Item`. It contains various nested `enum`s that let
us drill down and get more specific about the kind of construct that we're
looking at. Here is a summary of the IR types and their relationships:

* `Item` contains:
    * An `ItemId` to uniquely identify it.
    * An `ItemKind`, which is one of:
        * A `Module`, which is originally a C++ namespace and becomes a Rust
          module. It contains the set of `ItemId`s of `Item`s that are defined
          within it.
        * A `Type`, which contains:
            * A `Layout`, describing the type's size and alignment.
            * A `TypeKind`, which is one of:
                * Some integer type.
                * Some float type.
                * A `Pointer` to another type.
                * A function pointer type, with `ItemId`s of its parameter types
                  and return type.
                * An `Alias` to another type (`typedef` or `using X = ...`).
                * A fixed size `Array` of `n` elements of another type.
                * A `Comp` compound type, which is either a `struct`, `class`,
                  or `union`. This is potentially a template definition.
                * A `TemplateInstantiation` referencing some template definition
                  and a set of template argument types.
                * Etc...
        * A `Function`, which contains:
            * An ABI
            * A mangled name
            * A `FunctionKind`, which describes whether this function is a plain
              function, method, static method, constructor, destructor, etc.
            * The `ItemId` of its function pointer type.
        * A `Var` representing a static variable or `#define` constant, which
          contains:
            * Its type's `ItemId`
            * Optionally, a mangled name
            * Optionally, a value
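
The relationships above can be sketched as plain Rust types. This is a
simplified illustration, not the actual definitions in `src/ir/*`: the field
names, the `Ir` container, and the `add`, `get`, and `pointee` helpers are
assumptions made for the example, and several pieces (the ABI, `FunctionKind`,
template instantiations) are left out for brevity.

```rust
use std::collections::BTreeSet;

/// Uniquely identifies an `Item`; in this sketch, an index into `Ir::items`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub usize);

/// A type's size and alignment, in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Layout {
    pub size: usize,
    pub align: usize,
}

/// The specific kind of a type. Other types are referenced by `ItemId`.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Int,
    Float,
    Pointer(ItemId),
    Function { params: Vec<ItemId>, ret: ItemId },
    Alias(ItemId),
    Array(ItemId, usize),
    Comp { fields: Vec<ItemId> },
}

pub struct Type {
    pub layout: Layout,
    pub kind: TypeKind,
}

pub enum ItemKind {
    Module { children: BTreeSet<ItemId> },
    Type(Type),
    Function { mangled_name: String, signature: ItemId },
    Var { ty: ItemId, mangled_name: Option<String>, value: Option<i64> },
}

pub struct Item {
    pub id: ItemId,
    pub kind: ItemKind,
}

/// Hypothetical container that owns every item and hands out ids.
#[derive(Default)]
pub struct Ir {
    items: Vec<Item>,
}

impl Ir {
    /// Stores a new item and returns the id assigned to it.
    pub fn add(&mut self, kind: ItemKind) -> ItemId {
        let id = ItemId(self.items.len());
        self.items.push(Item { id, kind });
        id
    }

    pub fn get(&self, id: ItemId) -> &Item {
        &self.items[id.0]
    }

    /// Follows one `Pointer` edge, if `id` names a pointer type.
    pub fn pointee(&self, id: ItemId) -> Option<ItemId> {
        match &self.get(id).kind {
            ItemKind::Type(Type { kind: TypeKind::Pointer(to), .. }) => Some(*to),
            _ => None,
        }
    }
}
```

Referring to other items by `ItemId` rather than by direct reference is what
lets the IR describe cyclic structures, such as a `struct` that contains a
pointer to itself.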

The IR forms a graph of interconnected and inter-referencing types and
functions. The `ir::traversal` module provides IR graph traversal
infrastructure: edge kind definitions (base member vs field type vs function
parameter, etc...), the `Trace` trait to enumerate an IR thing's outgoing edges,
and various traversal types.
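
To make the idea of tracing outgoing edges concrete, here is a small
self-contained sketch. It is not the real `ir::traversal` API: the `Node` type,
the closure-based `trace` signature, and the `reachable` function are
simplifications assumed for illustration, and only a handful of edge kinds are
modeled.

```rust
use std::collections::{BTreeSet, VecDeque};

/// In this sketch an item id is an index into a slice of nodes.
pub type ItemId = usize;

/// Why one item refers to another.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    BaseMember,
    Field,
    FunctionParameter,
    FunctionReturn,
}

/// A toy IR node.
pub enum Node {
    Int,
    Struct { bases: Vec<ItemId>, fields: Vec<ItemId> },
    Function { params: Vec<ItemId>, ret: ItemId },
}

/// Enumerates a node's outgoing edges by calling `visit` once per edge.
pub trait Trace {
    fn trace(&self, visit: &mut dyn FnMut(ItemId, EdgeKind));
}

impl Trace for Node {
    fn trace(&self, visit: &mut dyn FnMut(ItemId, EdgeKind)) {
        match self {
            Node::Int => {}
            Node::Struct { bases, fields } => {
                for &base in bases {
                    visit(base, EdgeKind::BaseMember);
                }
                for &field in fields {
                    visit(field, EdgeKind::Field);
                }
            }
            Node::Function { params, ret } => {
                for &param in params {
                    visit(param, EdgeKind::FunctionParameter);
                }
                visit(*ret, EdgeKind::FunctionReturn);
            }
        }
    }
}

/// Breadth-first traversal: every item reachable from `roots`, roots included.
pub fn reachable(graph: &[Node], roots: &[ItemId]) -> BTreeSet<ItemId> {
    let mut seen: BTreeSet<ItemId> = roots.iter().copied().collect();
    let mut queue: VecDeque<ItemId> = roots.iter().copied().collect();
    while let Some(id) = queue.pop_front() {
        graph[id].trace(&mut |to: ItemId, _kind: EdgeKind| {
            // Only enqueue items that have not been seen before.
            if seen.insert(to) {
                queue.push_back(to);
            }
        });
    }
    seen
}
```

A traversal that cares about edge kinds (for example, one that follows fields
but not function parameters) could filter on the `EdgeKind` argument instead of
ignoring it.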

After constructing the IR, we run a series of analyses on it. These analyses do
everything from allocating logical bitfields into physical units, to computing
which types we can `#[derive(Debug)]` for, to determining which implicit
template parameters a given type uses. The analyses are defined in
`src/ir/analysis/*`. They are implemented as fixed-point algorithms, using the
`ir::analysis::MonotoneFramework` trait.
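
The fixed-point approach can be sketched with a worklist loop. The trait below
is a simplified stand-in, not the real `MonotoneFramework` definition: the name
`MonotoneAnalysis`, its method set, and the toy `CannotDeriveDebug` analysis are
all assumptions made for this example. The toy analysis marks a type as unable
to derive `Debug` when any type it contains by value is already marked.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

pub type ItemId = usize;

/// Whether constraining a node changed the analysis state.
pub enum ConstrainResult {
    Changed,
    Same,
}

/// Simplified, hypothetical shape of a monotone fixed-point analysis.
pub trait MonotoneAnalysis {
    type Output;
    fn initial_worklist(&self) -> Vec<ItemId>;
    fn constrain(&mut self, node: ItemId) -> ConstrainResult;
    /// Nodes whose result may change when `node`'s result changes.
    fn dependents(&self, node: ItemId) -> Vec<ItemId>;
    fn finish(self) -> Self::Output;
}

/// Re-constrains nodes until nothing changes, i.e. a fixed point is reached.
pub fn analyze<A: MonotoneAnalysis>(mut analysis: A) -> A::Output {
    let mut worklist: VecDeque<ItemId> = analysis.initial_worklist().into();
    while let Some(node) = worklist.pop_front() {
        if let ConstrainResult::Changed = analysis.constrain(node) {
            worklist.extend(analysis.dependents(node));
        }
    }
    analysis.finish()
}

pub struct CannotDeriveDebug {
    /// `uses[t]` lists the types that `t` contains by value.
    uses: BTreeMap<ItemId, Vec<ItemId>>,
    /// Types known so far to be unable to derive `Debug`. Only ever grows.
    cannot: BTreeSet<ItemId>,
}

impl CannotDeriveDebug {
    /// `seeds` are types known up front to be unable to derive `Debug`.
    pub fn new(uses: BTreeMap<ItemId, Vec<ItemId>>, seeds: &[ItemId]) -> Self {
        CannotDeriveDebug { uses, cannot: seeds.iter().copied().collect() }
    }
}

impl MonotoneAnalysis for CannotDeriveDebug {
    type Output = BTreeSet<ItemId>;

    fn initial_worklist(&self) -> Vec<ItemId> {
        self.uses.keys().copied().collect()
    }

    fn constrain(&mut self, node: ItemId) -> ConstrainResult {
        if self.cannot.contains(&node) {
            return ConstrainResult::Same;
        }
        let blocked = self
            .uses
            .get(&node)
            .map_or(false, |fields| fields.iter().any(|f| self.cannot.contains(f)));
        if blocked {
            self.cannot.insert(node);
            ConstrainResult::Changed
        } else {
            ConstrainResult::Same
        }
    }

    fn dependents(&self, node: ItemId) -> Vec<ItemId> {
        self.uses
            .iter()
            .filter(|(_, fields)| fields.contains(&node))
            .map(|(&user, _)| user)
            .collect()
    }

    fn finish(self) -> BTreeSet<ItemId> {
        self.cannot
    }
}
```

The loop terminates because the `cannot` set only grows and is bounded by the
number of types, so `constrain` can report `Changed` at most once per type.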

The final phase is generating Rust source text from the analyzed IR, and it is
defined in `src/codegen/*`. We use the `quote` crate, which provides the
`quote! { ... }` macro for quasi-quoting Rust forms.
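
As a rough picture of what this phase produces, the sketch below turns a toy
struct description into Rust source text. To stay free of external crates it
uses plain string formatting rather than `quote!`, so it shows only the
direction of the transformation, not how `src/codegen/*` is written. The
`Field` and `Struct` types and the `codegen_struct` function are hypothetical.

```rust
/// Toy stand-in for an analyzed IR field: a name and an already-resolved
/// Rust type name.
pub struct Field {
    pub name: String,
    pub ty: String,
}

/// Toy stand-in for an analyzed compound type.
pub struct Struct {
    pub name: String,
    pub fields: Vec<Field>,
    /// Result of a (hypothetical) earlier analysis pass.
    pub derive_debug: bool,
}

/// Emits a `#[repr(C)]` struct definition as Rust source text.
pub fn codegen_struct(s: &Struct) -> String {
    let mut out = String::new();
    out.push_str("#[repr(C)]\n");
    if s.derive_debug {
        out.push_str("#[derive(Debug)]\n");
    }
    out.push_str(&format!("pub struct {} {{\n", s.name));
    for field in &s.fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.ty));
    }
    out.push_str("}\n");
    out
}
```

Quasi-quoting expresses the same kind of template as Rust tokens with
interpolated pieces instead of format strings.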

## Pull Requests and Code Reviews

Ensure that each commit stands alone, and passes tests. This enables better `git
