Parse macro expressions. #219

Merged (3 commits) on Nov 9, 2016
1 change: 1 addition & 0 deletions Cargo.toml
@@ -31,6 +31,7 @@ env_logger = "0.3"
rustc-serialize = "0.3.19"
syntex_syntax = "0.44"
regex = "0.1"
cexpr = "0.2"

[dependencies.aster]
features = ["with-syntex"]
14 changes: 14 additions & 0 deletions src/chooser.rs
@@ -0,0 +1,14 @@
//! A public API for more fine-grained customization of bindgen behavior.

pub use ir::int::IntKind;
use std::fmt;

/// A trait to allow configuring different kinds of types in different
/// situations.
pub trait TypeChooser: fmt::Debug {
/// The integer kind an integer macro should have, given a name and the
/// value of that macro, or `None` if you want the default to be chosen.
fn int_macro(&self, _name: &str, _value: i64) -> Option<IntKind> {
None
}
}
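
As a rough illustration of how a downstream user might implement the new trait, here is a minimal sketch. The `IntKind` and `TypeChooser` definitions are trimmed local copies of the ones in this PR so the snippet compiles on its own; `FlagChooser` and the `FLAG_` naming convention are hypothetical and not part of bindgen.

```rust
use std::fmt;

// Trimmed local stand-ins for the types added in this PR. In real use
// they would be imported from bindgen instead of redefined here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IntKind {
    U8,
    I32,
}

pub trait TypeChooser: fmt::Debug {
    /// Returning `None` asks for the default integer kind.
    fn int_macro(&self, _name: &str, _value: i64) -> Option<IntKind> {
        None
    }
}

/// Hypothetical chooser: macros named `FLAG_*` whose value fits in a
/// byte become `u8`; everything else keeps the default.
#[derive(Debug)]
pub struct FlagChooser;

impl TypeChooser for FlagChooser {
    fn int_macro(&self, name: &str, value: i64) -> Option<IntKind> {
        if name.starts_with("FLAG_") && (0..=255).contains(&value) {
            Some(IntKind::U8)
        } else {
            None
        }
    }
}
```

Because the trait method has a default body, an implementor only needs to override the cases it cares about.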
69 changes: 63 additions & 6 deletions src/clang.rs
@@ -4,8 +4,9 @@
#![allow(non_upper_case_globals, dead_code)]


use cexpr;
use clangll::*;
use std::{mem, ptr};
use std::{mem, ptr, slice};
use std::ffi::{CStr, CString};
use std::fmt;
use std::hash::Hash;
@@ -1051,18 +1052,18 @@ impl TranslationUnit {
let range = cursor.extent();
let mut tokens = vec![];
unsafe {
let mut token_ptr = ::std::ptr::null_mut();
let mut token_ptr = ptr::null_mut();
let mut num_tokens: c_uint = 0;
clang_tokenize(self.x, range, &mut token_ptr, &mut num_tokens);
if token_ptr.is_null() {
return None;
}
let token_array = ::std::slice::from_raw_parts(token_ptr,
num_tokens as usize);

let token_array = slice::from_raw_parts(token_ptr,
num_tokens as usize);
for &token in token_array.iter() {
let kind = clang_getTokenKind(token);
let spelling: String = clang_getTokenSpelling(self.x, token)
.into();
let spelling = clang_getTokenSpelling(self.x, token).into();

tokens.push(Token {
kind: kind,
@@ -1073,6 +1074,62 @@ impl TranslationUnit {
}
Some(tokens)
}

/// Convert a set of tokens from clang into `cexpr` tokens, for further
/// processing.
pub fn cexpr_tokens(&self,
cursor: &Cursor)
-> Option<Vec<cexpr::token::Token>> {
use cexpr::token;

let mut tokens = match self.tokens(cursor) {
Some(tokens) => tokens,
None => return None,
};

// FIXME(emilio): LLVM 3.9 at least always includes an extra token for no
// good reason (except if we're at EOF). So we do this kind of hack,
// where we skip trailing punctuation and trailing keywords that are
// known to cause problems.
@jethrogb (Contributor), Nov 6, 2016:

Not just 3.9, I've observed this in 3.5 through 3.8. The problem is that the cursor extent is always off by one (even on EOF). You don't get an extra token on EOF, because, well, you're at EOF.

Contributor Author:

That's really unfortunate :(

Contributor Author:

After inspecting clang's source a bit, the source range for macro expansions comes from deep in the lexer. That's more than I want to chew on right now inside LLVM internals, so I'll open a bug for it.

Contributor Author:

Wasn't that hard actually.

LLVM bug (open since 2011): https://llvm.org/bugs/show_bug.cgi?id=9069
Patch: https://reviews.llvm.org/D26446

Contributor Author:

Fixed in trunk :)

llvm-mirror/clang@3b61b92

//
// This is sort of unfortunate, though :(.
//
// I'll try to get it fixed in LLVM if I have the time to submit a
// patch.
let mut trim_last_token = false;
if let Some(token) = tokens.last() {
// The starting of the next macro.
trim_last_token |= token.spelling == "#" &&
token.kind == CXToken_Punctuation;

// A following keyword of any kind, like a following declaration.
trim_last_token |= token.kind == CXToken_Keyword;
}

if trim_last_token {
tokens.pop().unwrap();
}

Some(tokens.into_iter()
.filter_map(|token| {
let kind = match token.kind {
CXToken_Punctuation => token::Kind::Punctuation,
CXToken_Literal => token::Kind::Literal,
CXToken_Identifier => token::Kind::Identifier,
CXToken_Keyword => token::Kind::Keyword,
// NB: cexpr is not too happy about comments inside
// expressions, so we strip them down here.
CXToken_Comment => return None,
_ => panic!("Found unexpected token kind: {}", token.kind),
};

Some(token::Token {
kind: kind,
raw: token.spelling.into_bytes().into_boxed_slice(),
})
})
.collect::<Vec<_>>())
}
}

impl Drop for TranslationUnit {
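
The trailing-token workaround and comment stripping in `cexpr_tokens` can be looked at in isolation. The sketch below uses simplified local `Kind` and `Token` types in place of the clang and `cexpr` ones, and the names `tok` and `clean_tokens` are made up for illustration; it only mirrors the logic in the hunk above, assuming the extent really is off by one token.

```rust
/// Simplified token kinds; a stand-in for clang's `CXToken_*` constants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Punctuation,
    Keyword,
    Identifier,
    Literal,
    Comment,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Token {
    kind: Kind,
    spelling: String,
}

/// Small constructor to keep the examples short.
fn tok(kind: Kind, spelling: &str) -> Token {
    Token { kind, spelling: spelling.to_string() }
}

/// Drop a spurious trailing `#` or keyword (the start of whatever
/// follows the macro), then strip comments from the remaining tokens.
fn clean_tokens(mut tokens: Vec<Token>) -> Vec<Token> {
    let trim_last = tokens.last().map_or(false, |t| {
        (t.kind == Kind::Punctuation && t.spelling == "#") || t.kind == Kind::Keyword
    });
    if trim_last {
        tokens.pop();
    }
    tokens.into_iter().filter(|t| t.kind != Kind::Comment).collect()
}
```

Note that this heuristic would also drop a legitimate trailing keyword, which is part of why the comment in the patch calls it a hack.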
11 changes: 11 additions & 0 deletions src/codegen/mod.rs
@@ -1453,8 +1453,19 @@ impl ToRustTy for Type {
IntKind::ULong => raw!(c_ulong),
IntKind::LongLong => raw!(c_longlong),
IntKind::ULongLong => raw!(c_ulonglong),

IntKind::I8 => aster::ty::TyBuilder::new().i8(),
IntKind::U8 => aster::ty::TyBuilder::new().u8(),
IntKind::I16 => aster::ty::TyBuilder::new().i16(),
IntKind::U16 => aster::ty::TyBuilder::new().u16(),
IntKind::I32 => aster::ty::TyBuilder::new().i32(),
IntKind::U32 => aster::ty::TyBuilder::new().u32(),
IntKind::I64 => aster::ty::TyBuilder::new().i64(),
IntKind::U64 => aster::ty::TyBuilder::new().u64(),
IntKind::Custom { name, .. } => {
let ident = ctx.rust_ident_raw(name);
quote_ty!(ctx.ext_cx(), $ident)
}
// FIXME: This doesn't generate the proper alignment, but we
// can't do better right now. We should be able to use
// i128/u128 when they're available.
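
To make the mapping in this hunk easier to follow without the `aster` builders, here is a reduced sketch that returns the Rust type name as a string. The trimmed `IntKind` and the function `rust_type_name` are hypothetical stand-ins; the actual codegen builds AST type nodes, not strings.

```rust
// Trimmed copy of the new `IntKind` variants, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntKind {
    I8,
    U8,
    I16,
    U16,
    I32,
    U32,
    I64,
    U64,
    Custom { name: &'static str, is_signed: bool },
}

/// Map each fixed-width kind to the matching Rust primitive name.
/// A `Custom` kind uses its name verbatim, as the codegen arm does.
fn rust_type_name(kind: IntKind) -> &'static str {
    match kind {
        IntKind::I8 => "i8",
        IntKind::U8 => "u8",
        IntKind::I16 => "i16",
        IntKind::U16 => "u16",
        IntKind::I32 => "i32",
        IntKind::U32 => "u32",
        IntKind::I64 => "i64",
        IntKind::U64 => "u64",
        IntKind::Custom { name, .. } => name,
    }
}
```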
29 changes: 19 additions & 10 deletions src/ir/context.rs
@@ -1,10 +1,11 @@
//! Common context that is passed around during parsing and codegen.

use BindgenOptions;
use cexpr;
use clang::{self, Cursor};
use parse::ClangItemParser;
use std::borrow::{Borrow, Cow};
use std::collections::{HashMap, HashSet, hash_map};
use std::borrow::Cow;
use std::collections::{HashMap, hash_map};
use std::collections::btree_map::{self, BTreeMap};
use std::fmt;
use super::int::IntKind;
@@ -77,8 +78,9 @@ pub struct BindgenContext<'ctx> {
pub currently_parsed_types: Vec<(Cursor, ItemId)>,

/// A HashSet with all the already parsed macro names. This is done to avoid
/// hard errors while parsing duplicated macros.
parsed_macros: HashSet<String>,
/// hard errors while parsing duplicated macros, as well as to allow macro
/// expression parsing.
parsed_macros: HashMap<Vec<u8>, cexpr::expr::EvalResult>,
Member:

Why did this become Vec<u8> instead of String?

Contributor Author:

It's what cexpr wants.


/// The active replacements collected from replaces="xxx" annotations.
replacements: HashMap<String, ItemId>,
@@ -243,7 +245,7 @@ impl<'ctx> BindgenContext<'ctx> {

/// Returns a mangled name as a rust identifier.
pub fn rust_ident_raw(&self, name: &str) -> Ident {
self.ext_cx().ident_of(name.borrow())
self.ext_cx().ident_of(name)
}

/// Iterate over all items that have been defined.
@@ -715,14 +717,21 @@ impl<'ctx> BindgenContext<'ctx> {
}

/// Have we parsed the macro named `macro_name` already?
pub fn parsed_macro(&self, macro_name: &str) -> bool {
self.parsed_macros.contains(macro_name)
pub fn parsed_macro(&self, macro_name: &[u8]) -> bool {
self.parsed_macros.contains_key(macro_name)
}

/// Get the currently parsed macros.
pub fn parsed_macros(&self) -> &HashMap<Vec<u8>, cexpr::expr::EvalResult> {
debug_assert!(!self.in_codegen_phase());
&self.parsed_macros
}

/// Mark the macro named `macro_name` as parsed.
pub fn note_parsed_macro(&mut self, macro_name: String) {
debug_assert!(!self.parsed_macros.contains(&macro_name));
self.parsed_macros.insert(macro_name);
pub fn note_parsed_macro(&mut self,
id: Vec<u8>,
value: cexpr::expr::EvalResult) {
self.parsed_macros.insert(id, value);
}

/// Are we in the codegen phase?
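
Switching `parsed_macros` from a set of names to a map from name to evaluated value is what lets a later macro refer to an earlier one. The sketch below shows the idea with a byte-keyed `HashMap`; `Value` is a hypothetical stand-in for `cexpr::expr::EvalResult`, and `MacroTable` and `resolve` are invented for illustration and do not use the `cexpr` API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `cexpr::expr::EvalResult`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(Vec<u8>),
}

/// Holds already-evaluated macros, keyed by their raw byte name.
struct MacroTable {
    parsed: HashMap<Vec<u8>, Value>,
}

impl MacroTable {
    fn new() -> Self {
        MacroTable { parsed: HashMap::new() }
    }

    /// Lookup by `&[u8]` works because `Vec<u8>: Borrow<[u8]>`.
    fn parsed_macro(&self, name: &[u8]) -> bool {
        self.parsed.contains_key(name)
    }

    fn note_parsed_macro(&mut self, id: Vec<u8>, value: Value) {
        self.parsed.insert(id, value);
    }

    /// Resolve a token that is either the name of a previously parsed
    /// integer macro or a decimal integer literal.
    fn resolve(&self, token: &[u8]) -> Option<i64> {
        if let Some(Value::Int(v)) = self.parsed.get(token) {
            return Some(*v);
        }
        std::str::from_utf8(token).ok()?.parse::<i64>().ok()
    }
}
```

Keying on bytes avoids a UTF-8 conversion on every lookup when the consumer (here, `cexpr`) already works with raw token bytes.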
41 changes: 35 additions & 6 deletions src/ir/int.rs
@@ -36,29 +36,58 @@ pub enum IntKind {
/// An `unsigned long long`.
ULongLong,

/// An 8-bit signed integer.
I8,

/// An 8-bit unsigned integer.
U8,

/// A 16-bit signed integer.
I16,

/// Either a `char16_t` or a `wchar_t`.
U16,

/// A `char32_t`.
/// A 32-bit signed integer.
I32,

/// A 32-bit unsigned integer.
U32,

/// A 64-bit signed integer.
I64,

/// A 64-bit unsigned integer.
U64,

/// An `int128_t`
I128,

/// A `uint128_t`.
U128, /* Though now we're at it we could add equivalents for the rust
* types... */
U128,

/// A custom integer type, used to allow custom macro types depending on
/// range.
Custom {
/// The name of the type, which would be used without modification.
name: &'static str,
/// Whether the type is signed or not.
is_signed: bool,
},
}

impl IntKind {
/// Is this integral type signed?
pub fn is_signed(&self) -> bool {
use self::IntKind::*;
match *self {
Bool | UChar | UShort | UInt | ULong | ULongLong | U16 | U32 |
U128 => false,
Bool | UChar | UShort | UInt | ULong | ULongLong | U8 | U16 |
U32 | U64 | U128 => false,

Char | Short | Int | Long | LongLong | I8 | I16 | I32 | I64 |
I128 => true,

Char | Short | Int | Long | LongLong | I128 => true,
Custom { is_signed, .. } => is_signed,
}
}
}
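
One way the new fixed-width kinds could be used is to pick a type from a macro's value range. The following is a sketch under that assumption: the trimmed `IntKind` copies only a few variants from this PR, and `smallest_kind` is a hypothetical helper, not something bindgen provides.

```rust
// Trimmed copy of `IntKind` with a few of the new variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntKind {
    I8,
    U8,
    I32,
    U32,
    I64,
    U64,
    Custom { name: &'static str, is_signed: bool },
}

impl IntKind {
    /// Same shape as the method in the diff: fixed kinds are known
    /// statically, `Custom` reports whatever it was constructed with.
    fn is_signed(&self) -> bool {
        match *self {
            IntKind::U8 | IntKind::U32 | IntKind::U64 => false,
            IntKind::I8 | IntKind::I32 | IntKind::I64 => true,
            IntKind::Custom { is_signed, .. } => is_signed,
        }
    }
}

/// Hypothetical helper: choose a kind from the value's range, using a
/// signed kind only when the value is negative.
fn smallest_kind(value: i64) -> IntKind {
    if value < 0 {
        if value >= i64::from(i8::MIN) {
            IntKind::I8
        } else if value >= i64::from(i32::MIN) {
            IntKind::I32
        } else {
            IntKind::I64
        }
    } else if value <= i64::from(u8::MAX) {
        IntKind::U8
    } else if value <= i64::from(u32::MAX) {
        IntKind::U32
    } else {
        IntKind::U64
    }
}
```

A `TypeChooser::int_macro` implementation could return `Some(smallest_kind(value))` to apply this policy to every integer macro.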