clang: Tokenize more lazily. #1466

emilio · 2018-12-14T02:34:27Z

Instead of converting all the tokens to UTF-8 beforehand, which is costly, and
unconditionally allocating a new vector (on top of the one clang already
allocates), just do the tokenization more lazily.
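
To make the difference concrete, here is a minimal Rust sketch of eager versus lazy tokenization. It assumes a hypothetical TokenBuffer type that holds the raw source bytes plus one byte range per token, standing in for the buffer clang owns. It is not bindgen's real code or libclang's API. The eager path allocates a fresh Vec<String> and converts every token. The lazy path hands out borrowed &[u8] slices only when asked.

```rust
/// Hypothetical stand-in for the token buffer clang already owns:
/// the raw source bytes plus the byte range of each token.
/// (Not bindgen's real types.)
struct TokenBuffer {
    source: Vec<u8>,
    ranges: Vec<(usize, usize)>,
}

impl TokenBuffer {
    /// Eager approach: allocate a new vector and convert every token
    /// to an owned UTF-8 String, whether or not the caller needs it.
    fn tokens_eager(&self) -> Vec<String> {
        self.ranges
            .iter()
            .map(|&(start, end)| String::from_utf8_lossy(&self.source[start..end]).into_owned())
            .collect()
    }

    /// Lazy approach: yield borrowed byte slices one at a time.
    /// No extra vector and no UTF-8 conversion happen here.
    fn tokens_lazy(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.ranges
            .iter()
            .map(move |&(start, end)| &self.source[start..end])
    }
}

fn main() {
    // "#define FOO 42" split into four tokens by byte range.
    let buf = TokenBuffer {
        source: b"#define FOO 42".to_vec(),
        ranges: vec![(0, 1), (1, 7), (8, 11), (12, 14)],
    };

    // Both approaches see the same tokens...
    let eager = buf.tokens_eager();
    let lazy: Vec<&[u8]> = buf.tokens_lazy().collect();
    assert_eq!(eager.len(), lazy.len());
    for (owned, borrowed) in eager.iter().zip(&lazy) {
        assert_eq!(owned.as_bytes(), *borrowed);
    }

    // ...but the lazy one can stop early without touching the rest.
    let has_define = buf.tokens_lazy().any(|tok| tok == b"define");
    assert!(has_define);
}
```

Because any() short-circuits, the lazy iterator stops at the first match. By then, the eager version has already paid for every conversion and allocation.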

There's actually only one place in the codebase that needs the UTF-8 string;
all the others can just work with the byte slice from clang.
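
The split described above might look like this in a minimal Rust sketch. Byte-level checks compare the raw token directly. Only the single consumer that needs text validates it as UTF-8 with std::str::from_utf8, which borrows instead of allocating. token_is and token_as_str are hypothetical helpers, not functions from bindgen.

```rust
use std::str;

/// Most consumers only need to compare a token against a known
/// spelling, which works directly on the bytes clang returned.
fn token_is(token: &[u8], expected: &[u8]) -> bool {
    token == expected
}

/// Hypothetical stand-in for the one place that needs text: validate
/// the bytes as UTF-8 only here, borrowing rather than allocating.
fn token_as_str(token: &[u8]) -> Option<&str> {
    str::from_utf8(token).ok()
}

fn main() {
    let tokens: [&[u8]; 3] = [b"(", b"FOO", b")"];

    // Byte-level checks: no conversion needed.
    assert!(token_is(tokens[0], b"(") && token_is(tokens[2], b")"));

    // Only the identifier is turned into a &str.
    let name = token_as_str(tokens[1]).expect("identifier should be valid UTF-8");
    println!("{name}");
}
```

Returning Option<&str> leaves the handling of invalid UTF-8 to that one caller, instead of paying for a conversion on every token.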

This should have no behavior change, other than being faster. In particular, on
my machine this halves the time spent on the test case from #1465.

I'm not completely sure that this is going to be enough to make it acceptable,
but we should probably do it regardless.

highfive · 2018-12-14T02:34:31Z

Warning

These commits modify unsafe code. Please review it carefully!


bindgen needs .enable_function_attribute_detection() to process __attribute__((__warn_unused_result__)), because parsing attributes can be really slow in certain cases. Benchmarks were run to confirm our case doesn't hit that issue. References: rust-lang/rust-bindgen#2149 rust-lang/rust-bindgen#1465 rust-lang/rust-bindgen#1466 rust-lang/rust-bindgen#1467

highfive added the S-awaiting-review label Dec 14, 2018

emilio mentioned this pull request Dec 14, 2018

Performance regression in 0.44.0 for headers that (transitively) include AVX512F intrinsics #1465

Closed

emilio force-pushed the token-lazy branch from c28d9f7 to 7109c48 December 14, 2018 09:57

emilio merged commit eb97c14 into rust-lang:master Dec 14, 2018

emilio deleted the token-lazy branch December 14, 2018 10:59
