Turn off regex default features. #1643
Should the unicode features (or any others) stay enabled by default?
Thank you! This looks reasonable to me... I want to see if CI says something interesting, but right now it's busted because there's no rustfmt in rust nightly (https://rust-lang.github.io/rustup-components-history/).
Perhaps, yeah...
@emilio rustfmt should be present again. Could the tests be re-run?
Updated from 0fc34e8 to c7ffca0.
This doesn't seem to build:
But this looks pretty ok to me otherwise.
Updated from c7ffca0 to 6a11867.
Odd, it built fine here, but I added the `std` feature anyway. Maybe I was doing something wrong.
Thanks!
#1643 disabled many default features of the `regex` crate but left the `unicode` meta-feature enabled. With the `unicode` feature enabled and `bindgen` as a build dependency, `regex-syntax` (a direct dependency of the `regex` crate) takes 7 seconds to compile as a build dependency in my application. The `unicode` feature includes support for many Unicode character class lookups that bindgen seems unlikely to use.

From https://docs.rs/regex/latest/regex/#unicode-features:

> - unicode-age - Provide the data for the Unicode Age property. This
> makes it possible to use classes like `\p{Age:6.0}` to refer to all
> codepoints first introduced in Unicode 6.0
> - unicode-bool - Provide the data for numerous Unicode boolean
> properties. The full list is not included here, but contains
> properties like `Alphabetic`, `Emoji`, `Lowercase`, `Math`,
> `Uppercase` and `White_Space`.
> - unicode-case - Provide the data for case insensitive matching using
> Unicode's "simple loose matches" specification.
> - unicode-gencat - Provide the data for Unicode general categories.
> This includes, but is not limited to, `Decimal_Number`, `Letter`,
> `Math_Symbol`, `Number` and `Punctuation`.
> - unicode-script - Provide the data for Unicode scripts and script
> extensions. This includes, but is not limited to, `Arabic`, `Cyrillic`,
> `Hebrew`, `Latin` and `Thai`.
> - unicode-segment - Provide the data necessary to provide the
> properties used to implement the Unicode text segmentation
> algorithms. This enables using classes like `\p{gcb=Extend}`,
> `\p{wb=Katakana}` and `\p{sb=ATerm}`.

I have retained the `unicode-perl` feature, which provides `\w`, `\s` and `\d`, because these character classes were required to get the tests to pass. Dropping support for the character classes listed above removes the need to compile many data tables, which should significantly reduce compile times.
Fulfills #1622
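A minimal sketch of what the resulting `regex` entry in bindgen's `Cargo.toml` might look like. The version requirement is an assumption for illustration, not taken from this PR. `std` stays enabled because `regex` 1.x refuses to compile without it, which is likely why the earlier build failed before it was added.

```toml
[dependencies]
# default-features = false drops the `unicode` meta-feature (and the default
# `perf` features); only `std` and `unicode-perl` (`\w`, `\s`, `\d`) are re-enabled.
# The version requirement below is illustrative.
regex = { version = "1", default-features = false, features = ["std", "unicode-perl"] }
```

Because Cargo unifies features across the dependency graph, another crate in the same build that enables `regex`'s defaults will bring the full Unicode tables back. `cargo tree -e features -i regex` shows which features actually end up enabled.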