Nul terminate rust string literals #138504
@bors try @rust-timer queue |
…r=<try> Nul terminate rust string literals This allows taking advantage of the C string merging functionality of linkers, reducing code size. Marked as draft to see if this actually has much of an effect. The disadvantage of this is that people may start to rely on string literals getting nul terminated. A potential solution for that would be to put a byte that is not part of a valid UTF-8 character right before the nul terminator. Builds on rust-lang#138503
☀️ Try build successful - checks-actions |
Finished benchmarking commit (f949c9f): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)
Results (primary 1.6%, secondary 2.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (primary -2.3%, secondary -3.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (primary -0.5%, secondary -0.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 775.364s -> 772.882s (-0.32%) |
7c1cd9b to 827d66e
827d66e to d42a662
☔ The latest upstream changes (presumably #138503) made this pull request unmergeable. Please resolve the merge conflicts. |
d42a662 to dce272b
I don't like this. Why should Rust be the one to pretend to be C in order to benefit from an optimization that is C-only for no good reason, instead of making that optimization general enough to apply to Rust strings? I got here by searching why Rust currently cannot merge |
Because this optimization needs to be done by the linker to be effective, and there is no reasonable way for us to modify every linker a user may want to use to support string merging for non-C strings. For Windows and macOS I think it is unrealistic to expect this will be added within the next 10 years. The Windows linker doesn't even support weak symbols despite those being a relatively common C feature (it does seem to support weak aliases though). And Apple is betting on Swift rather than Rust. And on Linux we still support systems from before Rust 1.0 was released, with their correspondingly ancient linkers. |
I agree that it's better done in linkers, and that they are likely to only support C strings. But I think it's too early to give up before any attempt has been made to improve e.g. lld. At least, I couldn't find any discussion in the LLVM issues. Note that lld is already shipped with the Rust toolchain and is used by default on some platforms (#124129). Adding NUL to strings that do not need C interop (unlike #135054) is something to be conservative about until it is proven necessary. This PR seems to add NUL to Rust type names, which are not for C interop and are also less amenable to suffix merging. When will you have both I also tried to get some numbers on possible size reduction of
|
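For readers unfamiliar with the suffix merging mentioned above, here is a minimal sketch of why it depends on a terminator. A nul-terminated string is identified by its start address alone, so one stored copy of "hello world\0" can also serve as "world\0". The helper name `c_str_at` is hypothetical and only simulates what a consumer of a merged string section would observe; on ELF targets, linkers typically apply this kind of merging only to sections flagged SHF_MERGE | SHF_STRINGS.

```rust
use std::ffi::CStr;

/// Hypothetical helper: read the nul-terminated string that starts at
/// `offset` inside a shared storage buffer.
fn c_str_at(storage: &[u8], offset: usize) -> &str {
    CStr::from_bytes_until_nul(&storage[offset..])
        .expect("storage must contain a nul terminator")
        .to_str()
        .expect("storage must be valid UTF-8 up to the terminator")
}

fn main() {
    // One stored copy, one terminator.
    let storage: &[u8] = b"hello world\0";

    // Two logically distinct strings share the same bytes.
    assert_eq!(c_str_at(storage, 0), "hello world");
    assert_eq!(c_str_at(storage, 6), "world");
}
```

A Rust `&str` carries its own length, so in principle any substring could be shared, not just a suffix; the restriction to suffixes comes from how existing linkers recognize mergeable strings.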
☔ The latest upstream changes (presumably #139766) made this pull request unmergeable. Please resolve the merge conflicts. |
I'd like to see a perf run of 0xFF, 0x0 termination (or something like it). If that still appears beneficial (even for some embedded programs), this would be great. While plain nul termination seems pretty beneficial on its own, I think it makes it way too easy to accidentally rely on, so I think nul termination alone shouldn't be done (we definitely don't want to guarantee that in the language :D). |
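A rough sketch of the 0xFF, 0x0 idea, assuming the guard byte is 0xFF as suggested in the comment above (the PR description only says "a byte that is not part of a valid UTF-8 character"). The functions `emit_with_guard` and `read_to_nul_as_str` are hypothetical and are not the compiler's actual code; they only illustrate why code that wrongly reads up to the nul would fail loudly instead of silently working.

```rust
use std::ffi::CStr;

/// Hypothetical layout: literal bytes, then a guard byte, then the nul.
/// 0xFF never occurs anywhere in valid UTF-8.
fn emit_with_guard(literal: &str) -> Vec<u8> {
    let mut bytes = literal.as_bytes().to_vec();
    bytes.push(0xFF); // guard byte (assumed value)
    bytes.push(0x00); // nul terminator used for linker string merging
    bytes
}

/// Simulates code that ignores the `&str` length and scans to the nul.
fn read_to_nul_as_str(stored: &[u8]) -> Option<&str> {
    CStr::from_bytes_until_nul(stored).ok()?.to_str().ok()
}

fn main() {
    let literal = "hello";
    let stored = emit_with_guard(literal);

    // The Rust-visible string covers only the first `literal.len()` bytes.
    assert_eq!(std::str::from_utf8(&stored[..literal.len()]), Ok("hello"));

    // Relying on the terminator picks up the guard byte and fails validation.
    assert_eq!(read_to_nul_as_str(&stored), None);
}
```

The cost of this variant is two extra bytes per unmerged string instead of one, which is presumably why a separate perf run was requested.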
This allows taking advantage of the C string merging functionality of linkers, reducing code size.
dce272b to 0676972
@bors try @rust-timer queue |
0676972 to 4749fc0
This prevents people depending on nul-termination.
4749fc0 to 701f892
@bors try @rust-timer queue |
A job failed! Check out the build log: (web) (plain)
|
☀️ Try build successful - checks-actions |
Finished benchmarking commit (7cb6c31): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)
Results (primary -1.9%, secondary -1.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (secondary -1.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (primary -0.4%, secondary -0.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 777.423s -> 777.095s (-0.04%) |
It's interesting that for the only two benchmarks that are optimized for minimum binary size this was actually a small binary size regression. I wonder if it's caused by using |
Maybe string merging went first, before determining whether the strings were necessary? |
My guess would be that more size-conscious codebases have fewer mergeable strings in the first place, so the overhead of the terminator outweighs gains from merging. But I haven't verified this. |
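The trade-off guessed at above can be put into numbers with a small model. This is only an illustrative sketch under simplifying assumptions: one terminator byte per stored string, a string is dropped when it duplicates or is a suffix of another, and alignment and section overhead are ignored. The function names are hypothetical and the model is not a measurement of any real linker.

```rust
use std::collections::BTreeSet;

/// Bytes needed when every string is stored separately with no terminator.
fn unmerged_size(strings: &[&str]) -> usize {
    strings.iter().map(|s| s.len()).sum()
}

/// Bytes needed when strings are nul-terminated and a simplified tail merge
/// removes duplicates and strings that are suffixes of a longer string.
fn tail_merged_size(strings: &[&str]) -> usize {
    // Identical strings always collapse into one copy.
    let unique: BTreeSet<&str> = strings.iter().copied().collect();
    unique
        .iter()
        // Keep a string only if no longer string ends with it.
        .filter(|s| {
            !unique
                .iter()
                .any(|other| other.len() > s.len() && other.ends_with(**s))
        })
        .map(|s| s.len() + 1) // +1 for the nul terminator
        .sum()
}

fn main() {
    // Plenty of overlap: merging wins (24 bytes down to 16).
    let overlapping = ["hello world", "world", "world", "abc"];
    assert_eq!(unmerged_size(&overlapping), 24);
    assert_eq!(tail_merged_size(&overlapping), 16);

    // No overlap: the terminators are pure overhead (6 bytes up to 9).
    let distinct = ["ab", "cd", "ef"];
    assert_eq!(unmerged_size(&distinct), 6);
    assert_eq!(tail_merged_size(&distinct), 9);
}
```

Under this model, a program whose strings are few and mostly distinct pays one byte per string and gains nothing, which is consistent with the guess but does not confirm it.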
This allows taking advantage of the C string merging functionality of linkers, reducing code size.
Marked as draft to see if this actually has much of an effect. The disadvantage of this is that people may start to rely on string literals getting nul terminated. A potential solution for that would be to put a byte that is not part of a valid UTF-8 character right before the nul terminator.
Builds on #138503