rustc_session: be more precise about -Z plt=yes on x86-64? #141720

iximeow · 2025-05-29T06:23:48Z

in #109982 rustc switched to -Z plt=yes on non-x86-64 platforms for a bunch of good reasons. and stuck with -Z plt=no by default on x86-64 for also good reasons! unfortunately, defaulting to -Z plt=no is a slight pessimization in programs heavily dependent on calls into statically linked libraries.

PLT calls on x86 end up compiled to e8 <addr> calls, which at link time can be rewritten to direct calls to the callee, and presumably deletion of the GOT entry. when we skip the PLT on x86-64, it seems that linkers are unwilling to do a link-time optimization of ff 15 <GOT addr> into 90 e8 <fn addr> when the callee is local to the object, so an indirect call to the object-local persists*.
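To make the byte-level rewrite above concrete, here is a minimal sketch of what such a relaxation would do to the six instruction bytes. `relax_got_call` is a hypothetical helper written for illustration, not code from any linker. It assumes the callee's final address is already known, and it uses the `90 e8` form mentioned above (a real linker may pick a different padding byte and will also check the relocation type first).

```rust
/// Sketch: rewrite `call [rip + disp32]` (ff 15 <disp32>) into
/// `nop; call rel32` (90 e8 <rel32>). Hypothetical helper, not linker code.
///
/// `insn_addr` is the address of the first byte of the 6-byte instruction,
/// `target` is the final address of the callee.
fn relax_got_call(insn: &mut [u8; 6], insn_addr: u64, target: u64) -> bool {
    // Only handle an indirect call through a RIP-relative memory operand.
    if insn[0] != 0xff || insn[1] != 0x15 {
        return false;
    }
    // Both forms occupy 6 bytes, so rel32 is relative to insn_addr + 6.
    let next = insn_addr.wrapping_add(6);
    let delta = target.wrapping_sub(next) as i64;
    let Ok(rel32) = i32::try_from(delta) else {
        // Callee is out of reach of a signed 32-bit displacement:
        // leave the indirect call untouched.
        return false;
    };
    insn[0] = 0x90; // nop
    insn[1] = 0xe8; // call rel32
    insn[2..6].copy_from_slice(&rel32.to_le_bytes());
    true
}

fn main() {
    // Made-up addresses: call site at 0x1000, callee at 0x2000.
    let mut insn = [0xff, 0x15, 0x00, 0x10, 0x00, 0x00];
    let relaxed = relax_got_call(&mut insn, 0x1000, 0x2000);
    println!("relaxed: {relaxed}, bytes: {:02x?}", insn);
}
```

The range check is the part that matters for the "how far is the callee" question: the direct form only works when the callee is within reach of a signed 32-bit displacement from the end of the call.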

i expect -Z plt=no to be better than -Z plt=yes on x86-64 for all cases where the called functions are dynamically linked. i also expect -Z plt=no to be worse than -Z plt=yes on x86-64 for all cases where the called functions are statically linked and <4 GiB from their call sites. it'd be nice if we could skip non_lazy_bind if we know the called function is to be statically linked. if your compiled artifact is >4 GiB .. i've heard of such things but have no idea what's best :)

"if we know the called function is to be statically linked" is the more annoying problem, though, because rustc-link-lib tells rustc only what libraries get what kind of linkage. especially on Unix-y platforms we don't know which of those libraries will provide a given symbol. the extern block can have a #[link(kind="static")] attribute, which i've used in this minimized example of the problem i'm talking about, and which almost seems like enough information to choose when to do this optimization at codegen-time. unfortunately, if the source file says #[link(name="util", kind="static")] extern "C" { pub fn foo(); }, and then you compile that source like rustc -l dylib=util ..., the command-line parameter simply overrides the link attribute and you end up with a dynamic link to foo with the (in context) reasonable ff 15 [GOT_entry] call.

because of the #[link]/-l KIND=NAME interaction i'm really not sure what to do here. i was going to initially suggest plumbing #[link(kind="static")] through to inform if nonlazybind is appropriate, but i had expected that conflicting link directives would at least produce an error. silently ending up with the command line argument is pretty unfortunate. does it seem reasonable to plumb the #[link] attribute as a hint, advise #[link(kind="static")] for statically linked functions, and make conflicting #[link] and -l arguments produce an error?
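A small model of the two behaviours may help frame the question. Everything here is hypothetical and for illustration only: `LinkKind`, `resolve_current`, `resolve_strict`, and `wants_nonlazybind` are not rustc internals. `resolve_current` mirrors the override described above (and assumes dynamic linkage when no kind is given anywhere), while `resolve_strict` sketches the proposed "conflicts are an error" rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkKind {
    Static,
    Dylib,
}

/// Behaviour as described above: a kind given with `-l KIND=NAME` silently
/// wins over the kind in `#[link]`. Assumption: dylib when neither is given.
fn resolve_current(attr: Option<LinkKind>, cli: Option<LinkKind>) -> LinkKind {
    cli.or(attr).unwrap_or(LinkKind::Dylib)
}

/// Proposed behaviour: disagreeing kinds are rejected instead of overridden.
fn resolve_strict(attr: Option<LinkKind>, cli: Option<LinkKind>) -> Result<LinkKind, String> {
    match (attr, cli) {
        (Some(a), Some(c)) if a != c => Err(format!(
            "conflicting link kinds: #[link] says {a:?}, -l says {c:?}"
        )),
        _ => Ok(resolve_current(attr, cli)),
    }
}

/// Hypothetical codegen hint: only keep the GOT-indirect call when the
/// callee may come from a dynamic library.
fn wants_nonlazybind(kind: LinkKind) -> bool {
    kind == LinkKind::Dylib
}

fn main() {
    let attr = Some(LinkKind::Static); // #[link(name = "util", kind = "static")]
    let cli = Some(LinkKind::Dylib); // rustc -l dylib=util
    println!("current: {:?}", resolve_current(attr, cli));
    println!("strict:  {:?}", resolve_strict(attr, cli));
    println!("nonlazybind for static: {}", wants_nonlazybind(LinkKind::Static));
}
```

With the strict rule, the attribute could be trusted as a codegen hint, because a command line that disagrees with it would fail instead of quietly changing the linkage.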

memorysafety/rav1d#1417 is a more substantive case which motivates this issue, where hot code is a collection of assembly routines that are statically linked. i've written a longer analysis about the case in that issue, but it's just more supporting information around the observation above.

worse, for code that is hot around an indirect call to a constant target, branch prediction quite effectively hides the cost of this indirect call. if the hot code is more like a large region of warm code, the branch prediction can end up evicted and these indirect calls to a constant local function become quite costly.

worse (pt2), LLVM reasonably tries to improve the indirect call situation by hoisting the loads for repeated calls to the same target, which can cause register pressure and additional spills, and generally makes this kind of unfortunate situation even worse.


Noratrieb · 2025-05-29T11:02:06Z

@nikic @bjorn3 @durin42

bjorn3 · 2025-05-29T11:16:52Z

> i expect -Z plt=yes to be better than -Z plt=no on x86-64 for all cases where the called functions are dynamically linked.

It shouldn't be. We enable RELRO, so we always do eager binding of all symbols, so using the PLT adds overhead for dynamically linked functions.

> i also expect -Z plt=no to be worse than -Z plt=yes on x86-64 for all cases where the called functions are statically linked and <4 GiB from their call sites.

Yeah, -Zplt=no prevents relaxing calls to possibly imported functions to direct pcrel calls. Note that until -Zdefault-visibility=hidden becomes the default, all calls between object files need to be resolved by the dynamic linker as default visibility allows a dylib to override the symbol even for local calls. We can't make it the default until a fixed ld.bfd is old enough though.

iximeow · 2025-05-29T15:57:02Z

> i expect -Z plt=yes to be ...

> It shouldn't be.

i'd rephrased that along the way and effectively swapped yes/no so it was just backwards. sorry! what i meant to say is that on x86 i really cannot imagine a way in which the status quo is worse for dynamically linked functions, but it is always worse for statically linked functions.

i've swapped the yes and no to make this read correctly.

> Note that until -Zdefault-visibility=hidden becomes the default ...

i don't follow, wouldn't this only apply if the statically linked symbols were produced via rustc? here, the statically linked code is from other .asm files. i would expect that in most cases of Rust code statically linking other libraries, the other libraries are probably from C with appropriate visibility modifiers.

also, #105518 looks like the default would be protected, not hidden?

bjorn3 · 2025-05-29T20:29:54Z

> i don't follow, wouldn't this only apply if the statically linked symbols were produced via rustc? here, the statically linked code is from other .asm files. i would expect that in most cases of Rust code statically linking other libraries, the other libraries are probably from C with appropriate visibility modifiers.

True, you are free to use protected visibility in the asm code (please don't use hidden visibility; that breaks with dylibs, as there is no guarantee that the Rust caller ends up in the same dylib), but -Zplt=yes would regress performance for Rust code a bit until the symbol visibility default changes.

> also, #105518 looks like the default would be protected, not hidden?

Yes, though the same logic applies. Either hidden or protected visibility is enough for PLT relaxation to work.

dramforever · 2025-06-01T18:39:38Z

Prompted by this being mentioned elsewhere I did some of my own investigation. The visibility stuff seems reasonable, but why is relaxation not kicking in?

It seems that rustc is not properly emitting relaxable (X at the end) R_X86_64{_REX,}_GOTPCRELX calls? This seems fixable but I don't know where one would start looking.

Another thing already mentioned is that LLVM trying to help by caching the address is unhelpful in the relaxable case. This also affects Clang. I don't know what we can do.
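For reference, the relocation types named in this comment can be told apart by their numeric values in the x86-64 psABI. The sketch below is illustrative only: `is_relaxable_got_reloc` is a hypothetical helper, and the constants are written out by hand instead of coming from an ELF crate.

```rust
// Relocation type numbers as assigned by the x86-64 psABI.
const R_X86_64_GOTPCREL: u32 = 9; // plain GOT-relative load, not relaxable
const R_X86_64_GOTPCRELX: u32 = 41; // relaxable, instruction has no REX prefix
const R_X86_64_REX_GOTPCRELX: u32 = 42; // relaxable, instruction has a REX prefix

/// Returns true for the "X" variants that allow a linker to rewrite the
/// instruction when the symbol turns out to be locally defined.
fn is_relaxable_got_reloc(r_type: u32) -> bool {
    matches!(r_type, R_X86_64_GOTPCRELX | R_X86_64_REX_GOTPCRELX)
}

fn main() {
    for r_type in [R_X86_64_GOTPCREL, R_X86_64_GOTPCRELX, R_X86_64_REX_GOTPCRELX] {
        println!("type {r_type}: relaxable = {}", is_relaxable_got_reloc(r_type));
    }
}
```

Dumping the relocations of an object file and checking which of these types appear at the call sites is one way to see whether the compiler emitted the relaxable form.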

nikic · 2025-06-01T18:48:50Z

@dramforever Because of broken linkers, see #115267. Possibly enough time has passed that enabling ELF relaxations would have less fallout now.

Edit: Nope, two years later there is still no new cross release, so we're going to see exactly the same issues.

rustbot added the needs-triage label May 29, 2025

jieyouxu added the C-optimization label May 29, 2025

mati865 mentioned this issue Jun 2, 2025

Enabling additional relaxations in Rust causes executables to crash with Wild davidlattimore/wild#818

Open
