Implement __bswap[si]i2 intrinsics #633

tea · 2024-06-24T16:05:16Z

These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64}.

tgross35 · 2024-06-24T17:57:23Z

Could you update README.md to mention these?

A generic version is possible, which would work on u128 if needed. It seems to optimize as well as the direct port (bswap on x86, rev on aarch64, one (u23) or two (u64) revs on Arm32). No strong need to change anything though if you're not up for it.

/* int/mod.rs */

// Need to add these two items to the int trait
trait Int {
    type Bytes: IntoIterator<Item =  u8>;
    fn to_be_bytes(self) -> Self::Bytes;
}

// Update `impl_int_common` to implement them
impl Int for .. {
    type Bytes = [u8; Self::BITS as usize / 8];
    fn to_be_bytes(self) -> Self::Bytes {
        self.to_be_bytes()
    }
}

/* bswap.rs */

fn bswap<I: Int + From<u8>>(x: I) -> I {
    let mut ret = I::ZERO;
    for (i, byte) in x.to_be_bytes().into_iter().enumerate() {
        ret |= I::from(byte) << (i as u32 * 8);
    }
    ret
}

#[inline(never)]
pub extern "C" fn __bswapdi2(x: u64) -> u64 {
    bswap(x)
}

#[inline(never)]
pub extern "C" fn __bswapsi2(x: u32) -> u32 {
    bswap(x)
}

testcrate/tests/misc.rs

tgross35 · 2024-06-25T05:46:17Z

Actually it might be good to add the i128 version since it seems like GCC and LLVM can both make use of it https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/301171.html https://reviews.llvm.org/D114425.

Which circumstances require this builtin @tea? On every platform I have tested, the intrinsic gets lowered to asm rather than a function call.

tea · 2024-06-25T06:22:06Z

Which circumstances require this builtin @tea? On every platform I have tested, the intrinsic gets lowered to asm rather than a function call.

I encountered this when building libzlma using gcc on RISC-V, without Zbb extension (library specifically uses __builtin_bswap so maybe it is limited to that use case, but still)

An example

$ riscv64-linux-gnu-gcc -xc  -O2 -S -o - - <<__EOF__
int test_si(int x) {
        return __builtin_bswap32(x);
}

int test_di(int x) {
        return __builtin_bswap64(x);
}
__EOF__
        .file   "<stdin>"
        .option pic
        .text
        .globl  __bswapsi2
        .align  1
        .globl  test_si
        .type   test_si, @function
test_si:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    __bswapsi2@plt
        ld      ra,8(sp)
        sext.w  a0,a0
        addi    sp,sp,16
        jr      ra
        .size   test_si, .-test_si
        .globl  __bswapdi2
        .align  1
        .globl  test_di
        .type   test_di, @function
test_di:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    __bswapdi2@plt
        ld      ra,8(sp)
        sext.w  a0,a0
        addi    sp,sp,16
        jr      ra
        .size   test_di, .-test_di
        .ident  "GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"
        .section        .note.GNU-stack,"",@progbits

src/int/bswap.rs

tea · 2024-06-25T07:19:54Z

A generic version is possible, which would work on u128 if needed. It seems to optimize as well as the direct port (bswap on x86, rev on aarch64, one (u23) or two (u64) revs on Arm32).

Done. Amazing thing is, the end result is EXACTLY the same as it was for bswapdi2 at least. The compiler had to figure out the generic version was "swap bytes" idiom for these optimizations to work; I guess it also figured out MY bitshifting code was the same idiom and replaced all of these implementations with its own "swap bytes" code...

Anyway, it look better than original compiler-rt version does after gcc compilation so I'm quite fine with that.

tgross35 · 2024-06-25T09:19:08Z

src/int/bswap.rs

+
+intrinsics! {
+    #[maybe_use_optimized_c_shim]
+    #[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]


One last question, why is this gated? I don't notice similar gating in https://github.com/llvm/llvm-project/blob/919b1ecafc010379eff88368b050068223a01f99/compiler-rt/lib/builtins/bswapsi2.c#L15

If it's related to AVR having a custom ABI for a lot of intrinsics, you can just mark these #[avr_skip] instead

Otherwise you should probably just #![cfg(...)]-gate the whole file since bswap will get an unused warning on 16-bit platforms.

Yeah, I was worried about 16-bit targets, whichever they might be. You are right, they are not gated in compiler-rt, and they are specifically 32- and 64- bit, so should probably ungate them.

Ungated si and di versions, left 128-bit under conditional though. I doubt it is ever available or called in 16-bit anything... Is there even any 128 bit int C type in such implementations?

You can just remove the target_pointer_width gate entirely unless there is a reason to restrict it, e.g. broken implementation. Not like these specific methods are likely to be used, but the other i128/f128 routines aren't gated. Maybe add #[avr_skip] to all of them though, because of the custom calling convention on that target.

I think C's __int128 is only defined in 64-bit ABIs and nothing smaller, not sure if Clang/GCC make it available on anything 16-bit. Rust definitely tries a bit harder to keep the same types & methods available on all platforms for portability (and LLVM's i128 (and larger) is always available which makes that easier).

Amanieu · 2024-06-25T21:55:16Z

Since this is never emitted by LLVM but only by GCC, can we just call .swap_bytes() from core?

tgross35 · 2024-06-25T23:44:40Z

Do we need to worry about core built by cg_gcc? Might still not be relevant here because it looks like it also lowers to assembly (can't easily check anything other than x86 with cg_gcc on Godbolt though)

tea · 2024-06-26T15:58:48Z

Since this is never emitted by LLVM but only by GCC, can we just call .swap_bytes() from core?

So, should I scrap the current implementation and use swap_bytes()? Probably it would be safe, given as the compiler already seem to be lifting the entire implementation into some builtin, and then lowering it back into its own implementation, and it works now.

tea · 2024-06-27T15:35:36Z

So I went ahead and replaced it with swap_bytes(). I kind of suspected this wouldn't change a bit in generated code, and it didn't. The compiler was replacing the implementation with the builtin, and now it is placing same builtin through swap_bytes() macro substitution.
As for gcc Rust compiler... Probably too early to care much about it, given its state. I just tried to build a Rust cross-compiler from gcc 14, but it turns out it is not supported. Apparently it can only build native compiler for now (which is probably why there's only x86 on Compiler Explorer).

These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64,128}.

tea force-pushed the bswap branch from 4940f05 to 500a276 Compare June 24, 2024 16:21

tgross35 reviewed Jun 24, 2024

View reviewed changes

testcrate/tests/misc.rs Outdated Show resolved Hide resolved

tea force-pushed the bswap branch from 500a276 to 89bcf07 Compare June 25, 2024 06:58

tgross35 reviewed Jun 25, 2024

View reviewed changes

src/int/bswap.rs Outdated Show resolved Hide resolved

tea force-pushed the bswap branch from 89bcf07 to 6ea6820 Compare June 25, 2024 07:19

tgross35 reviewed Jun 25, 2024

View reviewed changes

tea force-pushed the bswap branch 2 times, most recently from 81ce43b to ee8c62f Compare June 25, 2024 12:50

tea force-pushed the bswap branch from ee8c62f to 7c6d884 Compare June 27, 2024 15:19

tea force-pushed the bswap branch from 7c6d884 to 2c051ab Compare July 6, 2024 02:09

Implement __bswap[sdt]i2 intrinsics

3641761

These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64,128}.

Amanieu force-pushed the bswap branch from 2c051ab to 3641761 Compare July 6, 2024 09:51

Amanieu enabled auto-merge July 6, 2024 09:51

Amanieu merged commit 06db2de into rust-lang:master Jul 6, 2024
24 checks passed

tea deleted the bswap branch October 17, 2024 06:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement __bswap[si]i2 intrinsics #633

Implement __bswap[si]i2 intrinsics #633

Uh oh!

tea commented Jun 24, 2024

Uh oh!

tgross35 commented Jun 24, 2024 •

edited

Loading

Uh oh!

Uh oh!

tgross35 commented Jun 25, 2024

Uh oh!

tea commented Jun 25, 2024

Uh oh!

Uh oh!

tea commented Jun 25, 2024

Uh oh!

tgross35 Jun 25, 2024

Uh oh!

tea Jun 25, 2024

Uh oh!

tea Jun 25, 2024

Uh oh!

tgross35 Jun 25, 2024

Uh oh!

Amanieu commented Jun 25, 2024

Uh oh!

tgross35 commented Jun 25, 2024

Uh oh!

tea commented Jun 26, 2024

Uh oh!

tea commented Jun 27, 2024

Uh oh!

Uh oh!

Uh oh!

Implement __bswap[si]i2 intrinsics #633

Implement __bswap[si]i2 intrinsics #633

Uh oh!

Conversation

tea commented Jun 24, 2024

Uh oh!

tgross35 commented Jun 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

tgross35 commented Jun 25, 2024

Uh oh!

tea commented Jun 25, 2024

Uh oh!

Uh oh!

tea commented Jun 25, 2024

Uh oh!

tgross35 Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

tea Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

tea Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

tgross35 Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

Amanieu commented Jun 25, 2024

Uh oh!

tgross35 commented Jun 25, 2024

Uh oh!

tea commented Jun 26, 2024

Uh oh!

tea commented Jun 27, 2024

Uh oh!

Uh oh!

Uh oh!

tgross35 commented Jun 24, 2024 •

edited

Loading