Skip to content

Implement __bswap[si]i2 intrinsics #633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 6, 2024
Merged

Implement __bswap[si]i2 intrinsics #633

merged 1 commit into from
Jul 6, 2024

Conversation

tea
Copy link
Contributor

@tea tea commented Jun 24, 2024

These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64}.

@tgross35
Copy link
Contributor

tgross35 commented Jun 24, 2024

Could you update README.md to mention these?

A generic version is possible, which would work on u128 if needed. It seems to optimize as well as the direct port (bswap on x86, rev on aarch64, one (u23) or two (u64) revs on Arm32). No strong need to change anything though if you're not up for it.

/* int/mod.rs */

// Need to add these two items to the int trait
trait Int {
    type Bytes: IntoIterator<Item =  u8>;
    fn to_be_bytes(self) -> Self::Bytes;
}

// Update `impl_int_common` to implement them
impl Int for .. {
    type Bytes = [u8; Self::BITS as usize / 8];
    fn to_be_bytes(self) -> Self::Bytes {
        self.to_be_bytes()
    }
}

/* bswap.rs */

fn bswap<I: Int + From<u8>>(x: I) -> I {
    let mut ret = I::ZERO;
    for (i, byte) in x.to_be_bytes().into_iter().enumerate() {
        ret |= I::from(byte) << (i as u32 * 8);
    }
    ret
}

#[inline(never)]
pub extern "C" fn __bswapdi2(x: u64) -> u64 {
    bswap(x)
}

#[inline(never)]
pub extern "C" fn __bswapsi2(x: u32) -> u32 {
    bswap(x)
}

@tgross35
Copy link
Contributor

Actually it might be good to add the i128 version since it seems like GCC and LLVM can both make use of it https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/301171.html https://reviews.llvm.org/D114425.

Which circumstances require this builtin @tea? On every platform I have tested, the intrinsic gets lowered to asm rather than a function call.

@tea
Copy link
Contributor Author

tea commented Jun 25, 2024

Which circumstances require this builtin @tea? On every platform I have tested, the intrinsic gets lowered to asm rather than a function call.

I encountered this when building libzlma using gcc on RISC-V, without Zbb extension (library specifically uses __builtin_bswap so maybe it is limited to that use case, but still)

An example
$ riscv64-linux-gnu-gcc -xc  -O2 -S -o - - <<__EOF__
int test_si(int x) {
        return __builtin_bswap32(x);
}

int test_di(int x) {
        return __builtin_bswap64(x);
}
__EOF__
        .file   "<stdin>"
        .option pic
        .text
        .globl  __bswapsi2
        .align  1
        .globl  test_si
        .type   test_si, @function
test_si:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    __bswapsi2@plt
        ld      ra,8(sp)
        sext.w  a0,a0
        addi    sp,sp,16
        jr      ra
        .size   test_si, .-test_si
        .globl  __bswapdi2
        .align  1
        .globl  test_di
        .type   test_di, @function
test_di:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    __bswapdi2@plt
        ld      ra,8(sp)
        sext.w  a0,a0
        addi    sp,sp,16
        jr      ra
        .size   test_di, .-test_di
        .ident  "GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"
        .section        .note.GNU-stack,"",@progbits

@tea
Copy link
Contributor Author

tea commented Jun 25, 2024

A generic version is possible, which would work on u128 if needed. It seems to optimize as well as the direct port (bswap on x86, rev on aarch64, one (u23) or two (u64) revs on Arm32).

Done. Amazing thing is, the end result is EXACTLY the same as it was for bswapdi2 at least. The compiler had to figure out the generic version was "swap bytes" idiom for these optimizations to work; I guess it also figured out MY bitshifting code was the same idiom and replaced all of these implementations with its own "swap bytes" code...

Anyway, it look better than original compiler-rt version does after gcc compilation so I'm quite fine with that.

src/int/bswap.rs Outdated

intrinsics! {
#[maybe_use_optimized_c_shim]
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last question, why is this gated? I don't notice similar gating in https://github.com/llvm/llvm-project/blob/919b1ecafc010379eff88368b050068223a01f99/compiler-rt/lib/builtins/bswapsi2.c#L15

If it's related to AVR having a custom ABI for a lot of intrinsics, you can just mark these #[avr_skip] instead

Otherwise you should probably just #![cfg(...)]-gate the whole file since bswap will get an unused warning on 16-bit platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was worried about 16-bit targets, whichever they might be. You are right, they are not gated in compiler-rt, and they are specifically 32- and 64- bit, so should probably ungate them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ungated si and di versions, left 128-bit under conditional though. I doubt it is ever available or called in 16-bit anything... Is there even any 128 bit int C type in such implementations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just remove the target_pointer_width gate entirely unless there is a reason to restrict it, e.g. broken implementation. Not like these specific methods are likely to be used, but the other i128/f128 routines aren't gated. Maybe add #[avr_skip] to all of them though, because of the custom calling convention on that target.

I think C's __int128 is only defined in 64-bit ABIs and nothing smaller, not sure if Clang/GCC make it available on anything 16-bit. Rust definitely tries a bit harder to keep the same types & methods available on all platforms for portability (and LLVM's i128 (and larger) is always available which makes that easier).

@tea tea force-pushed the bswap branch 2 times, most recently from 81ce43b to ee8c62f Compare June 25, 2024 12:50
@Amanieu
Copy link
Member

Amanieu commented Jun 25, 2024

Since this is never emitted by LLVM but only by GCC, can we just call .swap_bytes() from core?

@tgross35
Copy link
Contributor

Do we need to worry about core built by cg_gcc? Might still not be relevant here because it looks like it also lowers to assembly (can't easily check anything other than x86 with cg_gcc on Godbolt though)

@tea
Copy link
Contributor Author

tea commented Jun 26, 2024

Since this is never emitted by LLVM but only by GCC, can we just call .swap_bytes() from core?

So, should I scrap the current implementation and use swap_bytes()? Probably it would be safe, given as the compiler already seem to be lifting the entire implementation into some builtin, and then lowering it back into its own implementation, and it works now.

@tea
Copy link
Contributor Author

tea commented Jun 27, 2024

So I went ahead and replaced it with swap_bytes(). I kind of suspected this wouldn't change a bit in generated code, and it didn't. The compiler was replacing the implementation with the builtin, and now it is placing same builtin through swap_bytes() macro substitution.
As for gcc Rust compiler... Probably too early to care much about it, given its state. I just tried to build a Rust cross-compiler from gcc 14, but it turns out it is not supported. Apparently it can only build native compiler for now (which is probably why there's only x86 on Compiler Explorer).

These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64,128}.
@Amanieu Amanieu enabled auto-merge July 6, 2024 09:51
@Amanieu Amanieu merged commit 06db2de into rust-lang:master Jul 6, 2024
24 checks passed
@tea tea deleted the bswap branch October 17, 2024 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants