-
Notifications
You must be signed in to change notification settings - Fork 236
Implement __bswap[si]i2 intrinsics #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Could you update README.md to mention these? A generic version is possible, which would work on u128 if needed. It seems to optimize as well as the direct port ( /* int/mod.rs */
// Need to add these two items to the int trait
trait Int {
type Bytes: IntoIterator<Item = u8>;
fn to_be_bytes(self) -> Self::Bytes;
}
// Update `impl_int_common` to implement them
impl Int for .. {
type Bytes = [u8; Self::BITS as usize / 8];
fn to_be_bytes(self) -> Self::Bytes {
self.to_be_bytes()
}
}
/* bswap.rs */
fn bswap<I: Int + From<u8>>(x: I) -> I {
let mut ret = I::ZERO;
for (i, byte) in x.to_be_bytes().into_iter().enumerate() {
ret |= I::from(byte) << (i as u32 * 8);
}
ret
}
#[inline(never)]
pub extern "C" fn __bswapdi2(x: u64) -> u64 {
bswap(x)
}
#[inline(never)]
pub extern "C" fn __bswapsi2(x: u32) -> u32 {
bswap(x)
} |
Actually it might be good to add the i128 version since it seems like GCC and LLVM can both make use of it https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/301171.html https://reviews.llvm.org/D114425. Which circumstances require this builtin @tea? On every platform I have tested, the intrinsic gets lowered to asm rather than a function call. |
I encountered this when building libzlma using gcc on RISC-V, without Zbb extension (library specifically uses __builtin_bswap so maybe it is limited to that use case, but still) An example
|
Done. Amazing thing is, the end result is EXACTLY the same as it was for bswapdi2 at least. The compiler had to figure out the generic version was "swap bytes" idiom for these optimizations to work; I guess it also figured out MY bitshifting code was the same idiom and replaced all of these implementations with its own "swap bytes" code... Anyway, it look better than original compiler-rt version does after gcc compilation so I'm quite fine with that. |
src/int/bswap.rs
Outdated
|
||
intrinsics! { | ||
#[maybe_use_optimized_c_shim] | ||
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last question, why is this gated? I don't notice similar gating in https://github.com/llvm/llvm-project/blob/919b1ecafc010379eff88368b050068223a01f99/compiler-rt/lib/builtins/bswapsi2.c#L15
If it's related to AVR having a custom ABI for a lot of intrinsics, you can just mark these #[avr_skip]
instead
Otherwise you should probably just #![cfg(...)]
-gate the whole file since bswap
will get an unused warning on 16-bit platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was worried about 16-bit targets, whichever they might be. You are right, they are not gated in compiler-rt, and they are specifically 32- and 64- bit, so should probably ungate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ungated si and di versions, left 128-bit under conditional though. I doubt it is ever available or called in 16-bit anything... Is there even any 128 bit int C type in such implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just remove the target_pointer_width gate entirely unless there is a reason to restrict it, e.g. broken implementation. Not like these specific methods are likely to be used, but the other i128/f128 routines aren't gated. Maybe add #[avr_skip]
to all of them though, because of the custom calling convention on that target.
I think C's __int128
is only defined in 64-bit ABIs and nothing smaller, not sure if Clang/GCC make it available on anything 16-bit. Rust definitely tries a bit harder to keep the same types & methods available on all platforms for portability (and LLVM's i128
(and larger) is always available which makes that easier).
81ce43b
to
ee8c62f
Compare
Since this is never emitted by LLVM but only by GCC, can we just call |
Do we need to worry about core built by cg_gcc? Might still not be relevant here because it looks like it also lowers to assembly (can't easily check anything other than x86 with cg_gcc on Godbolt though) |
So, should I scrap the current implementation and use |
So I went ahead and replaced it with swap_bytes(). I kind of suspected this wouldn't change a bit in generated code, and it didn't. The compiler was replacing the implementation with the builtin, and now it is placing same builtin through swap_bytes() macro substitution. |
These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64,128}.
These can be emitted by gcc, at least if requested specifically via __builtin_bswap{32,64}.