Commit d840fb6

Use the trifecta div algorithm for 128-bit div on wasm
This commit updates the `#[cfg]` annotations used to select the implementation of 128-bit division in compiler-builtins on wasm targets. This is done in relation to https://github.com/WebAssembly/128-bit-arithmetic, where the performance of 128-bit operations is being investigated on WebAssembly.

While I don't know much about the particulars of the two algorithms involved here, the comments indicate that the "trifecta" variant is preferred if possible, but it's not selected on 32-bit architectures. This rationale isn't as applicable to WebAssembly targets because, despite the 32-bit pointer width, there are often wider-than-pointer operations available, as it's typically run on 64-bit machines.

Locally, in testing, a benchmark that performs division with a Rust-based bignum library went from 350% slower-than-native to 220% slower-than-native with this change, a nice increase in speed. While this was tested with Wasmtime, other runtimes are likely to see an improvement as well.
1 parent 729ba06 commit d840fb6
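
The selection rule described in the commit message can be modeled as plain boolean logic. The following is a minimal sketch, not code from compiler-builtins: the `Target` struct and the function names are hypothetical, introduced only to show that the new wasm clause moves wasm32 onto the `trifecta` path while the two predicates stay mutually exclusive. The second predicate is called `uses_fallback` here because the diff does not show which item it annotates.

```rust
/// Hypothetical description of a compilation target (illustration only).
#[derive(Clone, Copy)]
struct Target {
    is_wasm: bool,      // models `target_family = "wasm"`
    pointer_width: u32, // models `target_pointer_width`
    x86_64_asm: bool,   // models `all(not(feature = "no-asm"), target_arch = "x86_64")`
    is_sparc: bool,     // models `target_arch = "sparc"` or `"sparc64"`
}

/// The clause changed by this commit: wasm, or a pointer width other than 16/32.
fn wide_ops_assumed(t: Target) -> bool {
    t.is_wasm || !(t.pointer_width == 16 || t.pointer_width == 32)
}

/// The two clauses shared by both `#[cfg]` blocks.
fn shared_clauses(t: Target) -> bool {
    !t.x86_64_asm && !t.is_sparc
}

/// Models the first `#[cfg(all(...))]` block (the `impl_trifecta!` path).
fn uses_trifecta(t: Target) -> bool {
    wide_ops_assumed(t) && shared_clauses(t)
}

/// Models the second `#[cfg(all(...))]` block (the non-trifecta path).
fn uses_fallback(t: Target) -> bool {
    !wide_ops_assumed(t) && shared_clauses(t)
}

fn main() {
    let wasm32 = Target { is_wasm: true, pointer_width: 32, x86_64_asm: false, is_sparc: false };
    let arm32 = Target { is_wasm: false, pointer_width: 32, x86_64_asm: false, is_sparc: false };
    let aarch64 = Target { is_wasm: false, pointer_width: 64, x86_64_asm: false, is_sparc: false };

    // wasm32 now takes the trifecta path despite its 32-bit pointers.
    assert!(uses_trifecta(wasm32) && !uses_fallback(wasm32));
    // Other 32-bit targets are unaffected.
    assert!(!uses_trifecta(arm32) && uses_fallback(arm32));
    // 64-bit targets behave as before.
    assert!(uses_trifecta(aarch64) && !uses_fallback(aarch64));
}
```

Because the second predicate is written as the exact negation of the first clause, no target can match both blocks, which is why the diff wraps the same `any(...)` expression in `not(...)` rather than restating the condition.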

1 file changed, +15 -5 lines changed

src/int/specialized_div_rem/mod.rs (+15 -5)
@@ -136,9 +136,15 @@ fn u64_by_u64_div_rem(duo: u64, div: u64) -> (u64, u64) {
 
 // Whether `trifecta` or `delegate` is faster for 128 bit division depends on the speed at which a
 // microarchitecture can multiply and divide. We decide to be optimistic and assume `trifecta` is
-// faster if the target pointer width is at least 64.
+// faster if the target pointer width is at least 64. Note that this
+// implementation is additionally included on WebAssembly despite the typical
+// pointer width there being 32 because it's typically run on a 64-bit machine
+// that has access to faster 64-bit operations.
 #[cfg(all(
-    not(any(target_pointer_width = "16", target_pointer_width = "32")),
+    any(
+        target_family = "wasm",
+        not(any(target_pointer_width = "16", target_pointer_width = "32")),
+    ),
     not(all(not(feature = "no-asm"), target_arch = "x86_64")),
     not(any(target_arch = "sparc", target_arch = "sparc64"))
 ))]
@@ -152,10 +158,14 @@ impl_trifecta!(
     u128
 );
 
-// If the pointer width less than 64, then the target architecture almost certainly does not have
-// the fast 64 to 128 bit widening multiplication needed for `trifecta` to be faster.
+// If the pointer width less than 64 and this isn't wasm, then the target
+// architecture almost certainly does not have the fast 64 to 128 bit widening
+// multiplication needed for `trifecta` to be faster.
 #[cfg(all(
-    any(target_pointer_width = "16", target_pointer_width = "32"),
+    not(any(
+        target_family = "wasm",
+        not(any(target_pointer_width = "16", target_pointer_width = "32")),
+    )),
     not(all(not(feature = "no-asm"), target_arch = "x86_64")),
     not(any(target_arch = "sparc", target_arch = "sparc64"))
 ))]
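
The code comments refer to a "64 to 128 bit widening multiplication". As a rough illustration of that primitive only (this is not the `trifecta` algorithm, and `widening_mul_u64` is a hypothetical helper name), a widening multiply in Rust can be sketched by promoting both operands to `u128`. How cheaply this lowers on a given target or wasm runtime is exactly the question the comments are concerned with, and is not something this sketch can show.

```rust
/// Multiply two `u64` values and return the (low, high) 64-bit halves of
/// the full 128-bit product. Hypothetical helper for illustration only.
fn widening_mul_u64(a: u64, b: u64) -> (u64, u64) {
    // Promoting to u128 first means the product cannot overflow.
    let product = (a as u128) * (b as u128);
    (product as u64, (product >> 64) as u64)
}

fn main() {
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1, so low = 1 and high = 2^64 - 2.
    let (lo, hi) = widening_mul_u64(u64::MAX, u64::MAX);
    assert_eq!(lo, 1);
    assert_eq!(hi, u64::MAX - 1);

    // Small products fit entirely in the low half.
    assert_eq!(widening_mul_u64(6, 7), (42, 0));
}
```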