Commit cb4c7e3

Fix nonlocal unsoundness vs. atomic store
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. Because most stores are not nontemporal, LLVM uses simple stores when lowering LLVM IR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted:

    vmovntdq [addr],     ymmreg
    vmovntdq [addr+N],   ymmreg
    vmovntdq [addr+N*2], ymmreg
    vmovntdq [addr+N*3], ymmreg
    mov byte ptr [flag], 1 ; producer-consumer flag

But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order:

    vmovntdq [addr+N*2], ymmreg
    vmovntdq [addr+N*3], ymmreg
    mov byte ptr [flag], 1
    vmovntdq [addr+N],   ymmreg
    vmovntdq [addr],     ymmreg

As a result, other threads could access this address after the flag is set but before all of the data writes are visible, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states. If using `&mut [u8]` to write uninitialized memory is permitted (per rust-lang/unsafe-code-guidelines#346), it could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must currently use sfence in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and a motivation to choose lowerings that do not require explicit sfence.
1 parent 748a7a0 commit cb4c7e3

1 file changed, +6 -1 lines changed

src/x86_64.rs

@@ -36,7 +36,12 @@ pub fn memset(slice: &mut [u8], byte: u8) {
 
     let ptr = get_impl().load(Ordering::Relaxed);
     let ptr = unsafe { mem::transmute::<usize, FnSig>(ptr) };
-    ptr(slice, byte)
+    ptr(slice, byte);
+
+    // Required before transferring control to code that may perform other atomic stores,
+    // as LLVM will not lower an `atomic store ... release` on x86-64 to emit SFENCE
+    // even though that is required in the presence of nontemporal stores for soundness.
+    unsafe { _mm_sfence() };
 }
 
 #[repr(packed)]
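
The pattern this change enforces can be illustrated with a minimal, self-contained sketch. It is not the crate's actual implementation: `fill_nontemporal` is a hypothetical helper, and the use of `_mm_stream_si128` (MOVNTDQ on 16-byte lanes) and a standard-library channel as the "safe code that synchronizes with atomics" are assumptions made for illustration. The point is only that the SFENCE sits inside the function that issues the nontemporal stores, before control returns to code that may publish the buffer.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_set1_epi8, _mm_sfence, _mm_stream_si128};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not from the commit): fills `buf` with `byte`,
/// using nontemporal stores for the 16-byte-aligned middle of the slice.
#[allow(unused_unsafe)] // some intrinsics are safe fns on newer toolchains
pub fn fill_nontemporal(buf: &mut [u8], byte: u8) {
    #[cfg(target_arch = "x86_64")]
    {
        // SAFETY: every bit pattern is a valid `__m128i`, and `align_to_mut`
        // only places correctly aligned elements in `middle`.
        let (head, middle, tail) = unsafe { buf.align_to_mut::<__m128i>() };
        head.fill(byte);
        // SSE2 is part of the x86-64 baseline, so these intrinsics are available.
        let pattern = unsafe { _mm_set1_epi8(byte as i8) };
        for lane in middle.iter_mut() {
            // SAFETY: `lane` is a valid, 16-byte-aligned, exclusive location.
            unsafe { _mm_stream_si128(lane as *mut __m128i, pattern) };
        }
        tail.fill(byte);
        // Make the nontemporal stores globally visible before returning, so a
        // later plain store (e.g. a release store lowered to `mov`) cannot
        // become visible ahead of them.
        unsafe { _mm_sfence() };
    }
    #[cfg(not(target_arch = "x86_64"))]
    buf.fill(byte); // portable fallback; no nontemporal stores involved
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let producer = thread::spawn(move || {
        let mut buf = vec![0u8; 4096];
        fill_nontemporal(&mut buf, 0xAB);
        // Safe code publishes the buffer; internally this relies on atomics.
        tx.send(buf).unwrap();
    });
    let received = rx.recv().unwrap();
    producer.join().unwrap();
    assert!(received.iter().all(|&b| b == 0xAB));
    println!("all {} bytes observed as 0xAB", received.len());
}
```

Placing the fence inside the helper keeps the requirement within the unsafe code's own safety boundary, so callers written in safe Rust do not need to know that nontemporal stores were used.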
