Fix #54391: Invalid pointer aliasing in SAS7BDAT parser #54401

jonashaag · 2023-08-04T07:56:58Z

closes BUG: SIGBUS in test_float_byteswap test on arm #54391
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

SoapGentoo · 2023-08-04T10:07:13Z

pandas/_libs/byteswap.pyx

+    cdef uint32_t num_uint
+    memcpy(&num_uint, &num, sizeof(float))
+    num_uint = _byteswap4(num_uint)
+    return (<float *>&num_uint)[0]


This is UB again, you generally cannot cast pointer types like this.

I wonder why there is no compiler warning if this is UB.

What's the correct way to cast those bits?

Here's an idea:
assuming this code is always compiled as C (and not C++), you can use a union:

cdef union Punning:
    uint32_t u
    float f

cdef float _byteswap_float(float num):
    cdef Punning punning
    punning.f = num
    punning.u = _byteswap4(punning.u)
    return punning.f

I haven't tested this code, but you get the gist. Important: this is UB in C++, which doesn't allow type punning through unions.

btw, the memcpy was fine, it's just that using a union is more transparent to MSVC's optimizer, which is pretty weak compared to GCC/Clang's optimizer.
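For reference, the two approaches discussed above can be sketched in plain C roughly as follows. This is only an illustration under assumptions: the names `byteswap4`, `byteswap_float_memcpy` and `byteswap_float_union` are hypothetical stand-ins, not the functions in `pandas/_libs/byteswap.pyx`, and `float` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: reverse the byte order of a 32-bit integer. */
uint32_t byteswap4(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Variant 1: move the bits with memcpy. No float* ever points at a
 * uint32_t object, so there is no aliasing violation. */
float byteswap_float_memcpy(float num) {
    uint32_t bits;
    memcpy(&bits, &num, sizeof bits);   /* float -> integer bits */
    bits = byteswap4(bits);
    memcpy(&num, &bits, sizeof num);    /* integer bits -> float */
    return num;
}

/* Variant 2: type punning through a union. Fine when compiled as C,
 * but not permitted in C++. */
float byteswap_float_union(float num) {
    union { uint32_t u; float f; } punning;
    punning.f = num;
    punning.u = byteswap4(punning.u);
    return punning.f;
}
```

Both variants should produce the same bit pattern; the choice between them is mostly about which form a given compiler optimizes better and whether the generated code might ever be built as C++.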

OK, switched to union, thanks a lot!

Why wouldn't we just use a uint8_t * and cast to the appropriate size in the byteswap calls? The union approach seems kind of overkill here

you mean cast the uint8_t* back to a uint32_t* later again or what?

Sorry, to be clear, I was asking whether there is a way to work with the raw data pointer directly. I see the pattern is something like:

def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
    assert offset + 4 < len(data)
    cdef:
        const char *data_ptr = data
        float res = (<float*>(data_ptr + offset))[0]
    if byteswap:
        res = _byteswap_float(res)
    return res

So we are casting to float then going back in a circle to work with raw bytes for the byteswap. It would seem more logically structured if we just did

def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
    assert offset + 4 < len(data)
    cdef:
        float res = (<float*>(data_ptr + offset))[0]
    if byteswap:
        return (<float*>_byteswap_float(data_ptr + offset))[0]
    else:
        return (<float*>(data_ptr + offset))[0]

And just have the byteswap function work with the data buffer directly. Within byteswap I think we could just do

cdef uint32_t _byteswap_float(const char *data):
    cdef uint32_t value
    memcpy(&value, data, 4)
    return value
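A rough C sketch of the buffer-based idea, for illustration only: copy four bytes out of the raw buffer with memcpy, swap them as an integer, then copy the result into a float. The function name mirrors the loader above, but the signature (explicit `len`, `size_t` offset) and the `byteswap4` helper are assumptions of this sketch, not the pandas code. Because memcpy is used, the read also works when `data + offset` is not 4-byte aligned, which a direct `float*` dereference does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 32-bit integer. */
uint32_t byteswap4(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Read a 32-bit float stored at data[offset .. offset + 3], optionally
 * reversing its byte order first. Assumes float is 32 bits wide. */
float read_float_with_byteswap(const unsigned char *data, size_t len,
                               size_t offset, int byteswap) {
    uint32_t bits;
    float res;

    /* Bounds check written so that it cannot overflow. */
    assert(len >= sizeof bits && offset <= len - sizeof bits);

    /* Copy the raw bytes; no alignment requirement on data + offset. */
    memcpy(&bits, data + offset, sizeof bits);
    if (byteswap) {
        bits = byteswap4(bits);
    }

    /* Reinterpret the integer bits as a float without pointer casts. */
    memcpy(&res, &bits, sizeof res);
    return res;
}
```

With this shape the float is only materialized once, after any swap has been applied, so the value never round-trips through a `float` holding a byte-reversed bit pattern.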

see my other PR, the loader functions are invalid too

SoapGentoo · 2023-08-04T15:15:30Z

#54407 supersedes this PR

mroeschke · 2023-08-04T17:06:00Z

Closing in favor of #54407

jonashaag requested a review from WillAyd as a code owner August 4, 2023 07:56

SoapGentoo suggested changes Aug 4, 2023

jonashaag force-pushed the sas-ptr branch from 2a097e9 to 3708552 Compare August 4, 2023 11:56

Fix pandas-dev#54391: Invalid pointer aliasing in SAS7BDAT parser

000e93a

jonashaag force-pushed the sas-ptr branch from 3708552 to 000e93a Compare August 4, 2023 12:49

SoapGentoo approved these changes Aug 4, 2023

mroeschke closed this Aug 4, 2023


