-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix #54391: Invalid pointer aliasing in SAS7BDAT parser #54401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_libs/byteswap.pyx
Outdated
cdef uint32_t num_uint | ||
memcpy(&num_uint, &num, sizeof(float)) | ||
num_uint = _byteswap4(num_uint) | ||
return (<float *>&num_uint)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is UB again, you generally cannot cast pointer types like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why there is not compiler warning if this is UB.
What's the correct way to cast those bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an idea:
assuming this code is always compiled as C (and not C++), you can use a union:
cdef float _byteswap_float(float num):
cdef union Punning:
uint32_t u
float f
cdef Punning punning = num;
punning.u = _byteswap4(punning.u)
return punning.f
I haven't tested this code, but you get the gist. Important: this is UB in C++, which doesn't allow type punning through unions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, the memcpy was fine, it's just that using a union is more transparent to MSVC's optimizer, which is pretty weak compared to GCC/Clang's optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, switched to union, thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't we just use a uint8_t *
and cast to the appropriate size in the byteswap calls? The union approach seems kind of overkill here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean cast the uint8_t*
back to a uint32_t*
later again or what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be clear I was asking if there was a way to deal with the raw data pointer. I see the pattern is something like:
def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
assert offset + 4 < len(data)
cdef:
const char *data_ptr = data
float res = (<float*>(data_ptr + offset))[0]
if byteswap:
res = _byteswap_float(res)
return res
So we are casting to float then going back in a circle to work with raw bytes for the byteswap. It would seem more logically structured if we just did
def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
assert offset + 4 < len(data)
cdef:
float res = (<float*>(data_ptr + offset))[0]
if byteswap:
return (<float*>_byteswap_float(data_ptr + offset))[0]
else:
return (<float*>(data_ptr + offset))[0]
And just have the byteswap function work with the data buffer directly. Within byteswap I think could just do
cdef uint32_t _byteswap_float(bytes data):
cdef uint32_t value*
memcpy(value, data, 4)
return value[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my other PR, the loader functions are invalid too
#54407 supersedes this PR |
Closing in favor of #54407 |
@thesamesam please review.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.