Skip to content

Fix #54391: Invalid pointer aliasing in SAS7BDAT parser #54401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jonashaag
Copy link
Contributor

@thesamesam please review.

@jonashaag jonashaag requested a review from WillAyd as a code owner August 4, 2023 07:56
cdef uint32_t num_uint
memcpy(&num_uint, &num, sizeof(float))
num_uint = _byteswap4(num_uint)
return (<float *>&num_uint)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is UB again, you generally cannot cast pointer types like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why there is not compiler warning if this is UB.

What's the correct way to cast those bits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an idea:
assuming this code is always compiled as C (and not C++), you can use a union:

cdef float _byteswap_float(float num):
    cdef union Punning:
        uint32_t u
        float f

    cdef Punning punning = num;
    punning.u = _byteswap4(punning.u)
    return punning.f

I haven't tested this code, but you get the gist. Important: this is UB in C++, which doesn't allow type punning through unions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, the memcpy was fine, it's just that using a union is more transparent to MSVC's optimizer, which is pretty weak compared to GCC/Clang's optimizer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, switched to union, thanks a lot!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't we just use a uint8_t * and cast to the appropriate size in the byteswap calls? The union approach seems kind of overkill here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean cast the uint8_t* back to a uint32_t* later again or what?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be clear I was asking if there was a way to deal with the raw data pointer. I see the pattern is something like:

def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
    assert offset + 4 < len(data)
    cdef:
        const char *data_ptr = data
        float res = (<float*>(data_ptr + offset))[0]
    if byteswap:
        res = _byteswap_float(res)
    return res

So we are casting to float then going back in a circle to work with raw bytes for the byteswap. It would seem more logically structured if we just did

def read_float_with_byteswap(bytes data, Py_ssize_t offset, bint byteswap):
    assert offset + 4 < len(data)
    cdef:
        float res = (<float*>(data_ptr + offset))[0]
    if byteswap:
        return (<float*>_byteswap_float(data_ptr + offset))[0]
   else:
        return (<float*>(data_ptr + offset))[0]

And just have the byteswap function work with the data buffer directly. Within byteswap I think could just do

cdef uint32_t _byteswap_float(bytes data):
    cdef uint32_t value*
    memcpy(value, data, 4)
    return value[0]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other PR, the loader functions are invalid too

@SoapGentoo
Copy link
Contributor

#54407 supersedes this PR

@mroeschke
Copy link
Member

Closing in favor of #54407

@mroeschke mroeschke closed this Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: SIGBUS in test_float_byteswap test on arm
4 participants