BUG: Fix out-of-bounds access in safe_sort with an empty array and non-empty codes #59489

Merged
merged 3 commits into from
Aug 12, 2024

Conversation

jsspencer
Contributor

safe_sort could handle out-of-bounds references in codes so long as the array was not empty.
By masking all out-of-bounds values to -1, we can handle both empty and non-empty array cases.
Previously we masked out-of-bounds values to 0 and then fixed them up afterwards, but this results in accessing the 0th element which doesn't exist in the empty array case.

Previously we masked `codes` referring to out-of-bounds elements to 0 and
then fixed them after to -1 using `np.putmask`. However, this results in
out-of-bounds access in `take_nd` if the array is empty.

Instead, set all out-of-bounds indices in `codes` to -1 immediately, as
these can be handled by `take_nd`.

Also remove dead code: `use_na_sentinel` cannot be truthy inside an else
branch where it is falsy.
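
The difference between the two masking strategies can be illustrated with a minimal NumPy-only sketch. This is not the pandas implementation: `take_with_fill` is a hypothetical stand-in for a take that fills -1 entries instead of looking them up (the behaviour the description attributes to `take_nd`), and `remap_old` / `remap_new` are illustrative names. NumPy raises an `IndexError` where the unchecked code path would read out of bounds.

```python
import numpy as np


def take_with_fill(arr, indexer, fill_value=-1):
    """Hypothetical helper: take from `arr`, filling positions where indexer == -1."""
    indexer = np.asarray(indexer, dtype=np.intp)
    out = np.full(indexer.shape, fill_value, dtype=arr.dtype)
    valid = indexer != -1
    # Only valid positions are ever looked up in `arr`.
    out[valid] = arr[indexer[valid]]
    return out


def remap_old(order, codes):
    """Old strategy: point out-of-bounds codes at element 0, then fix up to -1."""
    n = len(order)
    mask = (codes < -n) | (codes >= n)
    safe_codes = np.where(mask, 0, codes)
    # Element 0 is read here, which does not exist when `order` is empty.
    new_codes = take_with_fill(order, safe_codes)
    np.putmask(new_codes, mask, -1)
    return new_codes


def remap_new(order, codes):
    """New strategy: mark out-of-bounds codes as -1 before taking."""
    n = len(order)
    mask = (codes < -n) | (codes >= n)
    safe_codes = np.where(mask, -1, codes)
    return take_with_fill(order, safe_codes)


order = np.array([2, 0, 1])
codes = np.array([0, 5, -1, 2])  # 5 is out of bounds, -1 is the sentinel
non_empty_result = remap_new(order, codes)

empty = np.array([], dtype=np.intp)
empty_result = remap_new(empty, np.array([0, 0]))
```

For a non-empty array both strategies agree; only the empty case differs, since every code is then out of bounds and the old strategy has no element 0 to fall back on.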
Member

@mroeschke mroeschke left a comment

Could you also add a unit test for your original example in #59421 (comment)?

@mroeschke mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 12, 2024
@jsspencer
Contributor Author

> Could you also add a unit test for your original example in #59421 (comment)?

Done. I also verified that with bounds checking enabled on take_1d_*, this additional test failed at HEAD and passes with the changes in this PR.

@mroeschke mroeschke added this to the 3.0 milestone Aug 12, 2024
@mroeschke mroeschke merged commit 66a2111 into pandas-dev:main Aug 12, 2024
45 of 47 checks passed
@mroeschke
Member

Thanks @jsspencer

@jsspencer jsspencer deleted the fix_safe_sort_empty_array branch August 12, 2024 21:33
shreyas-dev pushed a commit to shreyas-dev/pandas that referenced this pull request Aug 13, 2024
…n-empty codes (pandas-dev#59489)

* Fix out-of-bounds violations in safe_sort for empty arrays.

Previously we masked `codes` referring to out-of-bounds elements to 0 and
then fixed them after to -1 using `np.putmask`. However, this results in
out-of-bounds access in `take_nd` if the array is empty.

Instead, set all out-of-bounds indices in `codes` to -1 immediately, as
these can be handled by `take_nd`.

* Remove dead code.

`use_na_sentinel` cannot be truthy inside an else branch where it is
falsy.

* Add test based upon pandas-dev#59421
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

BUG: merging DataFrames on a column containing just NaN values triggers address violation in safe_sort
2 participants