BUG: Fix segfault in Categorical.set_categories #24680

Merged
merged 1 commit into from
Jan 9, 2019

jschendel
Member


The fix basically amounts to changing self._codes --> cat._codes in the code block below:

    cat = self if inplace else self.copy()
    if rename:
        if (cat.dtype.categories is not None and
                len(new_dtype.categories) < len(cat.dtype.categories)):
            # remove all _codes which are larger and set to -1/NaN
            self._codes[self._codes >= len(new_dtype.categories)] = -1

Since the returned object is cat rather than self, the existing version never updated the _codes of the result when inplace=False (where cat is a copy); it modified the original object instead. The segfault would occur when trying to view the resulting Categorical, as take_1d would try to take out-of-bounds _codes here:

ret = take_1d(self.categories.values, self._codes)
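The failure mode described above can be reproduced in miniature without pandas. The sketch below uses a hypothetical ToyCategorical class (not pandas' implementation; all names in it are illustrative assumptions) with plain lists for codes and categories. Python's list indexing is bounds-checked, so the stale codes surface as an IndexError rather than the segfault that an unchecked take would produce.

```python
import copy


class ToyCategorical:
    """Hypothetical stand-in for a Categorical: integer codes index into categories."""

    def __init__(self, codes, categories):
        self.codes = list(codes)
        self.categories = list(categories)

    def rename_buggy(self, new_categories):
        cat = copy.deepcopy(self)  # mirrors the inplace=False path
        if len(new_categories) < len(cat.categories):
            # Bug: the mask is applied to self, so cat keeps its stale codes.
            self.codes = [-1 if c >= len(new_categories) else c
                          for c in self.codes]
        cat.categories = list(new_categories)
        return cat

    def rename_fixed(self, new_categories):
        cat = copy.deepcopy(self)
        if len(new_categories) < len(cat.categories):
            # Fix: mask the codes of the object that is actually returned.
            cat.codes = [-1 if c >= len(new_categories) else c
                         for c in cat.codes]
        cat.categories = list(new_categories)
        return cat

    def values(self):
        # Bounds-checked lookup; a code of -1 marks a missing value.
        return [None if c == -1 else self.categories[c] for c in self.codes]


original = ToyCategorical([0, 1, 2], ["a", "b", "c"])
buggy = original.rename_buggy(["x", "y"])
fixed = ToyCategorical([0, 1, 2], ["a", "b", "c"]).rename_fixed(["x", "y"])
```

Here buggy.codes still contains 2 even though only two categories remain, so buggy.values() fails, while original has been mutated as a side effect. fixed.codes has the out-of-range code replaced by -1, so the lookup succeeds.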

@jschendel jschendel added Bug Categorical Categorical Data Type Segfault Non-Recoverable Error labels Jan 9, 2019
@jschendel jschendel added this to the 0.24.0 milestone Jan 9, 2019
          else:
-             codes = _recode_for_categories(self.codes, self.categories,
+             codes = _recode_for_categories(cat.codes, cat.categories,
Member Author

@jschendel jschendel Jan 9, 2019

This change shouldn't alter existing behavior, as self and cat should be identical within this branch of the if/else. It is more of a defensive, future-proofing change in case later modifications cause self and cat to diverge before this point. All operations at this point should be performed on cat anyway, as that's the object we'll be returning.
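As a rough illustration of why reading from cat instead of self is behavior-preserving here, the sketch below recodes both an object and an untouched copy of it and gets the same result. The recode_for_categories function is a simplified, hypothetical stand-in written for this example; it is not pandas' _recode_for_categories, and the dict-based objects are assumptions made to keep the example self-contained.

```python
import copy


def recode_for_categories(codes, old_categories, new_categories):
    """Map codes from old_categories to positions in new_categories.

    Values missing from new_categories, and existing -1 codes, become -1.
    """
    position = {value: i for i, value in enumerate(new_categories)}
    return [
        -1 if code == -1 else position.get(old_categories[code], -1)
        for code in codes
    ]


# "self" and an unmodified copy of it ("cat"), as in the else branch.
self_like = {"codes": [0, 1, 2, -1], "categories": ["a", "b", "c"]}
cat_like = copy.deepcopy(self_like)

new_categories = ["c", "a"]
from_self = recode_for_categories(
    self_like["codes"], self_like["categories"], new_categories)
from_cat = recode_for_categories(
    cat_like["codes"], cat_like["categories"], new_categories)
```

Both calls yield [1, -1, 0, -1]. The two would only differ if something modified cat before this point, which is the divergence the change guards against.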

@jreback jreback merged commit 8de2a92 into pandas-dev:master Jan 9, 2019
Contributor

jreback commented Jan 9, 2019

thanks @jschendel

@jschendel jschendel deleted the set-categories-segfault branch January 9, 2019 15:47
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Bug Categorical Categorical Data Type Segfault Non-Recoverable Error
Development

Successfully merging this pull request may close these issues.

Segmentation fault after set_categories()
2 participants