Skip to content

BUG-26988 implement replace for categorical blocks #27026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Nov 16, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -585,6 +585,7 @@ Categorical

- Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
- Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
- Bug in :func:`DataFrame.replace` and :func:`Series.replace` that would give unusual results on categorical data (:issue:`26988`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unusual" --> "incorrect"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

-

Datetimelike
Expand Down
23 changes: 23 additions & 0 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -2978,6 +2978,29 @@ def where(self, other, cond, align=True, errors='raise',
axis=axis, transpose=transpose)
return result

def replace(self, to_replace, value, inplace=False, filter=None,
regex=False, convert=True):
if filter is None or not self.mgr_locs.isin(filter).any():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block looks unrelated to the linked issue. what case is it solving? needs a separate test?

Copy link
Contributor Author

@JustinZhengBC JustinZhengBC Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first case is used when the replace function applies to the entire block, so the replace can be carried out just by manipulating categories. Most calls to Series.replace should be covered with this case (including the new test in series/test_replace.py). If there's a filter, this might not be possible so we fix the categories then use the original implementation (second case)

I added a test in frame/test_replace.py to cover the second case, where a filter is used. It improves upon the previous behaviour, which would have casted the frame to object

result = self if inplace else self.copy()
categories = result.values.categories.tolist()
if to_replace in categories:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not define a Categorical.replace to handle this code here (from 3013 thru 3020)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, with tests added in pandas/tests/arrays/categorical/test_algos.py

if isna(value):
result.values.remove_categories(to_replace, inplace=True)
else:
index = categories.index(to_replace)
categories[index] = value
result.values.rename_categories(categories, inplace=True)
if convert:
return result.convert(by_item=True, numeric=False,
copy=not inplace)
else:
return result
else:
if not isna(value):
self.values.add_categories(value, inplace=True)
return super().replace(to_replace, value, inplace,
filter, regex, convert)


# -----------------------------------------------------------------
# Constructor Helpers
Expand Down
17 changes: 17 additions & 0 deletions pandas/tests/series/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -292,6 +292,23 @@ def test_replace_categorical(self, categorical, numeric):
expected = pd.Series(numeric)
tm.assert_series_equal(expected, result, check_dtype=False)

def test_replace_categorical_single(self):
# GH 26988
dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')
s = pd.Series(dti)
c = s.astype('category')

expected = c.copy()
expected.cat.add_categories('foo', inplace=True)
expected[2] = 'foo'
expected.cat.remove_unused_categories(inplace=True)

result = c.replace(c[2], 'foo')
tm.assert_series_equal(expected, result)

c.replace(c[2], 'foo', inplace=True)
tm.assert_series_equal(expected, c)

def test_replace_with_no_overflowerror(self):
# GH 25616
# casts to object without Exception from OverflowError
Expand Down