GH24241 make Categorical.map transform nans #24275

Merged on Dec 20, 2018 (5 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1273,6 +1273,7 @@ Categorical
 - Bug when resampling :meth:`Dataframe.resample()` and aggregating on categorical data, the categorical dtype was getting lost. (:issue:`23227`)
 - Bug in many methods of the ``.str``-accessor, which always failed on calling the ``CategoricalIndex.str`` constructor (:issue:`23555`, :issue:`23556`)
 - Bug in :meth:`Series.where` losing the categorical dtype for categorical data (:issue:`24077`)
+- Bug in :meth:`Categorical.apply` where the given function would not be applied to ``NaN`` values (:issue:`24241`)

 Datetimelike
 ^^^^^^^^^^^^
21 changes: 18 additions & 3 deletions pandas/core/arrays/categorical.py
@@ -1229,11 +1229,26 @@ def map(self, mapper):
         Index(['first', 'second', nan], dtype='object')
         """
         new_categories = self.categories.map(mapper)
+
         try:
-            return self.from_codes(self._codes.copy(),
-                                   categories=new_categories,
-                                   ordered=self.ordered)
+            if isinstance(mapper, (dict, ABCSeries)):
Contributor

why is this a special case? use is_dict_like
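
The concern here is that `dict` and `Series` are not the only objects that support key lookup, so an `isinstance` check misses other mappings. A minimal sketch of a duck-typed dispatch follows; `looks_dict_like` and `map_nan` are hypothetical names for illustration only, not pandas functions, and the `is_dict_like` helper in pandas may behave differently in detail.

```python
import math
from collections import OrderedDict


def looks_dict_like(obj):
    """Hypothetical stand-in for a dict-likeness check based on duck typing."""
    required = ("__getitem__", "keys", "__contains__")
    # Classes themselves (e.g. ``dict``) are excluded; only instances count.
    return not isinstance(obj, type) and all(hasattr(obj, name) for name in required)


def map_nan(mapper, nan):
    """Return the value that ``mapper`` assigns to ``nan`` (hypothetical helper)."""
    if looks_dict_like(mapper):
        return mapper[nan]   # any mapping, not just dict or Series
    return mapper(nan)       # otherwise assume a callable


nan = math.nan
from_dict = map_nan({nan: "missing"}, nan)
from_ordered = map_nan(OrderedDict([(nan, "missing")]), nan)
from_callable = map_nan(math.isnan, nan)
```

With this shape, any object exposing the mapping protocol takes the lookup branch, and everything else is treated as a callable.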

+                new_value = mapper[np.nan]
+            else:
+                new_value = mapper(np.nan)
+        except (AttributeError, KeyError, TypeError, ValueError):
Contributor

I'm not really comfortable with this. mapper is a user-defined function. Consider

def f(x):
    if isnan(x):
        raise TypeError
    ...

that TypeError would be swallowed by pandas.
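
A self-contained sketch of the problem described in this comment, using only the standard library. `picky` and `nan_value_broad` are hypothetical names; `nan_value_broad` only imitates the try/except pattern under review and is not the pandas implementation.

```python
import math


def picky(x):
    """Hypothetical user function that deliberately rejects NaN."""
    if isinstance(x, float) and math.isnan(x):
        raise TypeError("NaN is not allowed here")
    return x * 2


def nan_value_broad(mapper):
    """Imitate the reviewed pattern: any listed error falls back to NaN."""
    try:
        return mapper(math.nan)
    except (AttributeError, KeyError, TypeError, ValueError):
        return math.nan


result = nan_value_broad(picky)
# The TypeError raised on purpose by ``picky`` never reaches the caller.
swallowed = math.isnan(result)
```

The caller sees a NaN result and has no way to tell that the function raised, which is the behavior the reviewer objects to.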

+            new_value = np.nan
+
+        try:
Contributor

why are you try/except here?

Contributor

how / why can this fail

Contributor Author

@JustinZhengBC Dec 15, 2018

AttributeError: if mapper calls a method of the element (e.g. lambda x: x.lower())
KeyError: if mapper is a dict without a key for NaN
TypeError: if mapper expects some type other than a float
ValueError: if mapper tries converting float values to ints (e.g. lambda x: int(x))

if you mean the try/except below that, that was already there. from_codes raises a ValueError if the mapping isn't one-to-one
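
The four cases listed above can be reproduced with plain Python floats. This is a small sketch, not pandas code; `error_name` is a hypothetical helper, and the lambdas are stand-ins for the kinds of mappers described.

```python
import math

nan = math.nan


def error_name(func):
    """Call ``func`` with NaN and return the exception class name, if any."""
    try:
        func(nan)
    except Exception as exc:
        return type(exc).__name__
    return None


raised = {
    "method call on element": error_name(lambda x: x.lower()),
    "dict without a NaN key": error_name(lambda x: {1: "one"}[x]),
    "non-float expected": error_name(lambda x: len(x)),
    "float to int conversion": error_name(lambda x: int(x)),
}
```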

+            ret = self.from_codes(self._codes.copy(),
+                                  categories=new_categories,
+                                  ordered=self.ordered)
+            if new_value not in ret.categories and any(self._codes == -1):
+                ret.add_categories(new_value, inplace=True)
+                ret = ret.fillna(new_value)
+            return ret
         except ValueError:
+            new_categories = new_categories.insert(len(new_categories),
+                                                   new_value)
             return np.take(new_categories, self._codes)

     __eq__ = _cat_compare_op('__eq__')
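
The fallback branch appears to rely on missing values being stored as code `-1`: appending the NaN mapping as the last category means a `-1` index selects it. A minimal sketch with plain lists, assuming that reading is correct; the data and the `lookup` name are hypothetical.

```python
import math

# Hypothetical data: codes index into categories; -1 marks a missing value.
categories = ["a", "b"]
codes = [0, 1, -1, 0]

# A mapping that is not one-to-one, so the categorical result is not possible.
mapper = {"a": "first", "b": "first"}
new_categories = [mapper[c] for c in categories]

# Assume the mapper provides no value for NaN, so NaN maps to NaN.
new_value = math.nan

# Append the NaN mapping last so that index -1 picks it up.
lookup = new_categories + [new_value]
result = [lookup[code] for code in codes]
```

Without the appended entry, index `-1` would select the last real category instead of a missing value.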
31 changes: 31 additions & 0 deletions pandas/tests/indexes/test_category.py
@@ -311,6 +311,37 @@ def test_map_with_categorical_series(self):
         exp = pd.Index(["odd", "even", "odd", np.nan])
         tm.assert_index_equal(a.map(c), exp)

+    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], pd.isna],
Contributor

write this as

@pytest.mark.parametrize(
    'data, f',
    [
        ......
    ])
.....

+                                         [[1, 2, np.nan], pd.isna],
+                                         [[1, 1, np.nan], {1: False,
+                                                           np.nan: True}],
+                                         [[1, 2, np.nan], {1: False,
+                                                           2: False,
+                                                           np.nan: True}]])
+    def test_map_fill_nan(self, data, f):
+        values = pd.Categorical(data)
+        result = values.map(f)
+        if data[1] == 1:
+            expected = pd.Categorical([False, False, True])
+            tm.assert_categorical_equal(result, expected)
+        else:
+            expected = pd.Index([False, False, True])
+            tm.assert_index_equal(result, expected)
+
+    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], {1: False}],
Contributor

same

+                                         [[1, 2, np.nan], {1: False,
+                                                           2: False}],
+                                         [[1, 1, np.nan], pd.Series([False,
+                                                                     False])],
+                                         [[1, 2, np.nan], pd.Series([False,
+                                                                     False,
+                                                                     False])]])
+    def test_map_dont_fill_nan(self, data, f):
+        values = pd.Categorical(data)
+        result = values.map(f)
+        expected = pd.Index([False, False, np.nan])
+        tm.assert_index_equal(result, expected)
+
     @pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series])
     def test_where(self, klass):
         i = self.create_index()