
Commit ff69f45

JustinZhengBC authored and TomAugspurger committed
GH24241 make Categorical.map transform nans (#24275)
* BUG-24241 make Categorical.map transform nans
1 parent f6cf7d9 commit ff69f45


4 files changed, +32 -2 lines changed

doc/source/categorical.rst (+2 -1)

@@ -1145,7 +1145,8 @@ dtype in apply

 Pandas currently does not preserve the dtype in apply functions: If you apply along rows you get
 a `Series` of ``object`` `dtype` (same as getting a row -> getting one element will return a
-basic type) and applying along columns will also convert to object.
+basic type) and applying along columns will also convert to object. ``NaN`` values are unaffected.
+You can use ``fillna`` to handle missing values before applying a function.

 .. ipython:: python

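The added documentation sentence recommends handling missing values with `fillna` before applying a function. A minimal sketch of that pattern on a categorical `Series`, assuming the fill value `"a"` (an arbitrary choice for illustration; for categorical data the fill value generally has to be one of the existing categories):

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", np.nan], dtype="category")

# For categorical data the fill value must already be a category;
# "a" is an arbitrary choice for this sketch.
filled = s.fillna("a")

# With no missing values left, the function only ever receives strings.
result = filled.apply(str.upper)

print(result.tolist())  # ['A', 'B', 'A']
```

Filling first means the applied function never has to decide what to do with a missing value, which keeps the result independent of how a given pandas version treats `NaN` inside `apply`.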
doc/source/whatsnew/v0.24.0.rst (+1)

@@ -1284,6 +1284,7 @@ Categorical
 - Bug when resampling :meth:`DataFrame.resample()` and aggregating on categorical data, the categorical dtype was getting lost. (:issue:`23227`)
 - Bug in many methods of the ``.str``-accessor, which always failed on calling the ``CategoricalIndex.str`` constructor (:issue:`23555`, :issue:`23556`)
 - Bug in :meth:`Series.where` losing the categorical dtype for categorical data (:issue:`24077`)
+- Bug in :meth:`Categorical.map` where ``NaN`` values were replaced by the last mapped category when the result was returned as an :class:`Index`. They now remain ``NaN`` (:issue:`24241`)

 Datetimelike
 ^^^^^^^^^^^^

pandas/core/arrays/categorical.py (+6 -1)

@@ -1166,7 +1166,7 @@ def map(self, mapper):
         Maps the categories to new categories. If the mapping correspondence is
         one-to-one the result is a :class:`~pandas.Categorical` which has the
         same order property as the original, otherwise a :class:`~pandas.Index`
-        is returned.
+        is returned. NaN values are unaffected.

         If a `dict` or :class:`~pandas.Series` is used any unmapped category is
         mapped to `NaN`. Note that if this happens an :class:`~pandas.Index`
@@ -1234,6 +1234,11 @@ def map(self, mapper):
                                    categories=new_categories,
                                    ordered=self.ordered)
         except ValueError:
+            # NA values are represented in self._codes with -1
+            # np.take causes NA values to take final element in new_categories
+            if np.any(self._codes == -1):
+                new_categories = new_categories.insert(len(new_categories),
+                                                       np.nan)
             return np.take(new_categories, self._codes)

     __eq__ = _cat_compare_op('__eq__')
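The comment in this hunk rests on two facts: missing values are stored as code -1, and `np.take` treats -1 like ordinary Python indexing, returning the last element. A NumPy-only sketch of the fallback branch before and after the fix; `take_with_missing` is a hypothetical helper (not a pandas function), a NumPy object array stands in for the pandas `Index`, and the codes and mapped values are made-up inputs for a categorical `[1, 2, NaN]` whose categories both map to `False`:

```python
import numpy as np


def take_with_missing(mapped_categories, codes, append_nan=True):
    """Hypothetical helper sketching the Index fallback of Categorical.map.

    `codes` follows the pandas convention of -1 for a missing value.
    """
    new_categories = np.asarray(mapped_categories, dtype=object)
    if append_nan and np.any(codes == -1):
        # Give code -1 (i.e. the last position) a NaN to land on.
        new_categories = np.append(new_categories, np.nan)
    # np.take wraps negative indices, so -1 selects the last element.
    return np.take(new_categories, codes)


codes = np.array([0, 1, -1])  # codes for the values [1, 2, NaN]
mapped = [False, False]       # categories 1 and 2 both map to False

before = take_with_missing(mapped, codes, append_nan=False)
after = take_with_missing(mapped, codes)

print(before.tolist())  # [False, False, False]: NaN picked up the last category
print(after.tolist())   # [False, False, nan]: NaN is preserved
```

Appending the NaN only when a -1 code is present keeps the result unchanged for inputs without missing values.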

pandas/tests/indexes/test_category.py (+23)

@@ -311,6 +311,29 @@ def test_map_with_categorical_series(self):
         exp = pd.Index(["odd", "even", "odd", np.nan])
         tm.assert_index_equal(a.map(c), exp)

+    @pytest.mark.parametrize(
+        (
+            'data',
+            'f'
+        ),
+        (
+            ([1, 1, np.nan], pd.isna),
+            ([1, 2, np.nan], pd.isna),
+            ([1, 1, np.nan], {1: False}),
+            ([1, 2, np.nan], {1: False, 2: False}),
+            ([1, 1, np.nan], pd.Series([False, False])),
+            ([1, 2, np.nan], pd.Series([False, False, False]))
+        ))
+    def test_map_with_nan(self, data, f):  # GH 24241
+        values = pd.Categorical(data)
+        result = values.map(f)
+        if data[1] == 1:
+            expected = pd.Categorical([False, False, np.nan])
+            tm.assert_categorical_equal(result, expected)
+        else:
+            expected = pd.Index([False, False, np.nan])
+            tm.assert_index_equal(result, expected)
+
     @pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series])
     def test_where(self, klass):
         i = self.create_index()
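The parametrized cases feed three kinds of mapper (a callable, a dict and a `Series`) into the same categories. A small sketch, assuming pandas is installed, of what each does to the categories of `pd.Categorical([1, 2, np.nan])`. NaN is not a category, so it never reaches the mapper here, and a `Series` mapper is looked up by index label rather than by position. That is why `pd.Series([False, False])` is enough for the `[1, 1, np.nan]` case (label 1 exists) while the `[1, 2, np.nan]` case needs three elements.

```python
import numpy as np
import pandas as pd

# NaN is stored as a missing code, not as a category: categories are [1.0, 2.0].
categories = pd.Categorical([1, 2, np.nan]).categories

mappers = {
    "callable": pd.isna,
    "dict": {1: False, 2: False},
    # Looked up by index label (0, 1, 2), not by position.
    "series": pd.Series([False, False, False]),
}

# Map only the categories, as Categorical.map does before handling codes.
results = {name: categories.map(m).tolist() for name, m in mappers.items()}
for name, values in results.items():
    print(name, values)
```

All three mappers produce `[False, False]` for the two categories, so in the test the only thing that differs between cases is whether the mapped categories are unique (Categorical result) or not (Index result).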
