Backport PR #42087: REGR: undocumented astype("category").astype(str) type inconsistency between pandas 1.1 & 1.2 #42165

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.5.rst
@@ -19,6 +19,7 @@ Fixed regressions
 - Regression in :func:`read_csv` when using ``memory_map=True`` with an non-UTF8 encoding (:issue:`40986`)
 - Regression in :meth:`DataFrame.replace` and :meth:`Series.replace` when the values to replace is a NumPy float array (:issue:`40371`)
 - Regression in :func:`ExcelFile` when a corrupt file is opened but not closed (:issue:`41778`)
+- Fixed regression in :meth:`DataFrame.astype` with ``dtype=str`` failing to convert ``NaN`` in categorical columns (:issue:`41797`)

 .. ---------------------------------------------------------------------------

9 changes: 7 additions & 2 deletions pandas/core/arrays/categorical.py
@@ -9,7 +9,7 @@

 from pandas._config import get_option

-from pandas._libs import NaT, algos as libalgos, hashtable as htable
+from pandas._libs import NaT, algos as libalgos, hashtable as htable, lib
 from pandas._libs.lib import no_default
 from pandas._typing import ArrayLike, Dtype, Ordered, Scalar
 from pandas.compat.numpy import function as nv
@@ -429,14 +429,19 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
             try:
                 new_cats = np.asarray(self.categories)
                 new_cats = new_cats.astype(dtype=dtype, copy=copy)
+                fill_value = lib.item_from_zerodim(np.array(np.nan).astype(dtype))
             except (
                 TypeError,  # downstream error msg for CategoricalIndex is misleading
                 ValueError,
             ):
                 msg = f"Cannot cast {self.categories.dtype} dtype to {dtype}"
                 raise ValueError(msg)

-            result = take_1d(new_cats, libalgos.ensure_platform_int(self._codes))
+            result = take_1d(
+                new_cats,
+                libalgos.ensure_platform_int(self._codes),
+                fill_value=fill_value,
+            )

         return result
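
The idea behind this hunk, as far as the diff shows, is to cast `NaN` itself to the target dtype and use the result as the fill value for missing codes. The following NumPy-only sketch illustrates that idea; it does not reproduce the pandas internals. `cast_with_fill` is a hypothetical helper, `.item()` stands in for `lib.item_from_zerodim`, and `np.where` stands in for `take_1d(..., fill_value=...)`.

```python
import numpy as np


def cast_with_fill(categories, codes, dtype):
    """Hypothetical helper: cast categories to `dtype`, then expand codes.

    A code of -1 marks a missing value, as in a Categorical.
    """
    new_cats = np.asarray(categories).astype(dtype)
    # Cast NaN to the target dtype; .item() unwraps the 0-d array to a scalar.
    # For dtype=str this yields the string "nan", for float it stays nan.
    fill_value = np.array(np.nan).astype(dtype).item()
    codes = np.asarray(codes)
    missing = codes == -1
    # Indexing with -1 picks the last category; those slots are overwritten below.
    taken = new_cats[codes]
    # np.where promotes to a common dtype, so "nan" is not truncated.
    return np.where(missing, fill_value, taken)


as_str = cast_with_fill(np.array(["a", "b"], dtype=object), [0, 1, -1], str)
as_float = cast_with_fill(np.array([1, 2]), [0, -1, 1], float)
print(as_str.tolist())
print(as_float.tolist())
```

With `dtype=str` the missing slot should come out as the string `"nan"`, which matches what a plain `astype(str)` on the uncategorized data produces.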

8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_astype.py
@@ -616,3 +616,11 @@ def test_astype_bytes(self):
         # GH#39474
         result = DataFrame(["foo", "bar", "baz"]).astype(bytes)
         assert result.dtypes[0] == np.dtype("S3")
+
+    def test_astype_categorical_to_string_missing(self):
+        # https://github.com/pandas-dev/pandas/issues/41797
+        df = DataFrame(["a", "b", np.nan])
+        expected = df.astype(str)
+        cat = df.astype("category")
+        result = cat.astype(str)
+        tm.assert_frame_equal(result, expected)
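
For anyone who wants to check the behaviour outside the test suite, a minimal standalone version of the new test might look like the sketch below. It assumes a pandas build that includes this fix and uses the public `pd.testing.assert_frame_equal` in place of the internal `tm` helper.

```python
import numpy as np
import pandas as pd

# Same data as the test: two strings and one missing value.
df = pd.DataFrame(["a", "b", np.nan])

# Direct conversion to str is the reference behaviour.
expected = df.astype(str)

# Round-tripping through "category" should give the same frame.
result = df.astype("category").astype(str)

pd.testing.assert_frame_equal(result, expected)
print(result[0].tolist())
```

If the regression from GH#41797 is present, the assertion should fail, because the missing entry is left as `NaN` instead of being converted like the other values.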