Skip to content

DataFrame.replace doesn't work with categorical data #21726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lovasoa opened this issue Jul 4, 2018 · 1 comment
Closed

DataFrame.replace doesn't work with categorical data #21726

lovasoa opened this issue Jul 4, 2018 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@lovasoa
Copy link

lovasoa commented Jul 4, 2018

Code Sample

import pandas as pd
df = pd.DataFrame(["x", "y", "z"], dtype='category')
df.replace({"x": "y"})

Problem description

Current output
~/.local/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
    109             ndim = values.ndim
    110         elif values.ndim != ndim:
--> 111             raise ValueError('Wrong number of dimensions')
    112         self.ndim = ndim
    113 

ValueError: Wrong number of dimensions
Expected output

The following DataFrame:

   0
0  y
1  y
2  z
Explanation

At the very least, the error message is confusing. But I think this is a bug in the replace method, that should just work with categorical data.

This problem has confused other people before: https://stackoverflow.com/questions/48807344/how-to-replace-values-in-multiple-categoricals-in-a-pandas-dataframe

Output of pd.show_versions()

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.3-200.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.3
setuptools: 39.2.0
Cython: 0.27.3
numpy: 1.14.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Copy link
Contributor

Duplicate of #18250

As of 0.23 or master (not sure which) the categorical is coerced to objects and the replacement happens. Ideally it wouldn't coerce to object.

@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Jul 4, 2018
@TomAugspurger TomAugspurger added this to the No action milestone Jul 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request
Projects
None yet
Development

No branches or pull requests

2 participants