Series.map on a Categorical with a function only processes each unique value once #15706

Open
Kodiologist opened this issue Mar 16, 2017 · 2 comments
Labels
Categorical (Categorical Data Type), Docs

Comments

@Kodiologist
Contributor

Code Sample

>>> import pandas as pd
>>> pd.Series(list("cabaa")).map(print)
c
a
b
a
a
0    None
1    None
2    None
3    None
4    None
dtype: object
>>> pd.Series(list("cabaa"), dtype = "category").map(print)
a
b
c
/usr/local/lib/python3.6/site-packages/pandas/core/categorical.py:952: FutureWarning: 
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
  ordered=self.ordered)
0    None
1    None
2    None
3    None
4    None
dtype: object

Problem description

Either Categorical Series should act the same as other Series, or this special behavior for Categoricals should be mentioned in the documentation for Series.map.

A limitation of the current behavior for Categoricals is that the function is never called for any nulls (codes equal to -1).
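
The difference can also be observed without relying on printed output by counting calls. The following is a minimal sketch, not part of the original report: `make_counter` is a hypothetical helper introduced only for illustration, and exact call counts may depend on the pandas version.

```python
import pandas as pd


def make_counter():
    """Return a mapping function plus the list of arguments it has seen.

    Hypothetical helper, used only to illustrate the report above.
    """
    seen = []

    def record(value):
        seen.append(value)
        return value.upper()

    return record, seen


values = list("cabaa")

# Object-dtype Series: the function is expected to run once per row.
plain_func, plain_seen = make_counter()
pd.Series(values).map(plain_func)

# Categorical Series: the function is expected to run on the categories only.
cat_func, cat_seen = make_counter()
pd.Series(values, dtype="category").map(cat_func)

print(len(plain_seen))
print(sorted(set(cat_seen)))
```

Counting calls this way avoids depending on the order in which `print` output appears, which is what made the original transcript differ between the two dtypes.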

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 16, 2017

This is the point of a categorical: by definition, you have unique values that you can act on.

I suppose you could add some more documentation (or an example).

@topper-123
Contributor

As @jreback mentioned, Categorical.map is designed to work on the categories, so the function will not necessarily be called on each value in the array. The docstring says as much ("Maps the categories to new categories."), but it could perhaps be worded more clearly.
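
If a function really does need to run once per element (for example, because of its side effects), one possible workaround is to convert the Series to object dtype before mapping. This is a sketch under that assumption and was not proposed in this thread; `shout` is a hypothetical function used only for illustration.

```python
import pandas as pd

s = pd.Series(list("cabaa"), dtype="category")

calls = []


def shout(value):
    """Hypothetical mapping function that records each call."""
    calls.append(value)
    return value.upper()


# Mapping the categorical visits the categories rather than the rows.
by_category = s.map(shout)
category_calls = len(calls)

# Converting to object dtype first makes map visit every row.
calls.clear()
by_element = s.astype(object).map(shout)
element_calls = len(calls)

print(category_calls, element_calls)
print(by_element.tolist())
```

Both approaches produce the same mapped values here; the trade-off is that the object-dtype route gives up the speed benefit of mapping only the categories.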

So IMO everything works as intended and this issue can be closed.
