pd.Series.map never maps NAs through a dictionary #17648

Closed
Kodiologist opened this issue Sep 23, 2017 · 5 comments · Fixed by #29367
Labels
good first issue, Needs Tests (unit test(s) needed to prevent regressions)

Comments

@Kodiologist
Contributor

Code Sample

import pandas as pd, numpy as np
print(pd.Series([1, 2, np.nan]).map({1: "a", 2: "b", np.nan: "c"}))

This prints:

0      a
1      b
2    NaN
dtype: object

Problem description

The parameter na_action of pd.Series.map is ignored if the first argument of map is a dictionary. Rather than mapping NAs if na_action is None and passing them through unaltered if it's "ignore", pandas always passes NAs through unaltered, as if na_action were "ignore". (The default is None.)
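
For comparison, na_action does take effect when the first argument of map is a function. The following is a minimal sketch of that documented behavior; the `label` function is hypothetical and not part of the original report.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])


def label(x):
    # Hypothetical labeling function, used only for illustration.
    return "missing" if pd.isna(x) else f"value {x:g}"


# na_action=None (the default): the function is also called on NaN.
mapped_all = s.map(label)

# na_action="ignore": NaN is passed through without calling the function.
mapped_ignore = s.map(label, na_action="ignore")

print(mapped_all.tolist())
print(mapped_ignore.tolist())
```

The report above is about the dictionary case, where the same distinction was expected but not observed in pandas 0.20.3.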

Expected Output

0      a
1      b
2      c
dtype: object

Output of pd.show_versions()

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Sep 23, 2017

np.nan is a bit odd to use as a dictionary key. Any reason you don't want to use .fillna()?

e.g.

In [9]: s.map(pd.Series({1: "a", 2: "b", np.nan: "c"})).fillna('c')
Out[9]: 
0    a
1    b
2    c
dtype: object

# or replace
In [12]: s.replace({1: "a", 2: "b", np.nan: "c"})
Out[12]: 
0    a
1    b
2    c
dtype: object
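
One caveat with the map-then-fillna workaround: it is only equivalent to a NaN key when every non-missing value has an entry in the mapping, because map also yields NaN for values it cannot find. A small sketch of the difference, assuming a pandas version in which the NaN key is honored (the thread later reports that this works on master); the series `s` and the extra value 3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: 3 has no entry in either mapping below.
s = pd.Series([1, 2, np.nan, 3])

# map + fillna: both the original NaN and the unmapped 3 end up as "c".
via_fillna = s.map({1: "a", 2: "b"}).fillna("c")

# NaN as a key: only the original NaN becomes "c"; the unmapped 3 stays NaN.
via_nan_key = s.map({1: "a", 2: "b", np.nan: "c"})

print(via_fillna.tolist())
print(via_nan_key.tolist())
```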

@Kodiologist
Contributor Author

Kodiologist commented Sep 23, 2017

In my case, I have some series I got out of a Stata dataset that I'm turning into Categoricals based on a codebook. What appears as an NA in the original series is often given a descriptive label by the codebook, analogously to the non-missing values. And I want to preserve the order of categories and the presence of categories with no corresponding values. So I wrote

  v = v.map(codes).astype("category", categories = val_labels)

But replace seems to work fine, so I'll use that instead. Thanks. (The documentation for a dict to_replace in Series.replace looks weird, by the way. Perhaps it was incorrectly copied from DataFrame.replace.)
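
For readers on newer pandas: later releases dropped the categories keyword of astype in favor of passing a CategoricalDtype. A sketch of the same idea under that assumption follows; the contents of `codes` and `val_labels` are invented here, since the original codebook is not shown, and the NaN key relies on the behavior reported as working on master later in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical codebook: NaN carries its own descriptive label.
codes = {1: "yes", 2: "no", np.nan: "not asked"}
# Hypothetical full label list; "refused" has no corresponding values.
val_labels = ["yes", "no", "refused", "not asked"]

v = pd.Series([1, 2, np.nan, 1])

# Map codes to labels, then convert with an explicit category list so that
# category order and unused categories are preserved.
dtype = pd.CategoricalDtype(categories=val_labels)
v = v.map(codes).astype(dtype)

print(v.tolist())
print(list(v.cat.categories))
```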

@jreback
Contributor

jreback commented Sep 23, 2017

@Kodiologist OK, so please submit a PR for the docs issue. Those are always best coming from a user's perspective.

I'll mark this as a bug in any event. It's a bit tricky, though. You're welcome to have a look.

@jreback added the Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and API Design labels and removed the Bug label on Sep 23, 2017
@jreback added this to the Next Major Release milestone on Sep 23, 2017
@jreback
Contributor

jreback commented Nov 25, 2017

xref #14210

@mroeschke
Member

Looks to work on master. Could use a test.

In [59]: print(pd.Series([1, 2, np.nan]).map({1: "a", 2: "b", np.nan: "c"}))
    ...:
0    a
1    b
2    c
dtype: object

In [60]: pd.__version__
Out[60]: '0.26.0.dev0+593.g9d45934af'
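
A regression test along these lines might look like the sketch below. The test name is hypothetical, and the comparison uses plain lists so that it does not depend on the dtype of the result.

```python
import numpy as np
import pandas as pd


def test_map_dict_with_nan_key():
    # Hypothetical test name; the data mirrors the code sample in the report.
    s = pd.Series([1, 2, np.nan])
    result = s.map({1: "a", 2: "b", np.nan: "c"})
    # The NaN entry should be mapped through the dictionary like any other key.
    assert result.tolist() == ["a", "b", "c"]


test_map_dict_with_nan_key()
```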

@mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels and removed the API Design, Difficulty Intermediate, and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels on Oct 21, 2019