Inconsistent behavior of .map() #20495

readyready15728 · 2018-03-27T02:42:39Z

Code Sample, a copy-pastable example if possible

pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})

Result:

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

Compare:

pd.Series(list('abc') + [np.nan]).map({'a': 'foo', np.nan: 'bar'})

Result:

0    foo
1    NaN
2    NaN
3    bar
dtype: object

Problem description

If only np.nan is present in the mapping, then the associated mapping is not carried out. If another value is present, then it is carried out. Forgive me if this is intended behavior for some reason.

Expected Output

I'm not sure what is appropriate here.

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 32 OS: Linux OS-release: 4.4.0-116-generic machine: i686 processor: i686 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

The text was updated successfully, but these errors were encountered:

WillAyd · 2018-03-27T05:47:08Z

Strange but it looks like this works fine on master:

In [3]: pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})
Out[3]: 
0    NaN
1    NaN
2    NaN
3    bar
dtype: object

INSTALLED VERSIONS

commit: c36e9f7
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+685.gc36e9f705
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

jreback · 2018-03-27T09:55:03Z

yeah this lgtm. can one of you check if we have a test covering this case, if not pls add (advise either way).

WillAyd · 2018-03-29T05:11:21Z

I looked through our tests but couldn't find anything to really cover this. From what I can tell there are a few similar tests in pandas/tests/series/test_apply.py but none of them mix NA and non-NA values in the caller.

@readyready15728 any interest in trying your hand at a PR to add this scenario? I think the module above would be a good location unless @jreback disagrees

readyready15728 · 2018-03-29T05:16:37Z

@WillAyd

I barely know anything about pull requests. Could you be a little more specific about how I should proceed?

WillAyd · 2018-03-29T05:21:07Z

There is a pretty detailed guide for contributing to pandas - give that a look and let me know if you have any questions.

https://pandas.pydata.org/pandas-docs/stable/contributing.html

readyready15728 · 2018-03-29T06:10:26Z

OK so do you intend that I submit a pull request to the code that implements .map() itself or the test code? And what is the intended behavior of .map()? I'm not sure which is right.

WillAyd · 2018-03-29T18:18:51Z

You should be adding a test to pandas/tests/series/test_apply.py. You can probably place it right below functions like test_empty and call it something like test_missing_mixed.

The expected value would be your second example from your original post

readyready15728 · 2018-03-30T10:34:10Z

Alright, I'm virtually certain that my addition will work but I want to test it first. To do so, I need to get the tests to run. pytest pandas said that I need to do python setup.py build_ext --inplace --force. I did that but then I got more errors:

Exception: Cython-generated file 'pandas/_libs/reduction.c' not found.
                Cython is required to compile pandas from a development branch.
                Please install Cython or download a release package of pandas.

But I have Cython installed. :?

(Am running Ubuntu 16.04)

EDIT: I did some Googling and someone suggested using pip to get the latest version and now the Cython extensions are compiling.

readyready15728 · 2018-03-30T13:20:01Z

There is something else about the behavior of .map() on my previous version (0.22) that may interest you:

pd.Series([42] + [np.nan]).map({42: 'the answer', np.nan: 'not NaN'})

Has the result:

0    the answer
1           NaN
dtype: object

In other words, the intended effect doesn't happen when there's just an integer alongside np.nan in the map. With strings the intended effect obviously happens and also with tuples but I didn't test floats and now I have the development version installed so going back will just be a pain. Might want to review 0.22 to see if any other bugs are lurking in there.

I am almost ready to do a pull request but I'd like a little guidance about creating a whatsnew entry, if it is needed at all.

WillAyd · 2018-04-01T19:19:07Z

Sorry for the delay in response. Since you aren't changing any of the core code (just adding tests) I don't think you need a whatsnew entry, since there's nothing that you can link to here that would change behavior.

That said, there's already a map change in the v0.23 whatsnew (see #15081) . If you are feeling ambitious, you could take a look at that and see if it is responsible for the fix here. If so, we can update that existing note to mention the bug being fixed and link to this issue as well

readyready15728 · 2018-04-01T20:10:48Z

Frankly I wouldn't know how to do that. Additionally git diff upstream/master -u -- "*.py" | flake8 --diff gives fatal: bad revision 'upstream/master'.

WillAyd · 2018-04-01T20:34:08Z

Did you configure upstream to point to this repository? git remote show -v should yield at least the following:

git remote -v show
upstream	https://github.com/pandas-dev/pandas.git (fetch)
upstream	https://github.com/pandas-dev/pandas.git (push)

readyready15728 · 2018-04-02T04:04:08Z

No, it has only my forked repository. [dumb comment by me] I'm accustomed to working only on my own repos.

EDIT: OK never mind I see what is going on here.

WillAyd · 2018-04-02T04:05:46Z

Make sure you check out the contributing guide as it gives instructions on how to properly set your upstream repo. You will need this for comparisons and pulling updates to master

https://pandas.pydata.org/pandas-docs/stable/contributing.html

readyready15728 · 2018-04-02T04:09:11Z

I did git remote add upstream https://github.com/pandas-dev/pandas.git and I'm still getting the same error message. What the hell?

WillAyd · 2018-04-02T04:12:00Z

Did you git fetch upstream? I've never gotten that error before but it would make sense that the comp fails if a copy of the upstream repo doesn't exist locally

readyready15728 · 2018-04-02T04:23:49Z

Alright, it succeeded. While my test works I still might not meet all your expectations. A pull request has been submitted anyway. We can go back and forth until it's acceptable.

Closes #20495

jreback added Testing pandas testing functions or related to the test suite Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 27, 2018

jreback added this to the 0.23.0 milestone Mar 27, 2018

readyready15728 mentioned this issue Apr 2, 2018

Adding test_map_missing_mixed to test_apply.py in pandas test suite series #20574

Merged

4 tasks

TomAugspurger closed this as completed in #20574 Apr 3, 2018

TomAugspurger pushed a commit that referenced this issue Apr 3, 2018

TST: Added test for mapping on mixed NA (#20574)

85c7900

Closes #20495

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent behavior of .map() #20495

Inconsistent behavior of .map() #20495

readyready15728 commented Mar 27, 2018 •

edited

Loading

WillAyd commented Mar 27, 2018

INSTALLED VERSIONS

jreback commented Mar 27, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 29, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 29, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 30, 2018 •

edited

Loading

readyready15728 commented Mar 30, 2018 •

edited

Loading

WillAyd commented Apr 1, 2018

readyready15728 commented Apr 1, 2018

WillAyd commented Apr 1, 2018

readyready15728 commented Apr 2, 2018 •

edited

Loading

WillAyd commented Apr 2, 2018

readyready15728 commented Apr 2, 2018

WillAyd commented Apr 2, 2018

readyready15728 commented Apr 2, 2018

Inconsistent behavior of .map() #20495

Inconsistent behavior of .map() #20495

Comments

readyready15728 commented Mar 27, 2018 • edited Loading

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

WillAyd commented Mar 27, 2018

INSTALLED VERSIONS

jreback commented Mar 27, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 29, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 29, 2018

WillAyd commented Mar 29, 2018

readyready15728 commented Mar 30, 2018 • edited Loading

readyready15728 commented Mar 30, 2018 • edited Loading

WillAyd commented Apr 1, 2018

readyready15728 commented Apr 1, 2018

WillAyd commented Apr 1, 2018

readyready15728 commented Apr 2, 2018 • edited Loading

WillAyd commented Apr 2, 2018

readyready15728 commented Apr 2, 2018

WillAyd commented Apr 2, 2018

readyready15728 commented Apr 2, 2018

readyready15728 commented Mar 27, 2018 •

edited

Loading

Output of `pd.show_versions()`

readyready15728 commented Mar 30, 2018 •

edited

Loading

readyready15728 commented Mar 30, 2018 •

edited

Loading

readyready15728 commented Apr 2, 2018 •

edited

Loading