Skip to content

Inconsistent behavior of .map() #20495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
readyready15728 opened this issue Mar 27, 2018 · 17 comments · Fixed by #20574
Closed

Inconsistent behavior of .map() #20495

readyready15728 opened this issue Mar 27, 2018 · 17 comments · Fixed by #20574
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite
Milestone

Comments

@readyready15728
Copy link
Contributor

readyready15728 commented Mar 27, 2018

Code Sample, a copy-pastable example if possible

pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})

Result:

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

Compare:

pd.Series(list('abc') + [np.nan]).map({'a': 'foo', np.nan: 'bar'})

Result:

0    foo
1    NaN
2    NaN
3    bar
dtype: object

Problem description

If only np.nan is present in the mapping, then the associated mapping is not carried out. If another value is present, then it is carried out. Forgive me if this is intended behavior for some reason.

Expected Output

I'm not sure what is appropriate here.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 32 OS: Linux OS-release: 4.4.0-116-generic machine: i686 processor: i686 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

@WillAyd
Copy link
Member

WillAyd commented Mar 27, 2018

Strange but it looks like this works fine on master:

In [3]: pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})
Out[3]: 
0    NaN
1    NaN
2    NaN
3    bar
dtype: object

INSTALLED VERSIONS

commit: c36e9f7
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+685.gc36e9f705
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Mar 27, 2018

yeah this lgtm. can one of you check if we have a test covering this case, if not pls add (advise either way).

@jreback jreback added Testing pandas testing functions or related to the test suite Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 27, 2018
@jreback jreback added this to the 0.23.0 milestone Mar 27, 2018
@WillAyd
Copy link
Member

WillAyd commented Mar 29, 2018

I looked through our tests but couldn't find anything to really cover this. From what I can tell there are a few similar tests in pandas/tests/series/test_apply.py but none of them mix NA and non-NA values in the caller.

@readyready15728 any interest in trying your hand at a PR to add this scenario? I think the module above would be a good location unless @jreback disagrees

@readyready15728
Copy link
Contributor Author

@WillAyd

I barely know anything about pull requests. Could you be a little more specific about how I should proceed?

@WillAyd
Copy link
Member

WillAyd commented Mar 29, 2018

There is a pretty detailed guide for contributing to pandas - give that a look and let me know if you have any questions.

https://pandas.pydata.org/pandas-docs/stable/contributing.html

@readyready15728
Copy link
Contributor Author

OK so do you intend that I submit a pull request to the code that implements .map() itself or the test code? And what is the intended behavior of .map()? I'm not sure which is right.

@WillAyd
Copy link
Member

WillAyd commented Mar 29, 2018

You should be adding a test to pandas/tests/series/test_apply.py. You can probably place it right below functions like test_empty and call it something like test_missing_mixed.

The expected value would be your second example from your original post

@readyready15728
Copy link
Contributor Author

readyready15728 commented Mar 30, 2018

Alright, I'm virtually certain that my addition will work but I want to test it first. To do so, I need to get the tests to run. pytest pandas said that I need to do python setup.py build_ext --inplace --force. I did that but then I got more errors:

Exception: Cython-generated file 'pandas/_libs/reduction.c' not found.
                Cython is required to compile pandas from a development branch.
                Please install Cython or download a release package of pandas.

But I have Cython installed. :?

(Am running Ubuntu 16.04)

EDIT: I did some Googling and someone suggested using pip to get the latest version and now the Cython extensions are compiling.

@readyready15728
Copy link
Contributor Author

readyready15728 commented Mar 30, 2018

There is something else about the behavior of .map() on my previous version (0.22) that may interest you:

pd.Series([42] + [np.nan]).map({42: 'the answer', np.nan: 'not NaN'})

Has the result:

0    the answer
1           NaN
dtype: object

In other words, the intended effect doesn't happen when there's just an integer alongside np.nan in the map. With strings the intended effect obviously happens and also with tuples but I didn't test floats and now I have the development version installed so going back will just be a pain. Might want to review 0.22 to see if any other bugs are lurking in there.

I am almost ready to do a pull request but I'd like a little guidance about creating a whatsnew entry, if it is needed at all.

@WillAyd
Copy link
Member

WillAyd commented Apr 1, 2018

Sorry for the delay in response. Since you aren't changing any of the core code (just adding tests) I don't think you need a whatsnew entry, since there's nothing that you can link to here that would change behavior.

That said, there's already a map change in the v0.23 whatsnew (see #15081) . If you are feeling ambitious, you could take a look at that and see if it is responsible for the fix here. If so, we can update that existing note to mention the bug being fixed and link to this issue as well

@readyready15728
Copy link
Contributor Author

Frankly I wouldn't know how to do that. Additionally git diff upstream/master -u -- "*.py" | flake8 --diff gives fatal: bad revision 'upstream/master'.

@WillAyd
Copy link
Member

WillAyd commented Apr 1, 2018

Did you configure upstream to point to this repository? git remote show -v should yield at least the following:

git remote -v show
upstream	https://github.com/pandas-dev/pandas.git (fetch)
upstream	https://github.com/pandas-dev/pandas.git (push)

@readyready15728
Copy link
Contributor Author

readyready15728 commented Apr 2, 2018

No, it has only my forked repository. [dumb comment by me] I'm accustomed to working only on my own repos.

EDIT: OK never mind I see what is going on here.

@WillAyd
Copy link
Member

WillAyd commented Apr 2, 2018

Make sure you check out the contributing guide as it gives instructions on how to properly set your upstream repo. You will need this for comparisons and pulling updates to master

https://pandas.pydata.org/pandas-docs/stable/contributing.html

@readyready15728
Copy link
Contributor Author

I did git remote add upstream https://github.com/pandas-dev/pandas.git and I'm still getting the same error message. What the hell?

@WillAyd
Copy link
Member

WillAyd commented Apr 2, 2018

Did you git fetch upstream? I've never gotten that error before but it would make sense that the comp fails if a copy of the upstream repo doesn't exist locally

@readyready15728
Copy link
Contributor Author

Alright, it succeeded. While my test works I still might not meet all your expectations. A pull request has been submitted anyway. We can go back and forth until it's acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants