
Inconsistent .loc slicing behaviour with NaNs in MultiIndex dataframe #29751

Closed
pepicello opened this issue Nov 20, 2019 · 8 comments · Fixed by #38772
Labels
good first issue, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex, Needs Tests (unit test(s) needed to prevent regressions)

Comments

@pepicello
Contributor

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> import numpy as np
>>> a = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).set_index([0, 1])
>>> b = pd.DataFrame([[1, np.nan, 3], [4, np.nan, 6], [7, np.nan, 9]]).set_index([0, 1])
>>> a_idx = a.index[1]
>>> b_idx = b.index[1]
>>> a.loc[a_idx:]
     2
0 1
4 5  6
7 8  9
>>> a.loc[:a_idx]
     2
0 1
1 2  3
4 5  6
>>> b.loc[b_idx:]
       2
0 1
7 NaN  9
>>> b.loc[:b_idx]
       2
0 1
1 NaN  3
4 NaN  6
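To make the asymmetry easier to compare across pandas versions, here is a minimal sketch that rebuilds the same two frames and prints only the first-level labels of each slice. Beyond the calls in the sample above it only uses Index.get_level_values; the helper first_level and the results dict are illustrative names, not part of pandas. On pandas 0.25.1 the b.loc[b_idx:] entry should match the output above ([7]); on a version with the fix it should be [4, 7].

```python
import numpy as np
import pandas as pd

# Same frames as in the report: the second index level of `b` is all NaN.
a = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).set_index([0, 1])
b = pd.DataFrame([[1, np.nan, 3], [4, np.nan, 6], [7, np.nan, 9]]).set_index([0, 1])
a_idx = a.index[1]  # (4, 5)
b_idx = b.index[1]  # (4, nan)


def first_level(frame):
    """Illustrative helper: first-level labels of a slice as a plain list."""
    return frame.index.get_level_values(0).tolist()


results = {
    "a.loc[a_idx:]": first_level(a.loc[a_idx:]),
    "a.loc[:a_idx]": first_level(a.loc[:a_idx]),
    "b.loc[b_idx:]": first_level(b.loc[b_idx:]),
    "b.loc[:b_idx]": first_level(b.loc[:b_idx]),
}
for expr, labels in results.items():
    print(expr, labels)
```

With no NaN involved, both bounds are inclusive, so the two a slices overlap on the (4, 5) row; the question in this issue is whether the b slices do the same on (4, nan).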

Problem description

As shown above, slicing a MultiIndex behaves differently when a level contains NaNs: the label used as the start bound is excluded from the slice (b.loc[b_idx:] drops the (4, nan) row), while the same label used as the stop bound is included. This is possibly because, rightly so, NaNs never match when the index labels are compared. But if that were the case, this should not work either:

>>> b.loc[b_idx]
2    6
Name: (4, nan), dtype: int64
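The argument above rests on two facts: NaN is never equal to itself under ==, yet the point lookup with the same key still finds the (4, nan) row. A minimal sketch of both, reusing the frame b from the sample; pd.isna is the usual way to detect a missing value where == cannot.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame([[1, np.nan, 3], [4, np.nan, 6], [7, np.nan, 9]]).set_index([0, 1])
b_idx = b.index[1]  # (4, nan)

# Plain equality never matches NaN, not even against itself.
nan_equal = b_idx[1] == b_idx[1]        # False
# A missing-value check is what recognises it.
nan_detected = bool(pd.isna(b_idx[1]))  # True
# Yet the point lookup with the same key still succeeds.
row = b.loc[b_idx]
print(nan_equal, nan_detected, row.loc[2])
```

So the point lookup evidently treats a NaN in the key as matching the missing value in the index, which is why the slice excluding it looks inconsistent.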

Note: this might be a duplicate of #25154, but the OP there does not slice directly on a MultiIndex containing NaNs, so I am not sure whether fixing that issue would also fix this one.

Expected Output

>>> b.loc[b_idx:]
       2
0 1
4 NaN  6
7 NaN  9

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : 0.12.3
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.8

jbrockmendel added the Indexing and MultiIndex labels on Dec 11, 2019
@phofl
Member

phofl commented Nov 23, 2020

Works now.

>>> a.loc[a_idx:]
     2
0 1
4 5  6
7 8  9
>>> a.loc[:a_idx]
     2
0 1
1 2  3
4 5  6
>>> b.loc[b_idx:]
       2
0 1
4 NaN  6
7 NaN  9
>>> b.loc[:b_idx]
       2
0 1
1 NaN  3
4 NaN  6

May need some tests.
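A regression test could take roughly the following shape. This is a sketch, not the test that was eventually merged: the function names are hypothetical, it only builds the frame from the report, and it checks both slice directions against the equivalent positional slices with pandas.testing.assert_frame_equal. Written as a plain function, pytest would collect it, but it can also be called directly.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def make_frame_with_nan_level():
    """Hypothetical helper: the frame from the report, second level all NaN."""
    return pd.DataFrame(
        [[1, np.nan, 3], [4, np.nan, 6], [7, np.nan, 9]]
    ).set_index([0, 1])


def test_loc_slice_with_nan_in_multiindex():  # hypothetical test name
    df = make_frame_with_nan_level()
    key = df.index[1]  # (4, nan)

    # Label slices on .loc include both bounds, so slicing from the key
    # must keep the key's own row (positions 1 and 2)...
    tm.assert_frame_equal(df.loc[key:], df.iloc[1:])
    # ...and slicing up to the key must keep it too (positions 0 and 1).
    tm.assert_frame_equal(df.loc[:key], df.iloc[:2])
```

Comparing against iloc keeps the expected result independent of how the NaN label is printed, so the test only encodes the inclusive-bounds rule.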

phofl added the good first issue and Needs Tests labels on Nov 23, 2020
@ankushduacodes
Contributor

Hi @phofl, I would love to contribute to this issue. Please let me know if I may :)

@phofl
Member

phofl commented Dec 6, 2020

Yes, for sure. As long as it is unassigned, everyone is free to work on it.

@kasim95
Contributor

kasim95 commented Dec 29, 2020

@ankushduacodes, it seems like you are not working on this issue, as it is unassigned.

@kasim95
Contributor

kasim95 commented Dec 29, 2020

take

@ankushduacodes
Contributor

@kasim95 hi... I didn't know contributors could assign themselves, as I am new to the repo. But fair enough, I will work on some other issue.

@kasim95
Contributor

kasim95 commented Dec 29, 2020

Oh, I assumed the issue was stale, since there had been no comment or update in the past 21 days. Are you still working on this issue? Let me know if you want to work on it, and I'll unassign myself.

@ankushduacodes
Contributor

I was just busy working on some other stuff. I have not started working on it, so it's all yours. Thank you for confirming. 😀


6 participants