
Multiindex slicing with NaNs, unexpected results #25154


Closed
tunnij opened this issue Feb 5, 2019 · 6 comments · Fixed by #39356
Labels: good first issue, Missing-data, MultiIndex, Needs Tests


tunnij commented Feb 5, 2019

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.rand(2, 3),
    columns=pd.MultiIndex.from_tuples(
        [('a', 'foo'), ('b', 'bar'), ('b', np.nan)], names=['first', 'second']
    ),
)
# EXPECTED: slicing everything on the first level
df.loc[:, (['a', 'b'])]
Out[35]:
first          a         b
second       foo       bar       NaN
0       0.678021  0.383672  0.074164
1       0.738492  0.992545  0.661247

# EXPECTED: slicing just one value from the first level
df.loc[:, (['b'])]
Out[29]:
first          b
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247

# EXPECTED: slicing out (b, bar)
df.loc[:, (['b'], ['bar'])]
Out[33]:
first          b
second       bar
0       0.383672
1       0.992545

# UNEXPECTED: slicing out (b, nan) returns no columns
df.loc[:, (['b'], [np.nan])]
Out[36]:
Empty DataFrame
Columns: []
Index: [0, 1]

# UNEXPECTED: slicing out b with [nan, 'bar'] drops the NaN column
df.loc[:, (['b'], ['bar', np.nan])]
Out[39]:
first          b
second       bar
0       0.383672
1       0.992545

# EXPECTED: selecting (b, nan) with a scalar key returns the column as a Series
df.loc[:, ('b', np.nan)]
Out[37]:
0    0.074164
1    0.661247
Name: (b, nan), dtype: float64

Problem description

When the list of labels passed for a level contains NaN, the columns labelled NaN on that level are not retrieved: (['b'], [nan]) returns an empty DataFrame, and (['b'], ['bar', nan]) returns only the bar column. Selecting the scalar key ('b', nan) does return the NaN column.

Expected Output

I expect both of these to include the NaN column:

df.loc[:, (['b'], ['bar', np.nan])]
Out[40]:
first          b
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247

df.loc[:, (['b'], [np.nan])]
Out[40]:
first          b
second       NaN
0       0.074164
1       0.661247
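On versions affected by this bug, one possible workaround is to skip label lookup on the level and build a boolean column mask instead, testing for missing labels with isna() explicitly. This is a minimal sketch under that assumption, not a pandas-endorsed fix: it relies only on Index.get_level_values, Index.isin and Index.isna, and the helper name select_with_nan is hypothetical.

```python
import numpy as np
import pandas as pd


def select_with_nan(df, first_labels, second_labels):
    """Hypothetical helper: select columns by label lists on both levels,
    treating NaN in second_labels as 'missing label on that level'."""
    first = df.columns.get_level_values('first')
    second = df.columns.get_level_values('second')

    # Split the requested second-level labels into NaN and non-NaN parts.
    want_nan = any(pd.isna(label) for label in second_labels)
    non_nan = [label for label in second_labels if not pd.isna(label)]

    # Match ordinary labels with isin and missing labels with isna.
    second_mask = second.isin(non_nan)
    if want_nan:
        second_mask = second_mask | second.isna()

    mask = first.isin(first_labels) & second_mask
    return df.loc[:, mask]


df = pd.DataFrame(
    np.random.rand(2, 3),
    columns=pd.MultiIndex.from_tuples(
        [('a', 'foo'), ('b', 'bar'), ('b', np.nan)], names=['first', 'second']
    ),
)

print(select_with_nan(df, ['b'], ['bar', np.nan]))  # columns (b, bar) and (b, NaN)
print(select_with_nan(df, ['b'], [np.nan]))         # column (b, NaN) only
```

Because the mask is positional, the column order follows the original frame rather than the order of the requested labels.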

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.10.0
pip: 18.1
setuptools: 40.5.0
Cython: 0.28.5
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: 0.10.9
IPython: 5.8.0
sphinx: 1.8.1
patsy: 0.5.1
dateutil: 2.7.2
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.7
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung added the Missing-data and MultiIndex labels Feb 7, 2019
gfyoung (Member) commented Feb 7, 2019

cc @toobaz

toobaz (Member) commented Feb 7, 2019

Definitely a bug. There might be a cleaner fix, but for the moment replacing the following line

code = level_index.get_loc(key)

with

code = -1 if isna(key) else level_index.get_loc(key)

should do.
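The suggested patch works because a MultiIndex does not store NaN among its level values; a missing label is recorded as the integer code -1 in codes, so get_loc on the level cannot find NaN, while mapping it to -1 lines up with how the missing entries are actually encoded. The sketch below inspects that encoding and mimics the patched lookup with a hypothetical level_code helper. It assumes pandas 0.24 or later (where MultiIndex.labels became MultiIndex.codes) and is an illustration, not the code path pandas uses internally.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [('a', 'foo'), ('b', 'bar'), ('b', np.nan)], names=['first', 'second']
)

# The 'second' level holds only 'bar' and 'foo'; NaN is not a level value.
print(mi.levels[1])
# The missing entry in the third tuple is encoded as -1.
print(mi.codes[1])


def level_code(level_index, key):
    """Hypothetical helper mirroring the suggested patch:
    map a missing key to -1, otherwise look it up in the level."""
    return -1 if pd.isna(key) else level_index.get_loc(key)


# Boolean masks over the columns whose 'second' code matches each key.
nan_mask = mi.codes[1] == level_code(mi.levels[1], np.nan)  # True only for (b, NaN)
bar_mask = mi.codes[1] == level_code(mi.levels[1], 'bar')   # True only for (b, bar)
print(nan_mask, bar_mask)
```

Without the isna check, level_code(mi.levels[1], np.nan) would raise KeyError, because get_loc searches only the stored level values.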

phofl (Member) commented Nov 23, 2020

This works now. May need some tests.

first          b
second       NaN
0       0.299683
1       0.267543

first          b
second       bar       NaN
0       0.623092  0.299683
1       0.581611  0.267543
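Since the behaviour is now correct and the issue is labelled Needs Tests, a regression test could pin down both expected cases from the original report. The following is a minimal pytest-style sketch; the test name and the use of deterministic data are assumptions, and it is not necessarily the test that was added in #39356.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_loc_multiindex_list_with_nan_label():
    # Hypothetical test name; the columns mirror the original report.
    columns = pd.MultiIndex.from_tuples(
        [('a', 'foo'), ('b', 'bar'), ('b', np.nan)], names=['first', 'second']
    )
    # Deterministic values instead of np.random.rand so failures are reproducible.
    df = pd.DataFrame(np.arange(6, dtype=float).reshape(2, 3), columns=columns)

    # A list containing NaN alongside another label returns both columns.
    result = df.loc[:, (['b'], ['bar', np.nan])]
    tm.assert_frame_equal(result, df.iloc[:, [1, 2]])

    # A list containing only NaN returns just the NaN-labelled column.
    result = df.loc[:, (['b'], [np.nan])]
    tm.assert_frame_equal(result, df.iloc[:, [2]])
```

Comparing against df.iloc keeps the expected frames independent of label lookup, so the test cannot pass merely because both sides hit the same NaN bug.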

phofl added the good first issue and Needs Tests labels Nov 23, 2020
theodorju (Contributor) commented:

Hi, I'd like to try and work on adding some tests for this one.

phofl (Member) commented Jan 12, 2021

As long as no one is assigned, you are free to go :)

theodorju (Contributor) commented:

take
