
BUG: reset_index with NaN in MultiIndex #6322


Closed
wangshun98 opened this issue Feb 11, 2014 · 7 comments
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex

@wangshun98

This is related to the following commit.
jreback@625ee94

When one level of the MultiIndex is entirely NaN, DataFrame.reset_index() fails at line 2819, inside its helper _maybe_cast, because values is empty:
2819: values = values.take(labels)
IndexError: cannot do a non-empty take from an empty axes
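The failure can be reproduced with NumPy alone. This is a minimal sketch, assuming (as described above) that an all-NaN level stores no values and that every row's code for that level is -1, the sentinel pandas uses for a missing entry:

```python
import numpy as np

# An all-NaN level keeps no values: NaN is never stored in the level itself.
values = np.array([], dtype=float)
# Every row points at a missing entry, encoded as code -1.
labels = np.array([-1, -1, -1])

raised = False
try:
    values.take(labels)
except IndexError as exc:
    # NumPy refuses a non-empty take from an empty array, even for index -1.
    raised = True
    print(exc)
```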

A possible fix is to add the following lines before line 2819:
if mask.all():
    # all-NaN level: values is empty, so return NaNs instead of taking
    values = labels * np.nan
    return values
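A self-contained sketch of the suggested guard. The helper name take_level_values is hypothetical (it is not pandas' _maybe_cast), and for simplicity it assumes a numeric level so NaN can be stored in a float array:

```python
import numpy as np


def take_level_values(values, labels):
    """Hypothetical helper: map MultiIndex codes to level values,
    turning code -1 into NaN (numeric levels only)."""
    labels = np.asarray(labels)
    mask = labels == -1
    if mask.all():
        # All-NaN level: `values` may be empty, so skip the take.
        # Multiplying the integer codes by NaN yields a float array of NaN.
        return labels * np.nan
    out = np.asarray(values, dtype=float).take(labels)
    # Code -1 picked the last level value; overwrite those slots with NaN.
    out[mask] = np.nan
    return out
```

For example, take_level_values(np.array([], dtype=float), [-1, -1, -1]) returns three NaNs instead of raising, while a partly missing level such as take_level_values(np.array([10.0, 20.0]), [0, -1, 1]) still goes through take().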

@jreback
Contributor

jreback commented Feb 11, 2014

Can you give an example of your use case?
An all-NaN level in a MultiIndex is really, really hard to support properly.

@wangshun98
Author

This comes from a database query that returns a mix of date and value fields. I want to use the combination of dates as the index so I can easily join the values from multiple queries. However, sometimes every date in one column is null.

NaN in a MultiIndex is already supported. The three lines I suggested make it work for a level that is entirely NaN. We can add test cases for other functionality, but since making that change in my local copy I haven't run into any other issues with an all-NaN MultiIndex level.

@jreback
Contributor

jreback commented Feb 12, 2014

Go ahead and do a PR (with a test case) and I'll take a look.

@jreback jreback added this to the 0.15.0 milestone Mar 22, 2014
@boombard
Contributor

I can confirm I am also experiencing exactly this issue with an all-NaN level in my MultiIndex. It looks to me like those three lines would fix it if included.

@gryBox

gryBox commented Nov 7, 2016

@jreback

The code below raises the error: IndexError: cannot do a non-empty take from an empty axes.

import random
import string

import numpy as np
import pandas as pd

# Create 12 random 3-character strings, two of which are replaced by None
rndm_strgs = [''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(3)) for i in range(12)]
rndm_strgs[0] = None
rndm_strgs[5] = None

# Make the DataFrame; column 'B' is entirely NaN
df = pd.DataFrame({'A' : list('pandasisgood'),
                   'B' : np.nan,
                   'C' : rndm_strgs,
                   'D' : np.random.rand(12)})

# Set a MultiIndex -> levels 'B' and 'C' contain NaNs ('B' is all NaN)
df_w_idx = df.set_index(['A','B','C'])

# This is where it fails
df_w_idx.reset_index()

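The following workaround produces the desired result:

for clmNm in df_w_idx.index.names:
    print(clmNm)
    # Copy each index level into a column; missing entries come back as NaN
    df_w_idx[clmNm] = df_w_idx.index.get_level_values(clmNm)

# Drop the MultiIndex instead of inserting it back as columns
df_w_idx = df_w_idx.reset_index(drop=True).copy()
df_w_idx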
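The loop above can be wrapped in a reusable function. This is a sketch, assuming every index level is named and no level name collides with an existing column; the function name reset_index_via_levels is hypothetical. It also reorders the columns so the former index levels come first, the way reset_index would place them:

```python
import numpy as np
import pandas as pd


def reset_index_via_levels(df):
    """Hypothetical workaround: copy each index level into a column with
    get_level_values, then drop the index instead of inserting it."""
    names = list(df.index.names)
    out = df.copy()
    for name in names:
        # get_level_values fills missing entries (code -1) with NaN.
        out[name] = out.index.get_level_values(name)
    out = out.reset_index(drop=True)
    # Put the former index levels first, matching reset_index's layout.
    return out[names + list(df.columns)]


df = pd.DataFrame({'A': list('abc'), 'B': np.nan, 'D': [1.0, 2.0, 3.0]})
result = reset_index_via_levels(df.set_index(['A', 'B']))
print(result)
```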

@tzinckgraf
Contributor

@jreback I have the same issue and was playing around with a solution.

The reset_index function in frame.py has the following code:

            # if we have the labels, extract the values with a mask
            if labels is not None:
                mask = labels == -1
                values = values.take(labels)

see here.
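To see why the mask ends up all True, it helps to look at how a MultiIndex stores a level that is entirely NaN. This short sketch targets a recent pandas, where the per-level codes are exposed as MultiIndex.codes (before pandas 0.24 the attribute was called labels, hence the variable name in the snippet above):

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays(
    [list('abc'), [np.nan, np.nan, np.nan]], names=['A', 'B'])

# NaN is never stored in a level, so level 'B' holds no values at all...
print(mi.levels[1])
# ...and every row's code for that level is -1, the missing-value sentinel.
print(mi.codes[1])
```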

The mask variable is all True only if every label is -1. Since a label of -1 marks a missing value, that means every value in the level is NaN. One solution I was playing with is the following:

            # if we have the labels, extract the values with a mask
            if labels is not None:
                mask = labels == -1
                if mask.all():
                    # every entry is missing and values is empty, so skip the take
                    values = np.full(len(mask), np.nan)
                else:
                    values = values.take(labels)

Please advise. More than happy to create a pull request and create all appropriate test cases.

@jreback
Contributor

jreback commented Feb 20, 2017

@tzinckgraf you can just do the take if len(values); .take() is picky :>
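A sketch of the alternative guard suggested here: check whether the level has any values before calling take(), instead of checking the mask. The helper name extract_level_values is hypothetical and, as a simplifying assumption, the level is numeric so NaN fits in a float array:

```python
import numpy as np


def extract_level_values(values, labels):
    """Hypothetical helper: only call take() when the level is non-empty."""
    labels = np.asarray(labels)
    mask = labels == -1
    if len(values):
        out = np.asarray(values, dtype=float).take(labels)
    else:
        # An empty level can only be referenced by -1 codes,
        # so every entry is missing.
        out = np.full(len(labels), np.nan)
    # Code -1 is treated as missing regardless of which branch ran.
    out[mask] = np.nan
    return out
```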

Sure, why don't you put up a PR with a test (you can use the above example)?
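A minimal pytest-style regression test along those lines, using a deterministic version of the example above. The test name is hypothetical and this is not necessarily the test that was merged for this issue; it should pass on pandas versions that include the fix:

```python
import numpy as np
import pandas as pd


def test_reset_index_with_all_nan_level():
    df = pd.DataFrame({'A': list('abc'),
                       'B': np.nan,
                       'D': [1.0, 2.0, 3.0]})
    # Level 'B' is entirely NaN once it becomes part of the MultiIndex.
    result = df.set_index(['A', 'B']).reset_index()

    assert list(result.columns) == ['A', 'B', 'D']
    assert result['B'].isna().all()
    assert list(result['A']) == ['a', 'b', 'c']
    assert list(result['D']) == [1.0, 2.0, 3.0]
```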

@jreback jreback modified the milestones: 0.20.0, Next Major Release Feb 21, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#6322

Author: tzinckgraf

Closes pandas-dev#15466 from tzinckgraf/GH6322 and squashes the following commits:

35f97f4 [tzinckgraf] GH6322, Bug on reset_index for a MultiIndex of all NaNs

5 participants