
df.reset_index introduces wrong elements with NaN index values #3727

Closed
floux opened this issue May 31, 2013 · 2 comments
Labels
Duplicate Report Duplicate issue or pull request


@floux

floux commented May 31, 2013

When a MultiIndex level contains NaN values, reset_index puts incorrect labels in their place (in the output below, every NaN in the `second` level comes back as `f`). This happens even when reset_index is applied to a different level than the one containing the NaN values:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({
   ...:         'col1' : [1,2,3,4,5,6,7,8],
   ...:         'col2' : [8,7,6,5,4,3,2,1],
   ...: })

In [4]: arrays = [
   ...:         ['a','a','a','a','b','b','b','b'],
   ...:         ['c',None,'d',None,'e','e',None,'f']
   ...:         ]

In [6]: idx = pd.MultiIndex.from_tuples(zip(*arrays),
   ...:         names=['first', 'second'])

In [7]: df.index = idx

In [8]: df
Out[8]:
              col1  col2
first second
a     c          1     8
      NaN        2     7
      d          3     6
      NaN        4     5
b     e          5     4
      e          6     3
      NaN        7     2
      f          8     1

In [9]: df.reset_index()
Out[9]:
  first second  col1  col2
0     a      c     1     8
1     a      f     2     7
2     a      d     3     6
3     a      f     4     5
4     b      e     5     4
5     b      e     6     3
6     b      f     7     2
7     b      f     8     1

In [10]: df.reset_index('second')
Out[10]:
      second  col1  col2
first
a          c     1     8
a          f     2     7
a          d     3     6
a          f     4     5
b          e     5     4
b          e     6     3
b          f     7     2
b          f     8     1

In [11]: df.reset_index('first')
Out[11]:
       first  col1  col2
second
c          a     1     8
f          a     2     7
d          a     3     6
f          a     4     5
e          b     5     4
e          b     6     3
f          b     7     2
f          b     8     1
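
A minimal self-contained check for this report, written as a sketch for a recent pandas on Python 3. It builds the same frame with `MultiIndex.from_arrays` (an editorial substitution, since on Python 3 `zip` returns an iterator rather than a list) and asserts that the positions missing in the `second` level are still missing after `reset_index`, with the other labels unchanged. On an affected version the first assertion should fail, because the NaN positions come back filled with `f`.

```python
import pandas as pd

# Same data as in the report above.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8],
    'col2': [8, 7, 6, 5, 4, 3, 2, 1],
})
arrays = [
    ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    ['c', None, 'd', None, 'e', 'e', None, 'f'],
]
# from_arrays avoids handing a zip iterator to from_tuples on Python 3.
df.index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

reset = df.reset_index()

# Positions that were missing in the index level must still be missing
# in the new column; nothing should have been filled in.
level_missing = df.index.get_level_values('second').isna()
assert (reset['second'].isna().to_numpy() == level_missing).all()

# The non-missing labels must be unchanged and in the original order.
assert reset.loc[~level_missing, 'second'].tolist() == ['c', 'd', 'e', 'e', 'f']
print(reset)
```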
@jreback
Contributor

jreback commented May 31, 2013

This is a duplicate of #3586, fixed in #3587 and currently in master:

In [25]: df
Out[25]: 
              col1  col2
first second            
a     c          1     8
      NaN        2     7
      d          3     6
      NaN        4     5
b     e          5     4
      e          6     3
      NaN        7     2
      f          8     1

In [26]: df.reset_index().set_index('first')
Out[26]: 
      second  col1  col2
first                   
a          c     1     8
a        NaN     2     7
a          d     3     6
a        NaN     4     5
b          e     5     4
b          e     6     3
b        NaN     7     2
b          f     8     1

In [27]: df.reset_index().set_index('second')
Out[27]: 
       first  col1  col2
second                  
c          a     1     8
NaN        a     2     7
d          a     3     6
NaN        a     4     5
e          b     5     4
e          b     6     3
NaN        b     7     2
f          b     8     1
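
The comparison above can be turned into a regression check, sketched here with `pandas.testing.assert_frame_equal` against a recent pandas. Resetting a single level should match resetting everything and then restoring the remaining level (In [10] versus In [26]), and a full `reset_index()` followed by `set_index(['first', 'second'])` should give back the original frame with its NaN labels intact. The use of `from_arrays` instead of `from_tuples(zip(...))` is again an editorial substitution for Python 3.

```python
import pandas as pd

# Rebuild the frame from the report.
df = pd.DataFrame(
    {'col1': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [8, 7, 6, 5, 4, 3, 2, 1]},
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
         ['c', None, 'd', None, 'e', 'e', None, 'f']],
        names=['first', 'second'],
    ),
)

# Resetting one level must agree with resetting all levels and then
# setting the remaining one back as the index.
pd.testing.assert_frame_equal(
    df.reset_index('second'), df.reset_index().set_index('first')
)
pd.testing.assert_frame_equal(
    df.reset_index('first'), df.reset_index().set_index('second')
)

# A full round trip must return the original frame, NaN labels included.
roundtrip = df.reset_index().set_index(['first', 'second'])
pd.testing.assert_frame_equal(roundtrip, df)
print("reset_index preserved the NaN labels")
```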

@floux
Author

floux commented Jun 1, 2013

Okay, thanks.

floux closed this as completed on Jun 1, 2013