Pivot NaN Bug #3588

Closed
seignour opened this issue May 13, 2013 · 5 comments · Fixed by #3627
Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@seignour

The 'pivot' function does not properly handle NaNs. E.g.:

import numpy as np
import pandas as pd
test = pd.DataFrame({"a": ['R1', 'R2', 'R3', 'R4'], "b": ["C1", "C3", np.nan, "C4"], "c": [10, 15, np.nan, 20]})

test.head(2).pivot(index='a', columns='b', values='c')
b   C1  C3
a
R1  10 NaN
R2 NaN  15

(No NaNs, works properly)

But:

test.pivot(index='a', columns='b', values='c')
b   C1  C3  C4
a
R1 NaN NaN NaN
R2 NaN  10  15
R3 NaN NaN NaN
R4 NaN NaN  20

(Incorrect results, does not crash)

The bug is present in versions 0.8.1 and 0.10.1.
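A minimal workaround sketch for the example above, assuming a recent pandas release where pivot is called with keyword arguments: drop the rows whose key columns are NaN before pivoting, so every remaining row has a real index and column label. Note that this discards row R3 entirely rather than keeping it as a NaN-labelled row.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"a": ['R1', 'R2', 'R3', 'R4'],
                     "b": ["C1", "C3", np.nan, "C4"],
                     "c": [10, 15, np.nan, 20]})

# Remove rows where either pivot key ('a' for the index, 'b' for the columns) is NaN.
clean = test.dropna(subset=["a", "b"])

# Pivot only the rows with complete keys; each value lands in its own (a, b) cell.
result = clean.pivot(index="a", columns="b", values="c")
print(result)
```

With the NaN-keyed row gone, R1/C1, R2/C3 and R4/C4 hold 10, 15 and 20 respectively, and every other cell is NaN.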

@john-granieri

This looks like a general ‘MultiIndex with NaN’ problem: because the pivot() method re-indexes, it produces scrambled results.

At a minimum, it would be nice to have an exception raised when sum(df.index.isnull()) > 0, so that people know to clean up the NaNs in the key columns before indexing (either directly, or indirectly through methods like pivot).
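The check suggested above could look roughly like the following sketch. `pivot_strict` is a hypothetical helper, not a pandas API; it inspects the key columns before pivoting rather than the resulting index, so the problem is caught before any re-indexing happens.

```python
import numpy as np
import pandas as pd


def pivot_strict(df, index, columns, values):
    """Hypothetical helper: refuse to pivot when an index/column key holds NaN."""
    nan_keys = [col for col in (index, columns) if df[col].isnull().any()]
    if nan_keys:
        # Fail loudly instead of silently returning misaligned values.
        raise ValueError(f"NaN values in pivot key column(s): {nan_keys}")
    return df.pivot(index=index, columns=columns, values=values)


est = pd.DataFrame({"a": ['R1', 'R2', np.nan, 'R4'],
                    "b": ["C1", "C2", "C3", "C4"],
                    "c": [10, 15, np.nan, 20]})

try:
    pivot_strict(est, "a", "b", "c")
except ValueError as exc:
    print(exc)  # NaN values in pivot key column(s): ['a']
```

Once the offending rows are cleaned up (for example with `est.dropna(subset=["a"])`), the same call goes through to a normal pivot.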

>>> est = pd.DataFrame({"a":['R1', 'R2', np.nan, 'R4'], 'b':["C1", "C2", "C3" , "C4"], "c":[10, 15, np.nan , 20]})
>>> est
     a   b   c
0   R1  C1  10
1   R2  C2  15
2  NaN  C3 NaN
3   R4  C4  20

>>> # this is the wrong result
>>> est.pivot(index='a', columns='b', values='c')

b   C1  C2  C3  C4
a                 
R1 NaN NaN NaN NaN
R2 NaN  10 NaN NaN
R4 NaN NaN  15  20

The index then drops the NaN, and the values are scrambled because the 'a' level has lost the NaN value:

>>> est.set_index(['a','b'], drop=False)
         a   b   c
a  b              
R1 C1   R1  C1  10
R2 C2   R2  C2  15
R4 C3  NaN  C3 NaN
   C4   R4  C4  20

even though the MultiIndex tuples still contain the NaN (only the levels drop it):

>>> est.set_index(['a','b']).index, est.set_index(['a','b']).index.levels
(MultiIndex
[(R1, C1), (R2, C2), (nan, C3), (R4, C4)],
[Index([R1, R2, R4], dtype=object), Index([C1, C2, C3, C4], dtype=object)])
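A short sketch of why the levels look this way, assuming pandas 0.24 or later (where `MultiIndex.labels` was renamed `codes`): NaN is not stored in a level; its position is marked with the sentinel code -1 instead. Code that treats -1 as an ordinary position picks the last level value, which is one plausible way the NaN row ends up labelled R4 in the set_index output above.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays(
    [['R1', 'R2', np.nan, 'R4'], ['C1', 'C2', 'C3', 'C4']], names=['a', 'b'])

# NaN is left out of the level; its slot in the codes is the sentinel -1.
print(mi.levels[0].tolist())  # ['R1', 'R2', 'R4']
print(mi.codes[0].tolist())   # [0, 1, -1, 2]

# Naive decoding: -1 acts as "last element", so the NaN row becomes 'R4'.
naive = [mi.levels[0][code] for code in mi.codes[0]]
print(naive)                  # ['R1', 'R2', 'R4', 'R4']

# Correct decoding: get_level_values maps -1 back to NaN.
print(mi.get_level_values('a').tolist())  # ['R1', 'R2', nan, 'R4']
```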

A single-column index with NaN seems to work OK:

>>> est.set_index(['a'], drop=False)
       a   b   c
a               
R1    R1  C1  10
R2    R2  C2  15
nan  NaN  C3 NaN
R4    R4  C4  20

@jreback
Contributor

jreback commented May 16, 2013

set_index seems fine in the latest master. What version are you on?

In [1]: df = DataFrame({"a":['R1', 'R2', np.nan, 'R4'], 'b':["C1", "C2", "C3" , "C4"], "c":[10, 15, np.nan , 20]})

In [2]: df.set_index(['a','b'], drop=False)
Out[2]: 
          a   b   c
a   b              
R1  C1   R1  C1  10
R2  C2   R2  C2  15
NaN C3  NaN  C3 NaN
R4  C4   R4  C4  20

@jreback
Contributor

jreback commented May 16, 2013

A bit non-trivial, but fixed by #3627. @seignour @john-granieri, want to test it out?

@john-granieri

Thanks for fixing! I had been on 0.10.1. Indeed, on 0.11 the indexing problem went away, but pivot was still broken.

We'll test it out in the latest build.

@jreback
Contributor

jreback commented May 20, 2013

Glad to hear. v0.11.1 should be out shortly; let us know of any issues.


3 participants