
Trouble with NaNs: set_index().reset_index() corrupts data #3586


Closed
cpbl opened this issue May 13, 2013 · 8 comments · Fixed by #3587
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@cpbl

cpbl commented May 13, 2013

In the following code, my data are corrupted by the sequence set_index().reset_index(): the value of "QC" is changed from NaN to 1 where it should remain NaN.

Btw, I added the ".reset_index()" only for symmetry; the data corruption is already introduced by set_index.

This problem exists in both versions '0.10.1' and '0.11.0'.

    import pandas as pd
    from numpy import nan  # the report uses a bare `nan`; import it so the snippet runs
    bug=pd.DataFrame({'PRuid': {17: 'nonQC', 18: 'nonQC', 19: 'nonQC', 20: '10', 21: '11', 22: '12', 23: '13', 24: '24', 25: '35', 26: '46', 27: '47', 28: '48', 29: '59', 30: '10'}, 'QC': {17: 0.0, 18: 0.0, 19: 0.0, 20: nan, 21: nan, 22: nan, 23: nan, 24: 1.0, 25: nan, 26: nan, 27: nan, 28: nan, 29: nan, 30: nan}, 'data': {17: 7.9544899999999998, 18: 8.0142609999999994, 19: 7.8591520000000008, 20: 0.86140349999999999, 21: 0.87853110000000001, 22: 0.8427041999999999, 23: 0.78587700000000005, 24: 0.73062459999999996, 25: 0.81668560000000001, 26: 0.81927080000000008, 27: 0.80705009999999999, 28: 0.81440240000000008, 29: 0.80140849999999997, 30: 0.81307740000000006}, 'year': {17: 2006, 18: 2007, 19: 2008, 20: 1985, 21: 1985, 22: 1985, 23: 1985, 24: 1985, 25: 1985, 26: 1985, 27: 1985, 28: 1985, 29: 1985, 30: 1986}})
    print(bug)
    # print(bug.set_index(['year','PRuid','QC']))  # per the report, the corruption already appears here
    print(bug.set_index(['year','PRuid','QC']).reset_index())  # on 0.10.1/0.11.0, NaN in 'QC' came back as 1
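A small, reusable check for this symptom is sketched below, assuming a current pandas (it relies on `Series.isna`, `Series.to_numpy`, and `Index.get_level_values`). It checks that the NaN positions in each key column are unchanged both inside the index built by set_index and after reset_index. The rows are a rounded subset of the report's data, and `nan_masks_preserved` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def nan_masks_preserved(df, keys):
    """Check that NaNs in the `keys` columns survive set_index and reset_index."""
    indexed = df.set_index(keys)
    restored = indexed.reset_index()
    for col in keys:
        original = df[col].isna().to_numpy()
        # The report says the damage is already visible inside the index itself.
        in_index = pd.isna(indexed.index.get_level_values(col))
        after_reset = restored[col].isna().to_numpy()
        if not (np.array_equal(original, in_index) and np.array_equal(original, after_reset)):
            return False
    return True


# Four rows taken from the report (rows 17, 20, 24, 30), with 'data' rounded.
df = pd.DataFrame({
    "year": [2006, 1985, 1985, 1986],
    "PRuid": ["nonQC", "10", "24", "10"],
    "QC": [0.0, np.nan, 1.0, np.nan],
    "data": [7.95, 0.86, 0.73, 0.81],
})
print(nan_masks_preserved(df, ["year", "PRuid", "QC"]))
```

A False result means the NaN positions moved, which is the problem reported here.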
@cpcloud
Member

cpcloud commented May 13, 2013

@cpbl What version are you using? I cannot reproduce this with git master.

@jreback
Contributor

jreback commented May 13, 2013

@cpcloud this is a bug; not sure why it wasn't tested before... fixed in #3587

@cpbl if you could give this PR a try and let us know, that would be appreciated. Thanks!
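The thread does not say what caused the bug. One mechanism that is consistent with the symptom is offered here as an assumption, not the confirmed root cause. A MultiIndex stores each level as sorted unique values plus integer codes, and pandas' factorization convention (see `pandas.factorize`) marks missing values with code -1. If those codes are turned back into values with a plain NumPy `take`, -1 wraps around to the last level, which for "QC" (levels 0.0 and 1.0) is exactly 1.0. The sketch uses NumPy only; `encode`, `decode_naive`, and `decode_masked` are hypothetical helpers, not pandas internals.

```python
import numpy as np


# Hypothetical helpers illustrating a (levels, codes) encoding; not pandas internals.
def encode(values):
    """Factorize a float array into sorted unique levels and codes, NaN -> -1."""
    missing = np.isnan(values)
    levels = np.unique(values[~missing])
    codes = np.searchsorted(levels, values)
    codes[missing] = -1  # sentinel code for missing values
    return levels, codes


def decode_naive(levels, codes):
    """Plain take: NumPy reads -1 as 'last element', so NaN becomes levels[-1]."""
    return levels.take(codes)


def decode_masked(levels, codes):
    """Take, then put NaN back wherever the code is the -1 sentinel."""
    out = levels.take(codes).astype(float)  # float so NaN can be stored
    out[codes == -1] = np.nan
    return out


qc = np.array([0.0, 0.0, np.nan, 1.0, np.nan])  # 'QC'-like values
levels, codes = encode(qc)
print(decode_naive(levels, codes))   # the NaNs come back as 1.0
print(decode_masked(levels, codes))  # the NaNs are preserved
```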

@cpbl
Author

cpbl commented May 13, 2013

@cpcloud I tried 0.10.1 and 0.11.0; see the original report.
@jreback: I plead naive user: can you give me a link explaining what "give this PR a try" means, or instructions for doing it? (I'm willing to try, as long as it won't permanently mess up my stable-ish Ubuntu installation.)

@jreback
Contributor

jreback commented May 13, 2013

@cpbl

Do this (make sure you have git installed; if not, apt-get install git):

    git clone -b set_index_nan https://github.com/jreback/pandas.git pandas_test
    cd pandas_test
    python setup.py build_ext --inplace
    python your_script.py
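`python setup.py build_ext --inplace` builds the extensions inside the checkout without installing anything, so the test copy is only picked up when the script's directory is the checkout (Python puts the running script's directory first on `sys.path`). A quick way to confirm which copy was imported is sketched below; `is_loaded_from` is a hypothetical helper, not part of pandas.

```python
import os
import types


def is_loaded_from(module: types.ModuleType, checkout_dir: str) -> bool:
    """Return True if `module` was imported from a file inside `checkout_dir`."""
    module_file = os.path.realpath(module.__file__)
    checkout = os.path.realpath(checkout_dir)
    # The common path equals the checkout only when the module file sits beneath it.
    return os.path.commonpath([module_file, checkout]) == checkout
```

For example, a script in pandas_test could `import pandas` and print `pandas.__version__` together with `is_loaded_from(pandas, ".")`; False would mean the system-wide pandas was used instead of the test build.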

@cpcloud
Member

cpcloud commented May 13, 2013

@cpbl ah, yes, sorry. I was skimming too fast and went straight to the code.

@cpbl
Author

cpbl commented May 13, 2013

Thanks for the instructions (it worked after installing Cython). Your git version fixes the problem I reported.

Inevitable(?) question from a naive bug reporter: how long until the automatic updates on my Ubuntu 13.04 and RHEL 6.3 machines include this fix?
Thanks!

@jreback
Contributor

jreback commented May 13, 2013

@cpbl

OK, great. 0.11.1 (which will have this fix) will be out in the next week or two.

I know Debian picks up the newest version pretty quickly (usually in unstable); I think Ubuntu is similar.

In any event, you can always install it yourself: download the tarball and install as root.
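Once a release is installed, whether from the distribution or from the tarball, one way to check whether it should include the fix is to compare `pandas.__version__` against 0.11.1, the release named above. The sketch compares only the leading numeric components and ignores suffixes such as `dev`. That simplification is an assumption and can misjudge development builds. `version_tuple` and `has_fix` are hypothetical helpers.

```python
import re


def version_tuple(version: str) -> tuple:
    """Leading numeric components of a version string, e.g. '0.11.1.dev-abc' -> (0, 11, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(version: str, fixed_in: str = "0.11.1") -> bool:
    """True if `version` is at least `fixed_in`; non-numeric suffixes are ignored."""
    return version_tuple(version) >= version_tuple(fixed_in)


for v in ["0.10.1", "0.11.0", "0.11.1"]:
    print(v, has_fix(v))
```

In practice you would call `has_fix(pandas.__version__)` after `import pandas`.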

@jreback
Contributor

jreback commented May 13, 2013

closed by #3587


3 participants