Skip to content

Dropping rows inplace, weird bug with 0.13.0rc1 #5628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
michaelaye opened this issue Dec 2, 2013 · 17 comments · Fixed by #5631
Closed

Dropping rows inplace, weird bug with 0.13.0rc1 #5628

michaelaye opened this issue Dec 2, 2013 · 17 comments · Fixed by #5631
Labels
Milestone

Comments

@michaelaye
Copy link
Contributor

In c239b06 I have the following weird behavior. I am even unsure how to describe it, but somehow the dropping of rows selected via a boolean leads to a failure of dropping the rows only for the Series of a DataFrame that met the condition in the first place.
Please have a look at the MWE I figured out after some hacking and trying out:

http://nbviewer.ipython.org/gist/michaelaye/7744592

Please tell me I'm not being dumb again?

@jtratner
Copy link
Contributor

jtratner commented Dec 2, 2013

Definitely a bug (I think it's a caching issue)

@michaelaye
Copy link
Contributor Author

I'm going backwards to see where it came in, already at 0.12.0-1140-ge82c1f7 and it's still there.

@jtratner
Copy link
Contributor

jtratner commented Dec 2, 2013

Unfortunately, you're going to have to rebuild at various points, but if you don't already have this, simpler to do this to check:

import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df.drop(labels=df[df.B < 7].index, inplace=True)
assert len(df['B']) == len(df.index)

and then you can run that with git bisect.

@michaelaye
Copy link
Contributor Author

Cool, didn't know about git bisect. Will do.

@jtratner
Copy link
Contributor

jtratner commented Dec 2, 2013

you know, not really worth it for you to run. inplace was only added for 0.13. I'm pretty sure there's some quirk with inplace that is being missed here.

@michaelaye
Copy link
Contributor Author

Well, I need to find where it's working, but don't want to go back to v0.12.0 because there were some improvements regarding HDF files that I would be missing.

@jtratner
Copy link
Contributor

jtratner commented Dec 2, 2013

@michaelaye the issue is doing it inplace. It's really just a convenience (i.e. still copies under the hood). If you do it not inplace this should work fine.

So, for example:

df.drop(labels=df[df.B < 0], inplace=True)

is equivalent to:

df = df.drop(labels=df[df.B < 0])

@michaelaye
Copy link
Contributor Author

ah, that's true. Thanks.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2013

isn't df[~(df.B<0)] what u r looking for anyhow?

@jtratner
Copy link
Contributor

jtratner commented Dec 2, 2013

@jreback just to be clear, here's the actual error:

In [13]: df.drop(labels=df[df.B < 7].index, inplace=True)

In [14]: df
Out[14]:
   A  B
2  3  7
3  4  8

[2 rows x 2 columns]

In [15]: df['B']
Out[15]:
0    5
1    6
2    7
3    8
Name: B, dtype: int64

It's just something wrong with _update_inplace

@jreback
Copy link
Contributor

jreback commented Dec 2, 2013

not clearing the cache
I'll fix tom

@michaelaye
Copy link
Contributor Author

It was an interesting learning experience to use bisect to confirm that the error simply exists since 4e0d29a when Jeff added the inplace option to drop and dropna.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2013

@michaelaye good catch...was a very subtle error (though the internal method _update_inplace is relatively new and so not used very much yet, so hard to repro this)

as an FYI. inplace doesn't really save you much (if at all), as unless things are single-dtyped inplace, is just a behind the scenes reassignment. In addition you cannot chain inplace assignments. and IMHO are much less clear to what is going on (as the vast majority of pandas returns a new object).

@michaelaye
Copy link
Contributor Author

Could it be that your merge into 6b2c5fd did not update the version id? I still get 0.13.0rc1 from 6b2c5fd?

Regarding your FYI:
Isn't inplace=True the only way I can do something convenient as this:

for df in [df1, df2,df3]:
    df.drop('something', inplace=True)

because when I wouldn't do inplace=True it wouldn't work like this?

@jreback
Copy link
Contributor

jreback commented Dec 5, 2013

are you sure you updated to latest master?

that looks like a reasonable use of drop with inplace

@michaelaye
Copy link
Contributor Author

No, I didn't b/c I wanted to stay as close to a RC as possible, but now I saw that all masters are tagged as RCs anyway.
Would be nice if you guys could do RC branches, than we "external's" could easily help debugging those. Are you no fans of the git-flow paradigm?
So, no prob with the version id anymore. Glad you agree on my example, thanks.

@jtratner
Copy link
Contributor

jtratner commented Dec 6, 2013

RLS needs to be set to False.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants