-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Dropping rows inplace, weird bug with 0.13.0rc1 #5628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Definitely a bug (I think it's a caching issue) |
I'm going backwards to see where it came in, already at 0.12.0-1140-ge82c1f7 and it's still there. |
Unfortunately, you're going to have to rebuild at various points, but if you don't already have this, simpler to do this to check: import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df.drop(labels=df[df.B < 7].index, inplace=True)
assert len(df['B']) == len(df.index) and then you can run that with git bisect. |
Cool, didn't know about git bisect. Will do. |
you know, not really worth it for you to run. |
Well, I need to find where it's working, but don't want to go back to v0.12.0 because there were some improvements regarding HDF files that I would be missing. |
@michaelaye the issue is doing it inplace. It's really just a convenience (i.e. still copies under the hood). If you do it not inplace this should work fine. So, for example: df.drop(labels=df[df.B < 0], inplace=True) is equivalent to: df = df.drop(labels=df[df.B < 0]) |
ah, that's true. Thanks. |
isn't df[~(df.B<0)] what u r looking for anyhow? |
@jreback just to be clear, here's the actual error: In [13]: df.drop(labels=df[df.B < 7].index, inplace=True)
In [14]: df
Out[14]:
A B
2 3 7
3 4 8
[2 rows x 2 columns]
In [15]: df['B']
Out[15]:
0 5
1 6
2 7
3 8
Name: B, dtype: int64 It's just something wrong with |
not clearing the cache |
It was an interesting learning experience to use bisect to confirm that the error simply exists since 4e0d29a when Jeff added the inplace option to drop and dropna. |
@michaelaye good catch...was a very subtle error (though the internal method as an FYI. inplace doesn't really save you much (if at all), as unless things are single-dtyped inplace, is just a behind the scenes reassignment. In addition you cannot chain inplace assignments. and IMHO are much less clear to what is going on (as the vast majority of pandas returns a new object). |
Could it be that your merge into 6b2c5fd did not update the version id? I still get 0.13.0rc1 from 6b2c5fd? Regarding your FYI: for df in [df1, df2,df3]:
df.drop('something', inplace=True) because when I wouldn't do inplace=True it wouldn't work like this? |
are you sure you updated to latest master? that looks like a reasonable use of |
No, I didn't b/c I wanted to stay as close to a RC as possible, but now I saw that all masters are tagged as RCs anyway. |
RLS needs to be set to False. |
In c239b06 I have the following weird behavior. I am even unsure how to describe it, but somehow the dropping of rows selected via a boolean leads to a failure of dropping the rows only for the Series of a DataFrame that met the condition in the first place.
Please have a look at the MWE I figured out after some hacking and trying out:
http://nbviewer.ipython.org/gist/michaelaye/7744592
Please tell me I'm not being dumb again?
The text was updated successfully, but these errors were encountered: