Bug? Replacing NaN values based on a condition. #8669
Comments
I think there's some weird type conversion going on, but I can't figure out why:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, np.nan]})
df
Out[6]:
A B
0 1 NaN
1 2 NaN
2 3 NaN
df.dtypes
Out[10]:
A int64
B float64
dtype: object
# This loc shouldn't match any rows
df.loc[df.B > df.A, 'B'] = df.A
df
Out[8]:
A B
0 1 -9223372036854775808
1 2 -9223372036854775808
2 3 -9223372036854775808
# Why has this become int? Is this expected
# behaviour for this assignment?
df.dtypes
Out[12]:
A int64
B int64
dtype: object
So the reason this shows up is that a coercion is happening, e.g. coercing a float64 to an int64. It shouldn't be coercing, because the indexer is empty.
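To make the float64-to-int64 point concrete, here is a minimal sketch in plain NumPy (not the pandas internals, which this thread does not show). It illustrates that the value seen in column B is exactly the smallest int64, and that casting NaN to an integer dtype produces an int64 array with no meaningful value. The variable names are illustrative only, and the result of the cast itself is platform-dependent, so it is not printed here.

```python
import numpy as np

# The value that appeared in column B is the smallest representable int64.
int64_min = np.iinfo(np.int64).min
print(int64_min)  # -9223372036854775808

# Casting NaN to an integer dtype is not well defined: NumPy may emit a
# RuntimeWarning, and the resulting values depend on the platform
# (commonly int64's minimum on x86-64). Only the dtype is reliable.
nan_column = np.array([np.nan, np.nan, np.nan])
with np.errstate(invalid="ignore"):
    as_int = nan_column.astype(np.int64)
print(as_int.dtype)  # int64
```

NaN has no integer representation, which is why a float column that is silently coerced to int64 ends up holding a sentinel-looking number rather than missing values.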
Fixed in #8671. Don't ask me to explain that: there are probably 10+ cases of what to do with a value on the RHS of an assignment when you need to coerce dtypes / infer, and I don't fully understand some of the cases. Good thing we have a comprehensive test suite. Not really sure it CAN be any simpler, as pandas allows pretty much anything to be set :)
Thank you!
I have a DataFrame with 2 columns, A and B. If values in B are larger than the values in A, I want to replace those values with the values of A. I used to do this with df.B[df.B > df.A] = df.A, but after a recent upgrade pandas started giving a SettingWithCopyWarning when it encounters this chained assignment. The official documentation recommends using .loc.
Okay, I said, and did it through df.loc[df.B > df.A, 'B'] = df.A, and it all works fine unless all values in column B are NaN. Then something weird happens:
Now, if even one of B's elements satisfies the condition (larger than A), then it all works fine:
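A minimal sketch of that working case follows. The sample values are hypothetical (the original example for this case did not survive in the post); the only requirement is that at least one element of B is larger than the corresponding element of A.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only row 1 has B > A (5.0 > 2).
df = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 5.0, np.nan]})

# Comparisons against NaN are False, so the mask selects row 1 only.
mask = df["B"] > df["A"]
df.loc[mask, "B"] = df["A"]

print(df)
print(df.dtypes)
```

Row 1 of B takes the value from A, the NaN rows are left untouched, and B stays float64.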
But if none of B's elements satisfy it, then all NaNs get replaced with -9223372036854775808:
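The no-match case can be written as a small check. This is a sketch that assumes a pandas version which already includes the fix referenced in this thread (#8671); on 0.15.0 the assertions would fail, because that is the bug being reported. The data is the same as in the session shown in this issue.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]})

# NaN > anything is False, so this mask selects no rows at all.
mask = df["B"] > df["A"]
df.loc[mask, "B"] = df["A"]

# Expected on a fixed pandas: an empty indexer changes nothing,
# so B is still all-NaN and still float64.
assert not mask.any()
assert df["B"].isna().all()
assert str(df["B"].dtype) == "float64"
```

Checking the dtype as well as the values matters here, since the symptom in this report is a silent change of B from float64 to int64.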
Am I doing something wrong, or is this a bug?
pandas: 0.15.0