Bug? Replacing NaN values based on a condition. #8669

Closed
ozhogin opened this issue Oct 29, 2014 · 4 comments · Fixed by #8671
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves

ozhogin commented Oct 29, 2014

I have a DataFrame with two columns, A and B. Wherever a value in B is larger than the corresponding value in A, I want to replace it with the value from A. I used to do this with df.B[df.B > df.A] = df.A, but after a recent pandas upgrade this chained assignment started raising a SettingWithCopyWarning. The official documentation recommends using .loc instead.

So I switched to df.loc[df.B > df.A, 'B'] = df.A, and it all works fine unless every value in column B is NaN. Then something weird happens:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, np.NaN, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808

Now, if even one of B's elements satisfies the condition (larger than A), then it all works fine:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, 4, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   4
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A   B
0  1 NaN
1  2   2
2  3 NaN

But if none of B's elements satisfy the condition, then all NaNs get replaced with -9223372036854775808:

In [1]: df = pd.DataFrame({'A':[1,2,3],'B':[np.NaN,1,np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   1
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2                    1
2  3 -9223372036854775808

Am I doing something wrong, or is this a bug?

pandas: 0.15.0
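
For readers who hit this on an affected version, one possible way to express the same "cap B at A" operation without a masked .loc assignment is Series.mask. This is only a sketch and not something proposed in the thread; the variable names `capped` and `all_nan` are illustrative, and it assumes a reasonably recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, 4, np.nan]})

# Where B > A, take the value from A; everywhere else keep B unchanged.
# Comparisons against NaN evaluate to False, so NaN rows are left alone.
capped = df['B'].mask(df['B'] > df['A'], df['A'])
df['B'] = capped

# Same operation on an all-NaN column: no row matches the condition,
# so the column should come back untouched.
all_nan = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, np.nan]})
all_nan['B'] = all_nan['B'].mask(all_nan['B'] > all_nan['A'], all_nan['A'])

print(df)
print(all_nan.dtypes)
```

Because mask builds a new Series rather than assigning into an existing block, it does not go through the setitem path discussed in this issue.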

@onesandzeroes
Contributor

I think there's some weird type conversion going on. I can't figure out why B converts to int64 here:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, np.NaN, np.NaN]})
df
Out[6]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

df.dtypes
Out[10]: 
A      int64
B    float64
dtype: object

# This loc shouldn't match any rows
df.loc[df.B > df.A, 'B'] = df.A
df
Out[8]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808

# Why has this become int? Is this expected
# behaviour for this assignment?
df.dtypes
Out[12]: 
A    int64
B    int64
dtype: object
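
The check above can be turned into a small regression script. This is a sketch that assumes a pandas release containing the fix referenced in this thread (#8671, milestone 0.15.1); the names `dtype_before`, `dtype_after`, and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, np.nan]})
dtype_before = df['B'].dtype

# NaN compares False against everything, so this mask selects no rows.
mask = df['B'] > df['A']
df.loc[mask, 'B'] = df['A']

dtype_after = df['B'].dtype

# With an empty indexer the assignment should be a no-op:
# B keeps its float dtype and its NaN values.
print(dtype_before, dtype_after)
print(df)
```

On 0.15.0, as reported here, the second dtype would instead come back as int64.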

Contributor

jreback commented Oct 29, 2014

so the reason this shows up is that

(Pdb) p np.array([np.nan]).astype(np.int64)
array([-9223372036854775808])

is happening, i.e. a float64 is being coerced to an int64. It shouldn't be coercing, because the indexer is empty (i.e. df.B > df.A is all False). So it's a bug.
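
The two ingredients described above can be inspected separately. The following is a sketch, not the pandas internals: it only shows that the boolean indexer selects nothing and that NumPy will cast NaN to int64 on request. The integer that cast produces is not something NumPy guarantees; the value seen in this thread equals the minimum int64, but other platforms or versions may differ, and newer NumPy releases emit a RuntimeWarning for it (silenced here).

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, np.nan]})

# 1. The indexer: every comparison against NaN is False.
mask = df['B'] > df['A']
n_selected = int(mask.sum())
print("rows selected:", n_selected)

# 2. The coercion: casting NaN to int64 yields an arbitrary integer.
with warnings.catch_warnings():
    warnings.simplefilter('ignore', RuntimeWarning)
    cast = np.array([np.nan]).astype(np.int64)
print("NaN cast to int64:", cast)

# For comparison, the value reported in this issue is the int64 minimum.
int64_min = np.iinfo(np.int64).min
print("int64 minimum:", int64_min)
```

Since no rows are selected, there is nothing to assign and hence no reason for the column's dtype to change at all.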

jreback added the Bug, Indexing, and Dtype Conversions labels Oct 29, 2014
jreback added this to the 0.15.1 milestone Oct 29, 2014
Contributor

jreback commented Oct 29, 2014

fixed in #8671

Don't ask me to explain that; there are probably 10+ cases of what to do with a value on the RHS of an assignment when you need to coerce or infer dtypes, and I don't fully understand some of them. Good thing we have a comprehensive test suite.

Not really sure it CAN be any simpler, as pandas allows pretty much anything to be set :)

Author

ozhogin commented Oct 29, 2014

Thank you!
