Bug? Replacing NaN values based on a condition. #8669

ozhogin · 2014-10-29T00:28:27Z

I have a DataFrame with two columns, A and B. Wherever a value in B is larger than the corresponding value in A, I want to replace it with the value from A. I used to do this with df.B[df.B > df.A] = df.A, but after a recent pandas upgrade this chained assignment started raising a SettingWithCopyWarning. The official documentation recommends using .loc instead.

So I switched to df.loc[df.B > df.A, 'B'] = df.A, which works fine unless every value in column B is NaN. Then something weird happens:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, np.NaN, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808

Now, if even one of B's elements satisfies the condition (larger than A), then it all works fine:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, 4, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   4
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A   B
0  1 NaN
1  2   2
2  3 NaN

But if none of B's elements satisfy the condition, then all NaNs get replaced with -9223372036854775808:

In [1]: df = pd.DataFrame({'A':[1,2,3],'B':[np.NaN,1,np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   1
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2                    1
2  3 -9223372036854775808

Am I doing something wrong, or is this a bug?

pandas: 0.15.0

onesandzeroes · 2014-10-29T00:37:15Z

I think there's some weird type conversion going on. I can't figure out why B converts to int64 here:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, np.NaN, np.NaN]})
df
Out[6]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

df.dtypes
Out[10]: 
A      int64
B    float64
dtype: object

# This loc shouldn't match any rows
df.loc[df.B > df.A, 'B'] = df.A
df
Out[8]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808

# Why has this become int? Is this expected
# behaviour for this assignment?
df.dtypes
Out[12]: 
A    int64
B    int64
dtype: object

jreback · 2014-10-29T00:42:08Z

so the reason this shows up is that

(Pdb) p np.array([np.nan]).astype(np.int64)
array([-9223372036854775808])

is happening, i.e. a float64 is being coerced to an int64. It shouldn't be coercing at all, because the indexer is empty (df.B > df.A is all False). So this is a bug.
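For anyone who wants to check their own install, here is a minimal sketch of the behaviour I'd expect once the coercion is gone: assigning through an all-False mask should be a no-op, so B stays float64 and keeps its NaNs. It assumes a pandas release that includes the fix; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]})

# Any comparison against NaN is False, so this mask selects no rows.
mask = df["B"] > df["A"]

dtype_before = df["B"].dtype

# With an empty indexer the assignment should change nothing,
# including the dtype of column B.
df.loc[mask, "B"] = df["A"]

dtype_after = df["B"].dtype
print(dtype_before, dtype_after)
```

If the two dtypes printed differ, the installed version is still coercing B on an empty indexer.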

jreback · 2014-10-29T01:12:36Z

fixed in #8671

Don't ask me to explain that. There are probably 10+ cases of what to do with a value on the rhs of an assignment when you need to coerce dtypes / infer, and I don't fully understand some of them. Good thing we have a comprehensive test suite.

Not really sure it CAN be any simpler, as pandas allows pretty much anything to be set :)
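If someone is stuck on an affected version, one possible way to sidestep the .loc setitem path entirely is to build the new column with Series.where, which keeps values where the condition holds and takes them from the other Series elsewhere. This is only a sketch: cap_b_at_a is a made-up helper name, and I have not verified it against 0.15.0 specifically.

```python
import numpy as np
import pandas as pd


def cap_b_at_a(df):
    """Return a copy of df where B is replaced by A wherever B > A.

    Hypothetical helper for illustration; NaN entries in B are left alone
    because NaN > A evaluates to False.
    """
    exceeds = df["B"] > df["A"]
    out = df.copy()
    # Keep B where it does NOT exceed A, otherwise take the value from A.
    out["B"] = df["B"].where(~exceeds, df["A"])
    return out


all_nan = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]}))
mixed = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 4, np.nan]}))
print(all_nan)
print(mixed)
```

Note that the condition is written as ~(B > A) rather than B <= A, since the latter is also False for NaN and would overwrite the NaNs with values from A.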

ozhogin · 2014-10-29T11:35:32Z

Thank you!

onesandzeroes mentioned this issue Oct 29, 2014

BUG: Strange type conversions when assigning with df.loc #8670

Closed

jreback added the Bug, Indexing, and Dtype Conversions labels Oct 29, 2014

jreback added this to the 0.15.1 milestone Oct 29, 2014

jreback mentioned this issue Oct 29, 2014

BUG: Bug in setitem with empty indexer and unwanted coercion of dtypes (GH8669) #8671

Merged

jreback closed this as completed in #8671 Oct 29, 2014

jreback mentioned this issue Oct 29, 2014

TST: fix up for 32-bit indexers w.r.t. (GH8669) #8675

Merged

jreback modified the milestones: 0.15.2, 0.15.1 Oct 30, 2014

jreback mentioned this issue Apr 9, 2018

API: categorical grouping will no longer return the cartesian product #20583

Merged
