Skip to content

DOC: fill_value argument in Series.combine() #22953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kaybhutani opened this issue Oct 2, 2018 · 12 comments · Fixed by #22971
Closed

DOC: fill_value argument in Series.combine() #22953

kaybhutani opened this issue Oct 2, 2018 · 12 comments · Fixed by #22971

Comments

@kaybhutani
Copy link

kaybhutani commented Oct 2, 2018

Series.combine() method is not handling NaN values correctly.
On running this code, NaN is still returned and isnt replaced by 5.

#importing pandas module
import pandas as pd

#importing numpy module
import numpy as np

#creating first series
first=[1,2,np.nan,5,6,3,np.nan,7,11,0,4,8]

#creating second series
second=[5,3,2,np.nan,1,3,9,21,3,np.nan,1,np.nan]

#making series
first=pd.Series(first)

#making seriesa
second=pd.Series(second)

#calling .combine() method
result=first.combine(second, (lambda x1, x2: x1 if x1 > x2 else x2) , fill_value=5)

#display
result

The NULL values in Caller series are replaced but the Null values in the passed series remain there. Unlike other similar methods like .add(), .sub() etc. this one isn't handling Null values of passed series correctly.

If this issue is approved, i would like to contribute and fix it

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

I would like to work this bug !

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

>>> result
0 5.0
1 3.0
2 2.0
3 NaN
4 6.0
5 3.0
6 9.0
7 21.0
8 11.0
9 NaN
10 4.0
11 NaN
dtype: float64
>>> type(first[0])
<class 'numpy.float64'>
>>> first[0]
1.0
>>> first[0] > np.nan
False

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

honestly, @kbhutan0001 , I don't know if we should even compare a Not A Number and a float
See this answer from StackOverflow

@kaybhutani
Copy link
Author

@brute4s99 We are not comparing, we are combining them. This bug is because the function is probably executed before na replacement and hence null is compared with numeric values

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

okay, @kbhutani0001 . I'll look more into this(.) tomorrow !

@mroeschke
Copy link
Member

mroeschke commented Oct 2, 2018

This is the correct behavior as the document states:

optional fill value when an index is missing from one Series or the other

so fill_value represents the placeholder of the value corresponding to the index that doesn't exist not the fill_value of NaN values.

In [13]: second.loc[12] = 1

In [14]: result=first.combine(second, (lambda x1, x2: x1 if x1 > x2 else
    ...: x2) , fill_value=5)

In [15]: result
Out[15]:
0      5.0
1      3.0
2      2.0
3      NaN
4      6.0
5      3.0
6      9.0
7     21.0
8     11.0
9      NaN
10     4.0
11     NaN
12     5.0 # 5 > 1 so we get 5 here
dtype: float64

Would welcome an example in the docstring showing this behavior.

@mroeschke mroeschke changed the title Series.combine() not handling NaN values correctly DOC: fill_value argument in Series.combine() Oct 2, 2018
@kaybhutani
Copy link
Author

kaybhutani commented Oct 2, 2018 via email

@mroeschke
Copy link
Member

The names of the arguments are slightly different fill_na vs fill_value (though fill_value isn't a great name for what it does).

Plus all comparisons with np.nan are False, so the given function lambda x1, x2: x1 if x1 > x2 else x2, np.nan comparisons will always fill with the 2nd value.

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

Would welcome an example in the docstring showing this behavior.

on it !

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 2, 2018

The names of the arguments are slightly different fill_na vs fill_value (though fill_value isn't a great name for what it does).

how about fallback_value , @mroeschke ?

@mroeschke
Copy link
Member

That'd be a better name, but I don't think changing this keyword is worth the API churn.

If it's properly documented with an example, the meaning would be entirely clear.

@tm9k1
Copy link
Contributor

tm9k1 commented Oct 3, 2018

If it's properly documented with an example, the meaning would be entirely clear.

I'll push a PR shortly, @mroeschke , reviewing for typos.

vvyomjjain added a commit to vvyomjjain/pandas that referenced this issue Oct 5, 2018
@vvyomjjain vvyomjjain mentioned this issue Oct 5, 2018
4 tasks
@jreback jreback added this to the 0.24.0 milestone Oct 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment