DOC: Clarify fill_value behavior in arithmetic ops #19653
Comments
pls read the documentation. This is doing an align then a fillna operation.
|
Hi, I wouldn't post a GitHub issue without reading the documentation first :) The documentation can be understood both ways, and actually I think that my "explanation" is more intuitive and straight-forward. It currently simply states that:
It's very unclear that it refers to two types of NaN values: (a) the ones that already exist in the data, and (b) the ones that are generated after the alignment step and before the addition step. It does not include the NaNs that are visible after a simple addition of the dataframes, which are the most relevant to the end user. At the least I believe that the documentation should be edited, don't you think? |
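To make the distinction concrete, here is a minimal sketch contrasting plain addition with `add(..., fill_value=0)`. The series `left` and `right` are hypothetical data chosen for illustration, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: label "a" exists only in `left`, label "d" only in `right`,
# and each series also carries one NaN that was already in the data.
left = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, np.nan], index=["b", "c", "d"])

# Plain addition: any label that is NaN or absent on either side gives NaN.
plain = left + right

# With fill_value: a value missing on exactly one side is replaced by 0
# before adding; a label missing on both sides still gives NaN.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In this sketch, label "d" is absent from `left` and NaN in `right`, so it stays NaN even with `fill_value=0`, while "a" and "b" are filled because only one side is missing.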
you are welcome to PR a doc update if u think it is not clear |
Yep, I can see the point of confusion. @HagaiHargil are you comfortable making a PR clarifying things? Lines 248 to 259 in a277108
and Lines 273 to 284 in a277108
|
I'd be happy to, I'm just having a hard time phrasing it. Here's a draft:
I'd also like to add an example, similar to the one in my initial post. Should it be added in both locations as well? |
That part is incorrect. Existing missing values will still be NA after the operation. Only newly-created missing values (created in the background by the
In [19]: a = pd.Series([1, 1, np.nan, np.nan], index=['a', 'b', 'c', 'd'])
In [20]: b = pd.Series([1, 1, np.nan, np.nan], index=['b', 'c', 'd', 'e'])
In [21]: a
Out[21]:
a 1.0
b 1.0
c NaN
d NaN
dtype: float64
In [22]: b
Out[22]:
b 1.0
c 1.0
d NaN
e NaN
dtype: float64
In [23]: a.add(b, fill_value=0)
Out[23]:
a 1.0
b 2.0
c 1.0
d NaN
e NaN
dtype: float64
Yep, both places (one using Series, one using DataFrame). Feel free to just put an example for It'd be good to include an example that has existing missing values and newly created missing values. |
Umm, I might be misunderstanding you, but I think that it does fill existing NA values:
a = pd.DataFrame(np.ones((3, 2)))
b = pd.DataFrame(np.ones((4, 3)))
a.iloc[0, 0] = np.nan
a.add(b, fill_value=10)
#       0     1     2
# 0  11.0   2.0  11.0
# 1   2.0   2.0  11.0
# 2   2.0   2.0  11.0
# 3  11.0  11.0  11.0
I consider the NA value in |
I was apparently confused. It seems to fill when there are not NAs on both
In [90]: a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
In [91]: b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'c_', 'd'])
In [92]: a
Out[92]:
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
In [93]: b
Out[93]:
a 1.0
b NaN
c_ 1.0
d NaN
dtype: float64
In [94]: a.add(b, fill_value=0)
Out[94]:
a 2.0
b 1.0
c 1.0
c_ 1.0
d NaN
dtype: float64
That's a bit strange to me, but not worth changing at this point I think. |
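The behavior observed above can be approximated by hand. The following is a rough sketch, not the pandas implementation: `add_with_fill` is a hypothetical helper written for illustration, and it assumes the rule is "align, fill values missing on one side only, add, and leave NaN where both sides were missing".

```python
import numpy as np
import pandas as pd


def add_with_fill(left, right, fill_value):
    """Hypothetical helper approximating Series.add(..., fill_value=...)."""
    # Step 1: align both inputs on the union of their indexes.
    left_aligned, right_aligned = left.align(right)
    # Step 2: remember the labels where both sides are missing.
    both_missing = left_aligned.isna() & right_aligned.isna()
    # Step 3: fill the remaining gaps on each side, then add.
    result = left_aligned.fillna(fill_value) + right_aligned.fillna(fill_value)
    # Step 4: labels missing on both sides stay missing.
    result[both_missing] = np.nan
    return result


# Same data as in the session above.
a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "c_", "d"])

manual = add_with_fill(a, b, fill_value=0)
builtin = a.add(b, fill_value=0)
print(manual.equals(builtin))
```

For these inputs the hand-written version and `a.add(b, fill_value=0)` agree, which supports the "fill unless both sides are NA" reading.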
I think the document is clear. |
An example would still be helpful.
…On Mon, Feb 12, 2018 at 5:45 PM, ZhuBaohe ***@***.***> wrote:
DataFrame.add(other, axis='columns', level=None, fill_value=None)
Addition of dataframe and other, element-wise (binary operator add).
Equivalent to dataframe + other, but with support to substitute a fill_value for missing data *in one of the inputs*.
I think the document is clear.
|
I posted #19675. @TomAugspurger Notice that after rebuilding the docs I found out that without creating the last two commits in that PR, referring to line 338 and below, the pd.DataFrame docs didn't update, only the pd.Series ones. |
When adding two DataFrames using df1.add(df2), one can use the fill_value parameter to fill in any NaNs that might come up. This parameter seems pretty broken:
As you see, it filled the NaNs with 1.0. Changing fill_value=1 will fill everything with 2.0. However, changing some of the values inside b leads to more peculiar results, and I couldn't really connect the dots and find some pattern.
This was observed on Python 3.6 on both Linux and Windows.
Thanks.