
DOC: fixed doc-string for combine & combine_first in pandas/core/series.py #22971


Merged
merged 21 commits into from
Nov 21, 2018
Changes from 14 commits
86 changes: 61 additions & 25 deletions pandas/core/series.py
@@ -3,6 +3,10 @@
"""
from __future__ import division

# pylint: disable=E1101,E1103
# pylint: disable=W0703,W0622,W0613,W0201

import warnings
from textwrap import dedent
import warnings

@@ -2281,36 +2285,62 @@ def _binop(self, other, func, level=None, fill_value=None):

def combine(self, other, func, fill_value=None):
"""
Perform elementwise binary operation on two Series using given function
with optional fill value when an index is missing from one Series or
the other
Combine the Series with a Series or scalar according to `func`.

Combine the Series and `other` using `func` to perform elementwise
selection for combined Series.
`fill_value` is assumed when value is missing at some index
from one of the two objects being combined.

Parameters
----------
other : Series or scalar value
other : Series or scalar
The value(s) to be combined with the `Series`.
func : function
Function that takes two scalars as inputs and return a scalar
fill_value : scalar value
The default specifies to use the appropriate NaN value for
the underlying dtype of the Series
Function that takes two scalars as inputs and returns an element.
fill_value : scalar, optional
The value to assume when an index is missing from
one Series or the other. The default specifies to use the
appropriate NaN value for the underlying dtype of the Series.

Returns
-------
result : Series
Series
The result of combining the Series with the other object.

See Also
--------
Series.combine_first : Combine Series values, choosing the calling
Series' values first.
Member

The indentation is wrong; this line should be indented 4 spaces with respect to the previous line.

Member

can you fix this please?


Examples
--------
>>> s1 = pd.Series([1, 2])
>>> s2 = pd.Series([0, 3])
>>> s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
0 0
1 2
>>> s2 = pd.Series([0, 3, 4])
Member

Keep this first example as it is.

Contributor Author

okay, @mroeschke

>>> s1.combine(s2, lambda x1, x2: x1 if x1 > x2 else x2)
0 1
1 3
2 4
dtype: int64

>>> arms = pd.Series({'dog':2,'cat': 2,'mouse': 2})
>>> legs = pd.Series({'dog':2,'cat': 2})
>>> arms
dog 2
cat 2
mouse 2
Member

I'd say all those animals have 4 legs and 0 arms. Use correct data, and different values (spider, monkey...)

Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

Does this example look fine?

>>> arms = pd.Series({'starfish': 5, 'kangaroo': 2})
>>> legs = pd.Series({'dog': 4,'kangaroo': 3})
>>> arms.combine(legs, lambda x, y: x+y, fill_value=0)
dog         4
kangaroo    5
starfish    5
dtype: int64
>>> arms.combine(legs, lambda x, y: x+y, fill_value=10)
dog         14
kangaroo     5
starfish    15
dtype: int64
>>>

Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

I could not find an alternative to defining a lambda: the only other option was to use + itself, which cannot be passed as a function parameter, as far as I know.

Member

Besides the example not following PEP 8 again, you may not be able to use a sum without a lambda, but as I said, you can use the builtin max function.

In the first example, remove the fill_value param completely, and in the second use an example that makes sense.

Also, add a short description before each case, explaining the goal of what you're showing. pandas does not contain functions or methods that are not useful for real-world examples, and in the documentation examples we should show those uses.
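
On the question of summing without a lambda: the standard library's operator.add is an ordinary two-argument function, so it appears that it can be passed to combine directly, just like the builtin max. A minimal sketch, reusing the arms/legs data from the comment above (the names summed and largest are illustrative only):

```python
import operator

import pandas as pd

arms = pd.Series({'starfish': 5, 'kangaroo': 2})
legs = pd.Series({'dog': 4, 'kangaroo': 3})

# operator.add(x, y) is the function form of x + y, so no lambda is needed.
summed = arms.combine(legs, operator.add, fill_value=0)

# The builtin max also takes two scalars and returns one.
largest = arms.combine(legs, max, fill_value=0)

print(summed)
print(largest)
```

With fill_value=0, a key missing from one Series contributes 0 to the sum and never wins the max against a positive value.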

Contributor Author

The purpose of combine() is to merge two Series into one.
So I need two Series with some conflicting values (which use func to resolve the conflict) and some non-conflicting ones (which could use fill_value to assume a value for the Series that lacks the key).
I will use max in an example I just thought of!

Member

That looks much better, yes. I'd name the Series s1 and s2 following our standards, and also have the animal names in all lower case.

And I don't quite understand the fill_value part. I guess I'm misunderstanding something, but if:

>>> max(float('NaN'), 25)
nan

I would expect that in the first example duck has a value of nan. And in that case I'd use fill_value=0 to get the known value. I don't quite like making up an average speed. But I guess the function doesn't work as I think.

Can you do a bit of research and move the examples to the PR, so I can add new comments in a review?

Thanks!

Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

@datapythonista look! something funny!

>>> max(30,float('nan'))
30
>>> max(float('nan'),30)
nan
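
The asymmetry above is ordinary Python behavior rather than a pandas quirk: every ordering comparison involving NaN evaluates to False, and max keeps its current candidate unless a later item compares greater. A minimal sketch of that reasoning (the names first and second are illustrative):

```python
import math

nan = float('nan')

# Any ordering comparison involving NaN is False, in either direction.
assert not (nan > 30)
assert not (30 > nan)

# max starts with its first argument and replaces it only if a later
# argument compares greater than the current candidate.
first = max(30, nan)   # candidate 30; nan > 30 is False, so 30 is kept
second = max(nan, 30)  # candidate nan; 30 > nan is False, so nan is kept

print(first)
print(math.isnan(second))
```

This is why the order of the two Series matters when max is used as func and one side is missing.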

Member

@mroeschke mroeschke Nov 20, 2018

@datapythonista fill_value is used as the default value for the missing key in the corresponding Series before the comparison is made.

So in the example above, since data_B doesn't have Duck, the comparison is made as if data_B['Duck'] = fill_value = 40, so max(data_B['Duck'], data_A['Duck']) = max(40, 30) = 40.
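
The explanation above can be checked with a small sketch. The data_A and data_B contents here are hypothetical stand-ins (only Duck = 30 and fill_value = 40 come from the comment; the Falcon values and the result names are made up for illustration). It shows fill_value standing in for the missing key before func is called, and what happens when it is omitted:

```python
import math

import pandas as pd

# Hypothetical data: only data_A has an entry for 'Duck'.
data_A = pd.Series({'Falcon': 330.0, 'Duck': 30.0})
data_B = pd.Series({'Falcon': 345.0})

# The missing data_B['Duck'] is treated as 40, so func sees 30.0 and 40.
with_fill = data_A.combine(data_B, max, fill_value=40)

# Without fill_value the missing side is NaN, and because max is
# order-sensitive with NaN, the result depends on which Series is the caller.
a_first = data_A.combine(data_B, max)  # max(30.0, nan) keeps 30.0
b_first = data_B.combine(data_A, max)  # max(nan, 30.0) keeps nan

print(with_fill)
print(a_first)
print(b_first)
```

Passing an explicit fill_value removes that order dependence for keys present in only one Series.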

Contributor Author

I would expect that in the first example duck has a value of nan.

This answer on StackOverflow clears this doubt for me!

Contributor Author

@datapythonista please read this !

dtype: int64
>>> legs
dog 2
cat 2
dtype: int64
>>> limbs = arms.combine(legs, lambda x1,x2: x1+x2, fill_value=0)
>>> limbs
cat 4
dog 4
mouse 2
dtype: int64

See Also
--------
Series.combine_first : Combine Series values, choosing the calling
Series's values first.
"""
if fill_value is None:
fill_value = na_value_for_dtype(self.dtype, compat=False)
@@ -2352,16 +2382,26 @@ def combine(self, other, func, fill_value=None):

def combine_first(self, other):
"""
Combine Series values, choosing the calling Series's values
first. Result index will be the union of the two indexes
Combine Series values, choosing the calling Series's values first.

Parameters
----------
other : Series
The value(s) to be combined with the `Series`.

Returns
-------
combined : Series
Series
The result of combining the Series with the other object.

See Also
--------
Series.combine : Perform elementwise operation on two Series
using a given function

Notes
-----
Result index will be the union of the two indexes.

Examples
--------
@@ -2371,11 +2411,7 @@ def combine_first(self, other):
0 1.0
1 4.0
dtype: float64

See Also
--------
Series.combine : Perform elementwise operation on two Series
using a given function.

"""
new_index = self.index.union(other.index)
this = self.reindex(new_index, copy=False)
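
The Notes entry in the combine_first docstring says the result index is the union of the two indexes. A minimal sketch of that behavior, using hypothetical data that is not taken from the collapsed example in the diff:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two Series share only the 'eagle' label.
s1 = pd.Series({'falcon': np.nan, 'eagle': 160.0})
s2 = pd.Series({'eagle': 200.0, 'duck': 30.0})

# The caller's non-null values win; gaps are filled from the other Series.
combined = s1.combine_first(s2)

print(combined)
```

Here 'eagle' keeps the caller's 160.0, 'duck' is taken from s2, and 'falcon' stays NaN because neither Series has a non-null value for it.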