
ENH: implement SeriesGroupBy.corrwith #32293


Open
fujiaxiang opened this issue Feb 27, 2020 · 3 comments

@fujiaxiang
Member

Sample data

>>> import pandas as pd

>>> pd.__version__  # on current master
'1.1.0.dev0+613.g97c0ce962'

>>> df = pd.DataFrame({'key': list('aaabbb'), 'value': [1, 2, 3, 3, 2, 1]})
>>> df
  key  value
0   a      1
1   a      2
2   a      3
3   b      3
4   b      2
5   b      1

Issue

>>> df.groupby('key')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001C0D32FC400>

# DataFrameGroupBy.corrwith has no issues
>>> df.groupby('key').corrwith(pd.Series([1, 2, 3, 1, 2, 3]))
     value
key
a      1.0
b     -1.0

>>> df.groupby('key')['value']
<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001C0D279BF28>

# SeriesGroupBy.corrwith is not implemented
>>> df.groupby('key')['value'].corrwith(pd.Series([1, 2, 3, 1, 2, 3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\xxx\xxx\pandas\pandas\core\groupby\groupby.py", line 580, in __getattr__
    f"'{type(self).__name__}' object has no attribute '{attr}'"
AttributeError: 'SeriesGroupBy' object has no attribute 'corrwith'

Problem description

As shown above, DataFrameGroupBy.corrwith works as expected. However, the corresponding SeriesGroupBy.corrwith is not implemented and raises an AttributeError.
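For comparison, here is a minimal sketch of what DataFrameGroupBy.corrwith computes on the sample data: DataFrame.corrwith is applied to each group's sub-frame, with other aligned on the row index. It uses groupby(...).apply purely as an illustration and is not meant to show how pandas implements corrwith internally.

```python
import pandas as pd

df = pd.DataFrame({'key': list('aaabbb'), 'value': [1, 2, 3, 3, 2, 1]})
other = pd.Series([1, 2, 3, 1, 2, 3])

# Apply DataFrame.corrwith to each group's sub-frame. Selecting [['value']]
# keeps the grouping column out of the frames passed to the lambda.
# `other` is aligned on the index, so group 'a' (rows 0-2) is correlated
# with other[0:3] and group 'b' (rows 3-5) with other[3:6].
per_group = df.groupby('key')[['value']].apply(lambda g: g.corrwith(other))
print(per_group)
```

Each call returns a Series indexed by column name, so apply stacks them into a frame indexed by key with one column per original column, matching the DataFrameGroupBy.corrwith output above.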

Expected Output

>>> df.groupby('key')['value'].corrwith(pd.Series([1, 2, 3, 1, 2, 3]))
     value
key
a      1.0
b     -1.0
@fujiaxiang
Member Author

It turns out that Series.corr behaves similarly to DataFrame.corrwith, and because SeriesGroupBy.corr and DataFrameGroupBy.corrwith reuse their base counterparts, they behave similarly too.

So this behavior can already be achieved with SeriesGroupBy.corr:

>>> import pandas as pd
>>> pd.__version__
'1.1.0.dev0+1712.g1cad9e52e'
>>> df = pd.DataFrame({'key': list('aaabbb'), 'value': [1, 2, 3, 3, 2, 1]})
>>> df.groupby('key')['value'].corr(pd.Series([1, 2, 3, 1, 2, 3]))
key
a    1.0
b   -1.0
Name: value, dtype: float64
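The same numbers can be reproduced by applying Series.corr to each group explicitly. This is a sketch for illustration: each group's slice of 'value' keeps its original row labels, so Series.corr aligns it with the matching rows of other before computing the Pearson correlation.

```python
import pandas as pd

df = pd.DataFrame({'key': list('aaabbb'), 'value': [1, 2, 3, 3, 2, 1]})
other = pd.Series([1, 2, 3, 1, 2, 3])

# Built-in groupby method: one correlation per group.
via_groupby = df.groupby('key')['value'].corr(other)

# The same computation written out: group 'a' has labels 0-2 and group 'b'
# has labels 3-5, so each is paired with the corresponding rows of `other`.
via_apply = df.groupby('key')['value'].apply(lambda s: s.corr(other))

print(via_groupby)
print(via_apply)
```

Because the group slices are aligned by label rather than by position, other must share the original frame's index for this to pair the right rows.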

A few thoughts:

  1. Why does Series.corr behave like DataFrame.corrwith, while DataFrame.corr behaves differently?
    The first two compute the correlation with another object (the other parameter), whereas the latter computes a correlation matrix from the frame's own columns.
  2. I tried implementing Series.corrwith, but it would behave exactly like Series.corr when other is also a Series. That would be confusing, so I don't think it's a good solution.
  3. DataFrame.corrwith accepts either a DataFrame or a Series as other, whereas Series.corr only accepts a Series and returns a single number.
  4. I feel the "best" solution is to rename Series.corr to Series.corrwith, enhance it to accept a DataFrame, and either deprecate Series.corr or keep it as an alias for Series.corrwith to maintain backward compatibility. Finally, we would also add corrwith to common_apply_whitelist in pandas/core/groupby/base.py so that SeriesGroupBy.corrwith becomes available automatically.
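The differences described in points 1 to 3 show up in the return types. This small sketch uses made-up data (s, t, df and frame below are illustrative only) to contrast the three existing methods:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
t = pd.Series([2, 4, 6, 9])
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [4, 3, 2, 1]})
frame = pd.DataFrame({'x': [1, 2, 3, 5], 'y': [1, 2, 3, 4]})

# Series.corr: correlation with another Series -> a single float.
r = s.corr(t)

# DataFrame.corrwith with a Series: each column vs. `other` -> a Series
# indexed by column name.
per_column = df.corrwith(s)

# DataFrame.corrwith with a DataFrame: columns are paired by label.
paired = df.corrwith(frame)

# DataFrame.corr: no `other` at all; pairwise correlation of the frame's
# own columns -> a square DataFrame.
matrix = df.corr()

print(r, per_column, paired, matrix, sep='\n\n')
```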

@simonjayhawkins what do you think?
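A minimal sketch of the behavior proposed in point 4, written as a standalone function because Series.corrwith does not exist. The name series_corrwith is hypothetical and not a pandas API; it only shows how a Series or DataFrame other could be dispatched using existing methods.

```python
import pandas as pd


def series_corrwith(s, other, method='pearson'):
    """Hypothetical sketch of the proposed Series.corrwith (not a pandas API).

    - other is a Series: behave like Series.corr and return a float.
    - other is a DataFrame: correlate s with every column of other and
      return a Series indexed by column name.
    """
    if isinstance(other, pd.Series):
        return s.corr(other, method=method)
    if isinstance(other, pd.DataFrame):
        # DataFrame.corrwith(Series) correlates each column with the Series;
        # correlation is symmetric, so this equals correlating s with each
        # column of other.
        return other.corrwith(s, method=method)
    raise TypeError(
        f"other must be a Series or DataFrame, got {type(other).__name__}"
    )


s = pd.Series([1, 2, 3])
frame = pd.DataFrame({'up': [2, 4, 6], 'down': [3, 2, 1]})
scalar = series_corrwith(s, pd.Series([3, 2, 1]))  # a float, like Series.corr
per_column = series_corrwith(s, frame)             # a Series indexed by column
```

The Series branch is exactly Series.corr, which illustrates point 2: the two names would overlap for Series input.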

@simonjayhawkins
Member

> 4. I feel the "best" solution is to rename Series.corr into Series.corrwith, enhance it to accept DataFrame, and deprecate Series.corr or make it an alias for Series.corrwith, which helps maintain backward compatibility. Finally we also want to add corrwith in common_apply_whitelist in pandas/core/groupby/base.py so that SeriesGroupBy.corrwith is automatically usable.
>
> @simonjayhawkins what do you think?

see also #11260 cc @pandas-dev/pandas-core

@rhshadrach
Member

DataFrameGroupBy.agg(["corrwith"]) also fails, because it tries to split the operation into one SeriesGroupBy per column and call corrwith on each of them.
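Until SeriesGroupBy.corrwith exists, one possible workaround is to do that per-column decomposition by hand and call SeriesGroupBy.corr on each column instead. This is a sketch, not how agg works internally; the extra column other_value is hypothetical and was added only so there is more than one column to split.

```python
import pandas as pd

df = pd.DataFrame({
    'key': list('aaabbb'),
    'value': [1, 2, 3, 3, 2, 1],
    'other_value': [2, 4, 6, 1, 2, 3],  # hypothetical second column
})
other = pd.Series([1, 2, 3, 1, 2, 3])
grouped = df.groupby('key')

# One SeriesGroupBy per non-key column, each correlated with `other` via
# SeriesGroupBy.corr; concatenating along axis=1 gives a frame indexed by
# group key with one column per original column.
result = pd.concat(
    {col: grouped[col].corr(other) for col in ['value', 'other_value']},
    axis=1,
)
print(result)
```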
