CLN: Integrate .corrwith and .corr #11260

Open
max-sixty opened this issue Oct 7, 2015 · 3 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), cov/corr, Enhancement

Comments

@max-sixty
Contributor

Currently:

  • corr on a DataFrame requires another DataFrame, and fails on a Series
  • corrwith on a DataFrame takes a Series

Is there a good reason these are separate? Should corr do whatever corrwith does when passed a Series, and corrwith could be deprecated?
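To make the current split concrete, here is a minimal sketch of what corrwith does with a Series. The column names, the seed, and the variable names are illustrative assumptions, not taken from this thread.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed and column names are assumptions for this sketch.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(10, 3), columns=["a", "b", "c"])
s = pd.Series(rng.randn(10))

# corrwith correlates each column of df with the Series (aligned on the index).
col_corr = df.corrwith(s)

# The same numbers, computed one column at a time with Series.corr.
expected = pd.Series({c: df[c].corr(s) for c in df.columns})

print(col_corr)
```

The result is a Series indexed by the DataFrame's column labels, which is the output one would presumably want from corr if it accepted a Series.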

@jorisvandenbossche
Member

corr on a DataFrame works without another DataFrame? (as it computes the correlation of the combinations of its columns):

In [4]: df = pd.DataFrame(np.random.randn(10,3))

In [6]: df.corr()
Out[6]:
          0         1         2
0  1.000000  0.116443  0.127691
1  0.116443  1.000000  0.472557
2  0.127691  0.472557  1.000000

@jreback
Contributor

jreback commented Oct 7, 2015

you would have to change the signature of .corr to something like:

def corr(self, other=None, method='pearson', min_periods=1, axis=0, drop=False):

If other is None, it defaults to self.

With a Series it is trickier, because you then need to know how to broadcast it, e.g. row-wise or column-wise (column-wise is usually what you mean), though I think we could simply use the axis arg for this.
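A rough sketch of how such a unified entry point might dispatch, written as a free function rather than a method. The name corr_unified is hypothetical and not a pandas API; it simply delegates to the existing corr and corrwith, and assumes a pandas version where corrwith accepts a method argument.

```python
import numpy as np
import pandas as pd


def corr_unified(df, other=None, method="pearson", min_periods=1, axis=0, drop=False):
    """Hypothetical helper illustrating the proposed signature; not part of pandas."""
    if other is None:
        # No `other`: pairwise correlation matrix of df's own columns.
        return df.corr(method=method, min_periods=min_periods)
    # Series or DataFrame: delegate to corrwith, using `axis` to choose
    # column-wise (0) or row-wise (1) correlation.
    # Note: corrwith has no min_periods argument, so it is ignored on this path.
    return df.corrwith(other, axis=axis, drop=drop, method=method)


# Illustrative usage with made-up data.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(10, 3))
s = pd.Series(rng.randn(10))

matrix = corr_unified(df)         # 3x3 correlation matrix
per_column = corr_unified(df, s)  # one correlation per column
```

One open design question this sketch leaves unresolved is what min_periods should mean when other is given, since corrwith does not currently take it.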

@jreback added the API Design and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Oct 7, 2015
@jreback changed the title from "CLN: Integrate corrwith and corr" to "CLN: Integrate .corrwith and .corr" Oct 7, 2015
@max-sixty
Contributor Author

With the changes to rolling(), .corr() is now inconsistent between the rolling and non-rolling implementations:

# df.corr(series) works with rolling

In [3]: pd.DataFrame(pd.np.random.rand(10,3)).rolling(window=3).corr(pd.Series(pd.np.random.rand(10)))
Out[3]: 
          0         1         2
0       NaN       NaN       NaN
1       NaN       NaN       NaN
2 -0.673346  0.020557 -0.907277
3 -0.751201  0.589850 -0.956764
4 -0.744613  0.858481 -0.935376
5 -0.880597  0.611522 -0.990112
6 -0.968260 -0.530005 -0.095204
7 -0.241248  0.684507 -0.112472
8 -0.007827  0.769953 -0.845051
9 -0.341660  0.995147 -0.994606

# .corr(series) doesn't work without `rolling`:

In [4]: pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.py:716: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-05b6520eb259> in <module>()
----> 1 pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in corr(self, method, min_periods)
   4553         mat = numeric_df.values
   4554 
-> 4555         if method == 'pearson':
   4556             correl = _algos.nancorr(com._ensure_float64(mat), minp=min_periods)
   4557         elif method == 'spearman':

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
    761                 other = np.asarray(other)
    762 
--> 763             res = na_op(values, other)
    764             if isscalar(res):
    765                 raise TypeError('Could not compare %s type with Series' %

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in na_op(x, y)
    716                 result = getattr(x, name)(y)
    717                 if result is NotImplemented:
--> 718                     raise TypeError("invalid type comparison")
    719             except AttributeError:
    720                 result = op(x, y)

TypeError: invalid type comparison
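For comparison, a small sketch of the asymmetry: the rolling call accepts a Series directly, while the non-rolling counterpart has to go through corrwith. The seed and variable names are assumptions, and numpy is imported directly because the pd.np alias was removed in later pandas releases.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed is an assumption for reproducibility.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(10, 3))
s = pd.Series(rng.rand(10))

# Rolling: .corr accepts a Series and correlates it with each column per window.
rolled = df.rolling(window=3).corr(s)

# Non-rolling: the equivalent per-column correlation goes through corrwith.
# Here it is applied to the last three rows, i.e. the final rolling window.
last_window = df.iloc[-3:].corrwith(s.iloc[-3:])

print(rolled.iloc[-1])
print(last_window)
```

The last row of the rolling result and the corrwith result on the same three rows should agree up to floating-point error, which suggests the two code paths compute the same quantity under different method names.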

@mroeschke added the API - Consistency (Internal Consistency of API/Behavior) and Enhancement labels and removed the API Design label Apr 20, 2021
@jbrockmendel added the cov/corr label and removed the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Mar 30, 2023