Skip to content

DOC: add documentation to core.window.corr #20268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 8, 2018
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 102 additions & 10 deletions pandas/core/window.py
Original file line number Diff line number Diff line change
Expand Up @@ -1028,19 +1028,113 @@ def _get_cov(X, Y):
_get_cov, pairwise=bool(pairwise))

_shared_docs['corr'] = dedent("""
%(name)s sample correlation
Calculate %(name)s correlation.

This function uses Pearson's definition of correlation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to Notes

(https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).

Parameters
----------
other : Series, DataFrame, or ndarray, optional
if not supplied then will default to self and produce pairwise output
If not supplied then will default to self.
pairwise : bool, default None
If False then only matching columns between self and other will be
used and the output will be a DataFrame.
If True then all pairwise combinations will be calculated and the
output will be a MultiIndex DataFrame in the case of DataFrame inputs.
In the case of missing elements, only complete pairwise observations
will be used.""")
Calculate pairwise combinations of columns within a
DataFrame. If `other` is not specified, defaults to `True`,
otherwise defaults to `False`. Not relevant for :class:`~pandas.Series`.
See notes.
**kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we remove this. @TomAugspurger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is fine.

For the explanation, you can put "For compatibility with other %(name)s methods. Not used."

Under Review.

Returns
-------
Series or DataFrame
Returned object type is determined by the caller of the
%(name)s calculation.

See Also
--------
Series.%(name)s : Calling object with Series data
DataFrame.%(name)s : Calling object with DataFrames
Series.corr : Equivalent method for Series
DataFrame.corr : Equivalent method for DataFrame
%(name)s.cov : Similar method to calculate covariance
numpy.corrcoef : NumPy Pearson's correlation calculation

Notes
-----
When `other` is not specified, the output will be self correlation (e.g.
all 1's), except for :class:`~pandas.DataFrame` inputs with `pairwise`
set to `True`.

Function will return `NaN`s for correlations of equal valued sequences;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. If the sequences are equally valued, like in the case of non specifying other and pairwise=False, the correlation of each column with itself should be all 1's. Am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, but trivial. I'm rewording for clarity.

this is the result of a 0/0 division error.

When `pairwise` is set to `False`, only matching columns between `self` and
`other` will be used.

When `pairwise` is set to `True`, the output will be a MultiIndex DataFrame
with the original index on the first level, and the `other` DataFrame
columns on the second level.

In the case of missing elements, only complete pairwise observations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the correlation of "non-complete" elements will be set to NaN? Can we write this in the explanation if so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct as currently implemented. I agree that this would be the desired behavior, but would require a separate pull request.

will be used.

Examples
--------
The below example shows a rolling calculation with a window size of
four matching the equivalent function call using `numpy.corrcoef`.

>>> v1 = [3, 3, 3, 5, 8]
>>> v2 = [3, 4, 4, 4, 8]
>>> fmt = "{0:.6f}" # limit the printed precision to 6 digits
>>> # numpy returns a 2X2 array, the correlation coefficient
>>> # is the number at entry [0][1]
>>> print(fmt.format(np.corrcoef(v1[:-1], v2[:-1])[0][1]))
0.333333
>>> print(fmt.format(np.corrcoef(v1[1:], v2[1:])[0][1]))
0.916949
>>> s1 = pd.Series(v1)
>>> s2 = pd.Series(v2)
>>> s1.rolling(4).corr(s2)
0 NaN
1 NaN
2 NaN
3 0.333333
4 0.916949
dtype: float64

The below example shows a similar rolling calculation on a
DataFrame using the pairwise option.

>>> matrix = np.array([[51., 35.], [49., 30.], [47., 32.],\
[46., 31.], [50., 36.]])
>>> print(np.corrcoef(matrix[:-1,0], matrix[:-1,1]).round(7))
[[1. 0.6263001]
[0.6263001 1. ]]
>>> print(np.corrcoef(matrix[1:,0], matrix[1:,1]).round(7))
[[1. 0.5553681]
[0.5553681 1. ]]
>>> df = pd.DataFrame(matrix, columns=['X','Y'])
>>> df
X Y
0 51.0 35.0
1 49.0 30.0
2 47.0 32.0
3 46.0 31.0
4 50.0 36.0
>>> df.rolling(4).corr(pairwise=True)
X Y
0 X NaN NaN
Y NaN NaN
1 X NaN NaN
Y NaN NaN
2 X NaN NaN
Y NaN NaN
3 X 1.000000 0.626300
Y 0.626300 1.000000
4 X 1.000000 0.555368
Y 0.555368 1.000000
""")

def corr(self, other=None, pairwise=None, **kwargs):
if other is None:
Expand Down Expand Up @@ -1288,7 +1382,6 @@ def cov(self, other=None, pairwise=None, ddof=1, **kwargs):
ddof=ddof, **kwargs)

@Substitution(name='rolling')
@Appender(_doc_template)
@Appender(_shared_docs['corr'])
def corr(self, other=None, pairwise=None, **kwargs):
return super(Rolling, self).corr(other=other, pairwise=pairwise,
Expand Down Expand Up @@ -1527,7 +1620,6 @@ def cov(self, other=None, pairwise=None, ddof=1, **kwargs):
ddof=ddof, **kwargs)

@Substitution(name='expanding')
@Appender(_doc_template)
@Appender(_shared_docs['corr'])
def corr(self, other=None, pairwise=None, **kwargs):
return super(Expanding, self).corr(other=other, pairwise=pairwise,
Expand Down