DOC: add documentation to core.window.corr #20268
Changes from 3 commits
4792d70
98ed79b
ce7fdac
248598b
87cbb3f
8690d50
fc2ee99
4c322b0
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1028,19 +1028,112 @@ def _get_cov(X, Y): | |
_get_cov, pairwise=bool(pairwise)) | ||
|
||
_shared_docs['corr'] = dedent(""" | ||
%(name)s sample correlation | ||
Calculate %(name)s correlation. | ||
|
||
This function uses Pearson's definition of correlation. | ||
|
||
Parameters | ||
---------- | ||
other : Series, DataFrame, or ndarray, optional | ||
if not supplied then will default to self and produce pairwise output | ||
If not supplied then will default to self. | ||
pairwise : bool, default None | ||
If False then only matching columns between self and other will be | ||
used and the output will be a DataFrame. | ||
If True then all pairwise combinations will be calculated and the | ||
output will be a MultiIndex DataFrame in the case of DataFrame inputs. | ||
In the case of missing elements, only complete pairwise observations | ||
will be used.""") | ||
Calculate pairwise combinations of columns within a | ||
DataFrame. If other is not specified, defaults to True, | ||
Review comment: Put backticks around parameters and built-ins, so `other`, `True`, and `False` here. |
||
otherwise defaults to False. Not relevant for Series. | ||
Review comment: :class:`~pandas.Series` |
||
See notes. | ||
**kwargs | ||
Review comment: IIRC we remove this. @TomAugspurger
Reply: This line is fine. For the explanation, you can put "For compatibility with other %(name)s methods. Not used." |
||
Under Review. | ||
|
||
Returns | ||
------- | ||
Series or DataFrame | ||
Returned object type is determined by the caller of the | ||
%(name)s calculation. | ||
|
||
See Also | ||
-------- | ||
Series.%(name)s : Calling object with Series data | ||
DataFrame.%(name)s : Calling object with DataFrames | ||
Series.corr : Equivalent method for Series | ||
DataFrame.corr : Equivalent method for DataFrame | ||
%(name)s.cov : Similar method to calculate covariance | ||
numpy.corrcoef : NumPy Pearson's correlation calculation | ||
|
||
Notes | ||
----- | ||
Other should be always be specified, except for DataFrame inputs with | ||
Review comment: `other` and :class:`~pandas.DataFrame` |
||
pairwise set to `True`. All other input combinations will return all 1's. | ||
Review comment: "pairwise" in single back-ticks |
||
|
||
Function will return `NaN`s for correlations of equal valued sequences; | ||
Review comment: I don't understand this. If the sequences are equally valued, like in the case of non specifying
Reply: This is true, but trivial. I'm rewording for clarity. |
||
this is the result of a 0/0 division error. | ||
|
||
When pairwise is set to `False`, only matching columns between self and | ||
Review comment: `self` and `other` |
||
other will be used. | ||
|
||
When pairwise is set to `True`, the output will be a MultiIndex DataFrame | ||
with the original index on the first level, and the "other" DataFrame | ||
Review comment: other in single back-ticks, I believe |
||
columns on the second level. | ||
|
||
In the case of missing elements, only complete pairwise observations | ||
Review comment: I understand the correlation of "non-complete" elements will be set to
Reply: This is not correct as currently implemented. I agree that this would be the desired behavior, but it would require a separate pull request. |
||
will be used. | ||
|
||
Examples | ||
-------- | ||
The below example shows a rolling calculation with a window size of | ||
four matching the equivalent function call using `numpy.corrcoef`. | ||
|
||
>>> v1 = [3, 3, 3, 5, 8] | ||
>>> v2 = [3, 4, 4, 4, 8] | ||
>>> fmt = "{0:.6f}" # limit the printed precision to 6 digits | ||
>>> import numpy as np | ||
Review comment: You don't need to import numpy; it's automatically imported for each docstring, see https://python-sprints.github.io/pandas/guide/pandas_docstring.html#conventions-for-the-examples |
||
>>> # numpy returns a 2X2 array, the correlation coefficient | ||
>>> # is the number at entry [0][1] | ||
>>> print(fmt.format(np.corrcoef(v1[:-1], v2[:-1])[0][1])) | ||
0.333333 | ||
>>> print(fmt.format(np.corrcoef(v1[1:], v2[1:])[0][1])) | ||
0.916949 | ||
>>> s1 = pd.Series(v1) | ||
>>> s2 = pd.Series(v2) | ||
>>> s1.rolling(4).corr(s2) | ||
0 NaN | ||
1 NaN | ||
2 NaN | ||
3 0.333333 | ||
4 0.916949 | ||
dtype: float64 | ||
|
||
The below example shows a similar rolling calculation on a | ||
DataFrame using the pairwise option. | ||
|
||
>>> matrix = np.array([[51., 35.], [49., 30.], [47., 32.],\ | ||
[46., 31.], [50., 36.]]) | ||
>>> print(np.corrcoef(matrix[:-1,0], matrix[:-1,1]).round(7)) | ||
[[1. 0.6263001] | ||
[0.6263001 1. ]] | ||
>>> print(np.corrcoef(matrix[1:,0], matrix[1:,1]).round(7)) | ||
[[1. 0.5553681] | ||
[0.5553681 1. ]] | ||
>>> df = pd.DataFrame(matrix, columns=['X','Y']) | ||
>>> df | ||
X Y | ||
0 51.0 35.0 | ||
1 49.0 30.0 | ||
2 47.0 32.0 | ||
3 46.0 31.0 | ||
4 50.0 36.0 | ||
>>> df.rolling(4).corr(pairwise=True) | ||
X Y | ||
0 X NaN NaN | ||
Y NaN NaN | ||
1 X NaN NaN | ||
Y NaN NaN | ||
2 X NaN NaN | ||
Y NaN NaN | ||
3 X 1.000000 0.626300 | ||
Y 0.626300 1.000000 | ||
4 X 1.000000 0.555368 | ||
Y 0.555368 1.000000 | ||
""") | ||
|
||
def corr(self, other=None, pairwise=None, **kwargs): | ||
if other is None: | ||
|
@@ -1288,7 +1381,6 @@ def cov(self, other=None, pairwise=None, ddof=1, **kwargs): | |
ddof=ddof, **kwargs) | ||
|
||
@Substitution(name='rolling') | ||
@Appender(_doc_template) | ||
@Appender(_shared_docs['corr']) | ||
def corr(self, other=None, pairwise=None, **kwargs): | ||
return super(Rolling, self).corr(other=other, pairwise=pairwise, | ||
|
@@ -1527,7 +1619,6 @@ def cov(self, other=None, pairwise=None, ddof=1, **kwargs): | |
ddof=ddof, **kwargs) | ||
|
||
@Substitution(name='expanding') | ||
@Appender(_doc_template) | ||
@Appender(_shared_docs['corr']) | ||
def corr(self, other=None, pairwise=None, **kwargs): | ||
return super(Expanding, self).corr(other=other, pairwise=pairwise, | ||
|
Review comment: Can you add a link to Wikipedia or similar here?
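As a reference point for the definition being asked about: Pearson's correlation is the covariance of the two sequences divided by the product of their standard deviations. The sketch below is a minimal pure-Python illustration, not the code pandas uses; `pearson` and `rolling_pearson` are hypothetical names. It also shows why a constant window yields NaN, since both the numerator and the denominator become zero.

```python
import math


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences; NaN when it is undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        # At least one sequence is constant: the ratio would be 0/0.
        return math.nan
    return sxy / denom


def rolling_pearson(xs, ys, window):
    """Hypothetical helper: r over each trailing window, NaN until it is full."""
    out = []
    for end in range(1, len(xs) + 1):
        if end < window:
            out.append(math.nan)
        else:
            out.append(pearson(xs[end - window:end], ys[end - window:end]))
    return out


# Same data as the Series example in the docstring.
v1 = [3, 3, 3, 5, 8]
v2 = [3, 4, 4, 4, 8]
result = rolling_pearson(v1, v2, 4)
print([round(r, 6) for r in result])  # [nan, nan, nan, 0.333333, 0.916949]
print(pearson([3, 3, 3], [3, 4, 4]))  # nan (first sequence is constant)
```

The last two rolling values match the `s1.rolling(4).corr(s2)` output in the docstring example.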