DOC: add documentation to core.window.corr #20268

theandygross · 2018-03-11T01:48:18Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################# Docstring (pandas.core.window.Rolling.corr)  #################
################################################################################

Calculate rolling correlation.

This function uses Pearson's definition of correlation.

Parameters
----------
other : Series, DataFrame, or ndarray, optional
    If not supplied then will default to self.
pairwise : bool, default None
    Calculate pairwise combinations of columns within a
    DataFrame. If other is not specified, defaults to True,
    otherwise defaults to False. Not relevant for Series.
    See notes.
**kwargs
    Under Review.

Returns
-------
Series or DataFrame
    Returned object type is determined by the caller of the
    rolling calculation.

See Also
--------
Series.rolling : Calling object with Series data
DataFrame.rolling : Calling object with DataFrames
Series.corr : Equivalent method for Series
DataFrame.corr : Equivalent method for DataFrame
rolling.cov : Similar method to calculate covariance
numpy.corrcoef : NumPy Pearson's correlation calculation

Notes
-----
Other should always be specified, except for DataFrame inputs with
pairwise set to `True`. All other input combinations will return all 1's.

Function will return `NaN`s for correlations of equal valued sequences;
this is the result of a 0/0 division error.

When pairwise is set to `False`, only matching columns between self and
other will be used.

When pairwise is set to `True`, the output will be a MultiIndex DataFrame
with the original index on the first level, and the "other" DataFrame
columns on the second level.

In the case of missing elements, only complete pairwise observations
will be used.

Examples
--------
The below example shows a rolling calculation with a window size of
four matching the equivalent function call using `numpy.corrcoef`.

>>> v1 = [3, 3, 3, 5, 8]
>>> v2 = [3, 4, 4, 4, 8]
>>> fmt = "{0:.6f}"  # limit the printed precision to 6 digits
>>> import numpy as np
>>> # numpy returns a 2X2 array, the correlation coefficient
>>> # is the number at entry [0][1]
>>> print(fmt.format(np.corrcoef(v1[:-1], v2[:-1])[0][1]))
0.333333
>>> print(fmt.format(np.corrcoef(v1[1:], v2[1:])[0][1]))
0.916949
>>> s1 = pd.Series(v1)
>>> s2 = pd.Series(v2)
>>> s1.rolling(4).corr(s2)
0         NaN
1         NaN
2         NaN
3    0.333333
4    0.916949
dtype: float64

The below example shows a similar rolling calculation on a
DataFrame using the pairwise option.

>>> matrix = np.array([[51., 35.], [49., 30.], [47., 32.],
...                    [46., 31.], [50., 36.]])
>>> print(np.corrcoef(matrix[:-1,0], matrix[:-1,1]).round(7))
[[1.         0.6263001]
 [0.6263001  1.       ]]
>>> print(np.corrcoef(matrix[1:,0], matrix[1:,1]).round(7))
[[1.         0.5553681]
 [0.5553681  1.        ]]
>>> df = pd.DataFrame(matrix, columns=['X','Y'])
>>> df
      X     Y
0  51.0  35.0
1  49.0  30.0
2  47.0  32.0
3  46.0  31.0
4  50.0  36.0
>>> df.rolling(4).corr(pairwise=True)
            X         Y
0 X       NaN       NaN
  Y       NaN       NaN
1 X       NaN       NaN
  Y       NaN       NaN
2 X       NaN       NaN
  Y       NaN       NaN
3 X  1.000000  0.626300
  Y  0.626300  1.000000
4 X  1.000000  0.555368
  Y  0.555368  1.000000

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'**kwargs'}
		Parameter "**kwargs" has no type

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
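
As a cross-check of the first docstring example, here is a minimal pure-Python sketch of a rolling Pearson correlation. It is not how pandas implements `Rolling.corr`; the helper names `pearson` and `rolling_corr` are hypothetical, and the sketch assumes that a result is only produced once a full window is available, as the `NaN` rows in the example suggest.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rolling_corr(xs, ys, window):
    """Hypothetical helper: NaN until a full window is available."""
    out = []
    for end in range(1, len(xs) + 1):
        if end < window:
            out.append(math.nan)
        else:
            start = end - window
            out.append(pearson(xs[start:end], ys[start:end]))
    return out


v1 = [3, 3, 3, 5, 8]
v2 = [3, 4, 4, 4, 8]
result = rolling_corr(v1, v2, 4)
print(["{:.6f}".format(r) for r in result])
# ['nan', 'nan', 'nan', '0.333333', '0.916949']
```

The last two values agree with the `numpy.corrcoef` calls and the `s1.rolling(4).corr(s2)` output quoted in the docstring.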

WillAyd · 2018-03-11T02:24:35Z

pandas/core/window.py

-        In the case of missing elements, only complete pairwise observations
-        will be used.""")
+        Calculate pairwise combinations of columns within a
+        DataFrame. If other is not specified, defaults to True,


Put backticks around parameters and built-ins, so `other`, `True`, and `False` here.

WillAyd · 2018-03-11T02:26:06Z

pandas/core/window.py

-        will be used.""")
+        Calculate pairwise combinations of columns within a
+        DataFrame. If other is not specified, defaults to True,
+        otherwise defaults to False. Not relevant for Series.


:class:`~pandas.Series`

WillAyd · 2018-03-11T02:27:17Z

pandas/core/window.py

+
+    Notes
+    -----
+    Other should be always be specified, except for DataFrame inputs with


`other` and :class:`~pandas.DataFrame`

WillAyd · 2018-03-11T02:28:04Z

pandas/core/window.py

+    Function will return `NaN`s for correlations of equal valued sequences;
+    this is the result of a 0/0 division error.
+
+    When pairwise is set to `False`, only matching columns between self and


`self` and `other`

dukebody · 2018-03-14T18:16:45Z

pandas/core/window.py

-    %(name)s sample correlation
+    Calculate %(name)s correlation.
+
+    This function uses Pearson's definition of correlation.


Can you add a link to Wikipedia or similar here?

dukebody · 2018-03-14T18:22:53Z

pandas/core/window.py

+    Notes
+    -----
+    Other should be always be specified, except for DataFrame inputs with
+    pairwise set to `True`. All other input combinations will return all 1's.


"pairwise" in single back-ticks

dukebody · 2018-03-14T18:24:45Z

pandas/core/window.py

+    Other should be always be specified, except for DataFrame inputs with
+    pairwise set to `True`. All other input combinations will return all 1's.
+
+    Function will return `NaN`s for correlations of equal valued sequences;


I don't understand this. If the sequences are equal-valued, as when `other` is not specified and pairwise=False, the correlation of each column with itself should be all 1's. Am I wrong?

theandygross · 2018-04-15

This is true, but trivial. I'm rewording for clarity.
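
The exchange above seems to turn on two readings of "equal valued": a sequence correlated with itself (correlation 1) versus a constant sequence (zero variance, so the Pearson formula becomes 0/0). A minimal pure-Python sketch of both cases follows. The function name `pearson_or_nan` is hypothetical and this is not the pandas implementation.

```python
import math


def pearson_or_nan(xs, ys):
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        # Plain Python raises ZeroDivisionError on 0.0 / 0.0, so the
        # NaN that array libraries produce is returned explicitly here.
        return math.nan
    return sxy / denom


constant = [3, 3, 3, 3]
varying = [3, 4, 4, 8]

print(pearson_or_nan(varying, varying))   # a sequence against itself
print(pearson_or_nan(constant, varying))  # zero variance on one side
```

The first call gives 1 and the second gives NaN, which is the distinction the reworded note needs to make clear.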

dukebody · 2018-03-14T18:26:36Z

pandas/core/window.py

+    other will be used.
+
+    When pairwise is set to `True`, the output will be a MultiIndex DataFrame
+    with the original index on the first level, and the "other" DataFrame


other in single back-ticks I believe

dukebody · 2018-03-14T18:27:29Z

pandas/core/window.py

+    with the original index on the first level, and the "other" DataFrame
+    columns on the second level.
+
+    In the case of missing elements, only complete pairwise observations


Do I understand correctly that the correlation for "non-complete" elements will be set to NaN? If so, can we state this in the explanation?

theandygross · 2018-04-15

This is not correct as currently implemented. I agree that this would be the desired behavior, but it would require a separate pull request.

dukebody · 2018-03-14T18:28:53Z

pandas/core/window.py

+    >>> v1 = [3, 3, 3, 5, 8]
+    >>> v2 = [3, 4, 4, 4, 8]
+    >>> fmt = "{0:.6f}"  # limit the printed precision to 6 digits
+    >>> import numpy as np


You don't need to import numpy, it's automatically imported for each docstring, see https://python-sprints.github.io/pandas/guide/pandas_docstring.html#conventions-for-the-examples

pep8speaks · 2018-04-15T18:22:24Z

Hello @theandygross! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 08, 2018 at 14:31 Hours UTC

jreback · 2018-04-15T18:54:54Z

pandas/core/window.py

-    %(name)s sample correlation
+    Calculate %(name)s correlation.
+
+    This function uses Pearson's definition of correlation 


can you move to Notes

jreback · 2018-04-15T18:55:29Z

pandas/core/window.py

+        DataFrame. If `other` is not specified, defaults to `True`,
+        otherwise defaults to `False`. Not relevant for :class:`~pandas.Series`.
+        See notes.
+    **kwargs


IIRC we remove this. @TomAugspurger

TomAugspurger · 2018-04-15

This line is fine.

For the explanation, you can put "For compatibility with other %(name)s methods. Not used."
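
For readers unfamiliar with the `%(name)s` placeholder in these hunks: the docstring is a shared template, filled in per window type with ordinary `%` string formatting. The sketch below shows only the substitution step. The template is abbreviated, `CORR_DOC` and `render_doc` are hypothetical names, and pandas applies the substitution through its own decorator helpers, not this function.

```python
# Abbreviated, hypothetical template. Only the summary line and the
# **kwargs description are taken from the review discussion.
CORR_DOC = """Calculate %(name)s correlation.

Parameters
----------
**kwargs
    For compatibility with other %(name)s methods. Not used.
"""


def render_doc(name):
    """Fill the %(name)s placeholders for one window type."""
    return CORR_DOC % {"name": name}


rolling_doc = render_doc("rolling")
expanding_doc = render_doc("expanding")

print(rolling_doc.splitlines()[0])
# Calculate rolling correlation.
```

This is why the rendered docstring in the PR description reads "Calculate rolling correlation." while the diff shows `%(name)s`.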

codecov · 2018-07-08T14:30:09Z

Codecov Report

❗ No coverage uploaded for pull request base (master@13febab).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #20268   +/-   ##
=========================================
  Coverage          ?   91.84%           
=========================================
  Files             ?      153           
  Lines             ?    49275           
  Branches          ?        0           
=========================================
  Hits              ?    45255           
  Misses            ?     4020           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.23% <100%> (?)`
#single	`41.9% <27.27%> (?)`

Impacted Files	Coverage Δ
pandas/core/window.py	`96.25% <100%> (ø)`

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13febab...4c322b0.

WillAyd · 2018-07-08T14:35:22Z

Thanks @theandygross !

mnagarkar added 3 commits March 10, 2018 17:15

add documentation to rolling.corr

4792d70

clean-up pep8

98ed79b

clean-up pep8

ce7fdac

WillAyd requested changes Mar 11, 2018

jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

dukebody reviewed Mar 14, 2018

theandygross added 2 commits April 15, 2018 10:59

add link to Pearson's correlation calc.

248598b

Add changes requested in review.

87cbb3f

jreback requested changes Apr 15, 2018

WillAyd added 2 commits July 8, 2018 09:26

Merge remote-tracking branch 'upstream/master' into docstring_window_…

8690d50

…corr

Minor fixup

fc2ee99

LINT fixup

4c322b0

WillAyd merged commit 7d58ce6 into pandas-dev:master Jul 8, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: add documentation to core.window.corr (pandas-dev#20268)

b78ef4a
