-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Use a more helpful error message for invalid correlation methods in DataFrame.corr #22298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @dsaxton! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 14, 2018 at 15:49 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #22298 +/- ##
==========================================
- Coverage 92.08% 92.05% -0.03%
==========================================
Files 169 169
Lines 50706 50713 +7
==========================================
- Hits 46691 46683 -8
- Misses 4015 4030 +15
Continue to review full report at Codecov.
|
Thanks. Could you add a release note in 0.24.0, and a test to ensure that code is hit for bad inputs? |
I'm working on some additional modifications to the DataFrame correlation methods to address some potential bugs I found and close another existing issue, is it okay if I close this pull request and reopen another with all the changes? |
no pls don’t put independent changes in independent PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test / fix this on Series as well.
pandas/core/frame.py
Outdated
@@ -6682,6 +6682,9 @@ def corr(self, method='pearson', min_periods=1): | |||
c = corrf(ac, bc) | |||
correl[i, j] = c | |||
correl[j, i] = c | |||
else: | |||
raise ValueError("method must be either 'pearson', " | |||
"'spearman' or 'kendall'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add ....{method} was supplied'.format(method=method)
pandas/tests/frame/test_analytics.py
Outdated
@@ -130,6 +130,11 @@ def test_corr_cov_independent_index_column(self): | |||
assert result.index is not result.columns | |||
assert result.index.equals(result.columns) | |||
|
|||
def test_corr_invalid_method(self): | |||
df = pd.DataFrame(np.random.normal(size=(10, 2))) | |||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check this with an assert_raises_regex (e.g. match the error)
add the gh issue number as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Is the convention here to use the most specific regex that matches the error message, or just one that is reasonably specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasonable is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. ping on green.
pandas/core/series.py
Outdated
if method in ['pearson', 'spearman', 'kendall']: | ||
return nanops.nancorr(this.values, other.values, method=method, | ||
min_periods=min_periods) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the else here
pandas/tests/frame/test_analytics.py
Outdated
@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self): | |||
assert result.index is not result.columns | |||
assert result.index.equals(result.columns) | |||
|
|||
def test_corr_invalid_method(self): | |||
df = pd.DataFrame(np.random.normal(size=(10, 2))) | |||
pttrn = ("method must be either 'pearson', 'spearman', " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pttrn -> msg
pandas/tests/frame/test_analytics.py
Outdated
@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self): | |||
assert result.index is not result.columns | |||
assert result.index.equals(result.columns) | |||
|
|||
def test_corr_invalid_method(self): | |||
df = pd.DataFrame(np.random.normal(size=(10, 2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the gh issue number as a comment
@@ -778,6 +778,14 @@ def test_corr_rank(self): | |||
tm.assert_almost_equal(A.corr(B, method='kendall'), kexp) | |||
tm.assert_almost_equal(A.corr(B, method='spearman'), sexp) | |||
|
|||
def test_corr_invalid_method(self): | |||
s1 = pd.Series(np.random.randn(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
def test_corr_invalid_method(self): | ||
s1 = pd.Series(np.random.randn(10)) | ||
s2 = pd.Series(np.random.randn(10)) | ||
pttrn = ("method must be either 'pearson', 'spearman', " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pttrn -> msg
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -468,6 +468,7 @@ Other API Changes | |||
- :meth:`PeriodIndex.tz_convert` and :meth:`PeriodIndex.tz_localize` have been removed (:issue:`21781`) | |||
- :class:`Index` subtraction will attempt to operate element-wise instead of raising ``TypeError`` (:issue:`19369`) | |||
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`) | |||
- :meth:`DataFrame.corr` now raises a ``ValueError`` instead of a ``KeyError`` when supplied with an invalid method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the PR number as we don't have an issue number
…mber to tests and release notes
@jreback Looks like the tests passed. Let me know if there are any other changes that should be made, if not, thanks for all your help. |
Thanks @dsaxton! |
@TomAugspurger Happy to help! |
DataFrame.corr
currently returns aKeyError
for invalid correlation methods. The proposed change would instead return aValueError
with an error message reminding the user of the valid correlation methods.