-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REG: fix regression in df.corrwith on tied data when method is spearman #49032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/frame.py
Outdated
@@ -10598,18 +10598,23 @@ def corrwith( | |||
cols = self.columns | |||
ndf = self.values.transpose() | |||
k = other.values | |||
k_mask = ~np.isnan(k) | |||
if k.dtype == "bool": | |||
k = k.astype("float") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these casts needed? If needed, can it be converted to int instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bool type won't be accepted by rank_1d's signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cast it to int, then nan in bool array will be -2147483648.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rank should accept uint8, so k = k.view(uint8)
What happens with nullable dtypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeError: No matching signature found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rank_1d is compiled for uint8_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry. My code was tested on 1.4.3.. I find the signature is changed in 1.5.0. as numeric_object_t
instead of iu_64_floating_t
. Now it works. Thanks. @phofl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worked on 1.4.4, so have to fix this too or revert as suggested in the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we can't use pd.isna?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to pd.isna now
with pytest.raises(TypeError, match="not supported for the input types"): | ||
with pytest.raises( | ||
TypeError, | ||
match=r"unsupported operand type\(s\) for /: 'str' and 'int'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is raised by np.corrcoef.
The failure comes from
|
"boolean" is currently not considered numeric |
k = other.values | ||
k_mask = ~other.isna() | ||
if isinstance(k, BaseMaskedArray): | ||
k = k._data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we special Casing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullable arrays are not supported by rank_1d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the mask too in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we still cast the type, but write rank step as the following?
libalgos.rank_1d(k, mask=nonnull_mask)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant more like k_mask = k._mask
@mroeschke please move this to milestone 2.0. |
Update to 1.5.2 for now, thanks @GYHHAHA |
Moving to 2.0 as we just reverted the original PR in #49140 and I think it's safer than "re-reverting" for 1.5.2 |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
DataFrame.corrwith
is changed in pandas 1.5.0 ?? #48826doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.