-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix regression in groupby.rolling.corr/cov when other is same size as each group #44824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this fix relate to the changes in the PR that caused the regression? (maybe should put that in the issue rather than respond here)
I not familiar with this section of the codebase but this fix looks very specific to the reported case.
for instance, varying the OP example in the issue slightly
on 1.2.5
on 1.3.x/master/this PR
which now seems inconsistent with the
df_a.groupby(level=0).rolling(2).cov(df_b)
caseThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In OPs example,
len(other) == len(all groups)
.In your
df_a[:-1]
twist,len(other) != len(all groups)
, and comparing to the equivalentgroupby(level=0).apply(lambda x.rolling(2).cov(df_b))
case the result isBut we already have a similar unit test to the twist,
window/test_groupby.py:TestRolling:test_rolling_corr_cov
wherelen(other) != len(all groups)
and tests against the equivalentgroupby(...).apply(lambda x.rolling(...).cov(...))
case. The result is thatother
is reindexing each group withother.index
which "expands the result"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bottom line:
I do acknowledge this fix seems a bit brittle.
The behavior where
len(other) != len(all groups)
is a bit opaque to me because comparing to thegroupby.apply(lambda x.rolling)
equivalent expression doesn't to have a discernable pattern to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that comparison to apply might not be appropriate - in general the built-in op should produce results that are potentially better than apply because apply has be agnostic to the operation.
That said, the current output on master from the example in #44824 (comment) doesn't make sense to me. It appears to me that pandas is creating a record with
(idx1, idx1, ...)
being(1, 2, ...)
and then declaring the value to be null because the record (that pandas created) is inconsistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to make sense to me since the second level was never involved in the groupby.
If you have a change could you evaluate if this test behavior seems correct? It currently tries to match the equivalent
groupby.apply
version but I am not sure if there is a more sensible result.pandas/pandas/tests/window/test_groupby.py
Line 151 in 193ca73
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the expected behavior of
frame1.cov(other=frame2)
be? Currently this argument doesn't exist, but if it did, I think the expectation would be to first align frame2 to frame1's index and then compute the covariance.Under this assumption, I think a more sensible result in the test linked to above would be to do the same alignment for each window during the rolling operation. This also has the benefit of making
be the same as
However if the answer to the first question is to have null values for rows in frame2 that don't exist in frame1, then the current behavior of the test above is the most sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I agree that it's probably make sense index alignment to happen between the group and other before computing covariance.