-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
fix Rolling for multi-index and reversed index. #28297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does this close a particular issue? Also tests are the most critical part of any PR so would be great if you can add |
@WillAyd I added tests and updated whatsnew to reference the two issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good generally
can you update the top section with the issue references (use a closes #....) |
@jreback Fixed according to review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
""" | ||
if not self._on.is_monotonic: | ||
formatted = self.on or "index" | ||
if not (self._on.is_monotonic_increasing or self._on.is_monotonic_decreasing): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised this doesn't match the previous definition of just is_monotonic
. Would welcome that as a follow up to align the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean updating is_monotonic
to return True also for monotonic decreasing? I tried that, but some of its callers rely on the fact that the sequence is also increasing and many tests failed afterwards. So I would rather not do that, especially not in this PR.
[date_range("20190101", periods=3), range(2)], names=["date", "seq"] | ||
), | ||
) | ||
result = df.rolling("10d", on=df.index.get_level_values("date")).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very odd way of calling this as it should be a scalar. why do you think we should allow this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I understand you. how about passing something like level = 0
to identify the datetime column of multi index to use for comparing rolling window?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right what I mean, is that this doesn't make rolling any more friendly to a Multi-index. we would need to add a level=
kwarg that can be used internall to create the on. Would take a followup PR for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, ping on green
elif isinstance(self.obj, ABCDataFrame) and self.on in self.obj.columns: | ||
return Index(self.obj[self.on]) | ||
else: | ||
raise ValueError( | ||
"invalid on specified as {0}, " | ||
"must be a column (if DataFrame) " | ||
"must be a column (of DataFrame), an Index " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the doc-string (and type annoation if its there) for on in Rolling
# non-monotonic | ||
df.index = reversed(df.index.tolist()) | ||
def test_non_monotonic_on(self): | ||
df = DataFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number as a comment
df = DataFrame( | ||
{"A": date_range("20130101", periods=5, freq="s"), "B": range(5)} | ||
) | ||
df.set_index("A", inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use inplace in tests at all, rather df = df.set_index(....)
[date_range("20190101", periods=3), range(2)], names=["date", "seq"] | ||
), | ||
) | ||
result = df.rolling("10d", on=df.index.get_level_values("date")).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right what I mean, is that this doesn't make rolling any more friendly to a Multi-index. we would need to add a level=
kwarg that can be used internall to create the on. Would take a followup PR for that.
Thanks @leftys - very nice PR! |
df = DataFrame({"column": [3, 4, 4, 2, 1]}, index=reversed(index)) | ||
result = df.rolling("2s").min() | ||
expected = DataFrame( | ||
{"column": [3.0, 3.0, 3.0, 2.0, 1.0]}, index=reversed(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the 2 seconds time window is not respected, but the minimum is taken over the whole dataframe. To my understanding, the excpected results would be [3, 4, 2, 1, 1] (closed interval limits) or [3, 4, 2, 2, 1] (open interval end). Even if you consider the next 2 seconds in time order, i.e. the other way round, you would have [3, 3, 4, 2, 1].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue found here. By using df.rolling().apply(func), we can print the window inside func and it is the while dataframe till current offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the expected result is wrong. I debugged it a bit and found out that VariableWindowIndexer
indeed works only if the index is monotonic and increasing. I will create another PR to finally fix this.
EDIT: Here it comes: #32386
Fix Rolling operation for level of multi-index and descending time index (that is monotonic, but decreasing).
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff