-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix regression with SerieGrouper with Timestamp index (#42390) #42391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return np.sum(series) | ||
|
||
grouper = libreduction.SeriesGrouper(obj, agg, labels, 1) | ||
result, counts = grouper.get_result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use assert_series_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, you mean assert_equal
? Because result
and counts
here are not Series
but np.array
. I modified to use assert_equal
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_numpy_array_equal
pls let the question-asker hit the "resolve conversation" button
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I now use assert_numpy_array_equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to the point also do a full on test of the original example (the .agg
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to the point also do a full on test of the original example (the .agg)
@philpep can you test the user-facing function here
@@ -275,6 +272,11 @@ cdef class SeriesGrouper(_BaseGrouper): | |||
islider.move(start, end) | |||
vslider.move(start, end) | |||
|
|||
if cached_index is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this be done outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it seems the first call of islider.move()
and vslider.move()
is needed before initializing the cache (which use slider buf
attribute), otherwise I get a series where size of index doesn't match size of values (see the non regression test).
I'm new to pandas code and didn't go in deep in this bug. It might be a better way to fix this, so expert eyes are welcome :) Especially I didn't not yet understand why this only occur with DatetimeIndex using UTC timezone (naive datetime don't trigger the bug).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a closer look. This file is pretty tricky.
return np.sum(series) | ||
|
||
grouper = libreduction.SeriesGrouper(obj, agg, labels, 1) | ||
result, counts = grouper.get_result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to the point also do a full on test of the original example (the .agg
)
…v#42390) This fixes a regression introduced in c355ed1 where cache is not initialized with correct state of islider and vslider. The first call of {v,i}slider.move() must be done before initializing the cache. On Timestamp index this trigger a "ValueError Length of values does not match length of index" Closes pandas-dev#42390 Signed-off-by: Philippe Pepiot <[email protected]>
OK, I think the underlying problem here is that we shouldn't be getting to libreduction at all when we have a dt64tz DTI. I think DatetimeIndex._has_complex_internals should be True for tzaware (possibly all of the datetimelike indexes, not totally sure) |
see #42390 (comment) |
status here? |
needs a release note and IIUC @jbrockmendel prefers to fix the underlying issue #42391 (comment) @jbrockmendel 1.3.2 scheduled for end of this week, can you take a look. (There are several open issues on 1.3.2 so I think will schedule 1.3.3 for 3 weeks after 1.3.2 so no pressure) |
The example in #42390 is fixed by changing DatetimeTimedeltaMixin._has_complex_internals to return True. This will be a less-invasive alternative to ripping out SeriesBinGrouper which i expect we'll do for 1.4 |
closing this one as stale. |
closes #42390
This fixes a regression introduced in c355ed1 where cache is not
initialized with correct state of islider and vslider.
On Timestamp index this trigger a "ValueError Length of values does not match length of index"