-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix regression with SerieGrouper with Timestamp index (#42390) #42391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,23 @@ def test_series_grouper(): | |
tm.assert_almost_equal(counts, exp_counts) | ||
|
||
|
||
def test_series_grouper_timestamp(): | ||
# GH 42390 | ||
obj = Series([1], index=[pd.Timestamp("2018-01-16 00:00:00+00:00")], dtype=np.intp) | ||
labels = np.array([0], dtype=np.intp) | ||
|
||
def agg(series): | ||
# this should not raise | ||
if series.isna().values.all(): | ||
return None | ||
return np.sum(series) | ||
|
||
grouper = libreduction.SeriesGrouper(obj, agg, labels, 1) | ||
result, counts = grouper.get_result() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls use assert_series_equal There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, you mean There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. assert_numpy_array_equal pls let the question-asker hit the "resolve conversation" button There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I now use assert_numpy_array_equal. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. more to the point also do a full on test of the original example (the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@philpep can you test the user-facing function here |
||
tm.assert_numpy_array_equal(result, np.array([1], dtype=object)) | ||
tm.assert_numpy_array_equal(counts, np.array([1], dtype=np.int64)) | ||
|
||
|
||
def test_series_grouper_result_length_difference(): | ||
# GH 40014 | ||
obj = Series(np.random.randn(10), dtype="float64") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this be done outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it seems the first call of
islider.move()
andvslider.move()
is needed before initializing the cache (which use sliderbuf
attribute), otherwise I get a series where size of index doesn't match size of values (see the non regression test).I'm new to pandas code and didn't go in deep in this bug. It might be a better way to fix this, so expert eyes are welcome :) Especially I didn't not yet understand why this only occur with DatetimeIndex using UTC timezone (naive datetime don't trigger the bug).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a closer look. This file is pretty tricky.