Skip to content

BUG: fix regression with SerieGrouper with Timestamp index (#42390) #42391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 10 additions & 8 deletions pandas/_libs/reduction.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -165,10 +165,6 @@ cdef class SeriesBinGrouper(_BaseGrouper):

result = np.empty(self.ngroups, dtype='O')

cached_index, cached_series = self._init_dummy_series_and_index(
islider, vslider
)

start = 0
try:
for i in range(self.ngroups):
Expand All @@ -178,6 +174,11 @@ cdef class SeriesBinGrouper(_BaseGrouper):
islider.move(start, end)
vslider.move(start, end)

if cached_index is None:
cached_index, cached_series = self._init_dummy_series_and_index(
islider, vslider
)

self._update_cached_objs(
cached_series, cached_index, islider, vslider)

Expand Down Expand Up @@ -254,10 +255,6 @@ cdef class SeriesGrouper(_BaseGrouper):

result = np.empty(self.ngroups, dtype='O')

cached_index, cached_series = self._init_dummy_series_and_index(
islider, vslider
)

start = 0
try:
for i in range(n):
Expand All @@ -275,6 +272,11 @@ cdef class SeriesGrouper(_BaseGrouper):
islider.move(start, end)
vslider.move(start, end)

if cached_index is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't this be done outside the loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it seems the first call of islider.move() and vslider.move() is needed before initializing the cache (which use slider bufattribute), otherwise I get a series where size of index doesn't match size of values (see the non regression test).

I'm new to pandas code and didn't go in deep in this bug. It might be a better way to fix this, so expert eyes are welcome :) Especially I didn't not yet understand why this only occur with DatetimeIndex using UTC timezone (naive datetime don't trigger the bug).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look. This file is pretty tricky.

cached_index, cached_series = self._init_dummy_series_and_index(
islider, vslider
)

self._update_cached_objs(
cached_series, cached_index, islider, vslider)

Expand Down
17 changes: 17 additions & 0 deletions pandas/tests/groupby/test_bin_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,23 @@ def test_series_grouper():
tm.assert_almost_equal(counts, exp_counts)


def test_series_grouper_timestamp():
# GH 42390
obj = Series([1], index=[pd.Timestamp("2018-01-16 00:00:00+00:00")], dtype=np.intp)
labels = np.array([0], dtype=np.intp)

def agg(series):
# this should not raise
if series.isna().values.all():
return None
return np.sum(series)

grouper = libreduction.SeriesGrouper(obj, agg, labels, 1)
result, counts = grouper.get_result()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use assert_series_equal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, you mean assert_equal ? Because result and counts here are not Series but np.array. I modified to use assert_equal instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_numpy_array_equal

pls let the question-asker hit the "resolve conversation" button

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I now use assert_numpy_array_equal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to the point also do a full on test of the original example (the .agg)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to the point also do a full on test of the original example (the .agg)

@philpep can you test the user-facing function here

tm.assert_numpy_array_equal(result, np.array([1], dtype=object))
tm.assert_numpy_array_equal(counts, np.array([1], dtype=np.int64))


def test_series_grouper_result_length_difference():
# GH 40014
obj = Series(np.random.randn(10), dtype="float64")
Expand Down