BUG: fix regression with SerieGrouper with Timestamp index (#42390) #42391

philpep · 2021-07-05T17:07:02Z

This fixes a regression introduced in c355ed1 where the cache is not
initialized with the correct state of islider and vslider.

On a Timestamp index this triggers a "ValueError Length of values does not match length of index".

jreback · 2021-07-05T17:11:38Z

pandas/tests/groupby/test_bin_groupby.py

+        return np.sum(series)
+
+    grouper = libreduction.SeriesGrouper(obj, agg, labels, 1)
+    result, counts = grouper.get_result()


pls use assert_series_equal

Hi, do you mean assert_equal? result and counts here are not Series but np.array, so I modified the test to use assert_equal instead.

assert_numpy_array_equal

pls let the question-asker hit the "resolve conversation" button

Ok. I now use assert_numpy_array_equal.
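For readers following the assertion discussion, here is a minimal sketch of why assert_numpy_array_equal fits better than assert_series_equal when the result is a plain np.ndarray. The arrays below are made-up illustrative values, not the data from the PR's test, and the `dtype_mismatch_detected` name is hypothetical.

```python
import numpy as np
import pandas._testing as tm  # the testing helpers pandas' own test suite uses

# Illustrative values only; the real test compares the grouper's output.
result = np.array([3, 7], dtype=np.int64)
expected = np.array([3, 7], dtype=np.int64)

# Passes silently when values and dtype both match.
tm.assert_numpy_array_equal(result, expected)

# By default the helper also checks the dtype, so a float64 array with
# the same values is reported as different.
try:
    tm.assert_numpy_array_equal(result, expected.astype(np.float64))
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True
```

The dtype check is what makes this helper stricter than a plain element-wise comparison.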

more to the point also do a full on test of the original example (the .agg)

@philpep can you test the user-facing function here
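A test of the user-facing path might look roughly like the sketch below. It is an assumption about what such a test could contain, not the example from #42390: the data, the labels, and the test name are all hypothetical, and it is not confirmed here that these exact inputs hit the regression on pandas 1.3.0.

```python
import numpy as np
import pandas as pd
import pandas._testing as tm


def test_groupby_agg_tzaware_index():
    # Hypothetical data: a Series on a UTC-aware DatetimeIndex.
    index = pd.date_range("2021-01-01", periods=4, freq="D", tz="UTC")
    ser = pd.Series(np.array([1, 2, 3, 4], dtype=np.int64), index=index)
    labels = np.array([0, 0, 1, 1], dtype=np.int64)

    def agg(series):
        # Each group must arrive with matching index and values lengths.
        assert len(series.index) == len(series.values)
        return series.sum()

    result = ser.groupby(labels).agg(agg)
    expected = pd.Series(
        np.array([3, 7], dtype=np.int64),
        index=pd.Index(np.array([0, 1], dtype=np.int64)),
    )
    tm.assert_series_equal(result, expected)


test_groupby_agg_tzaware_index()
```

Going through Series.groupby(...).agg(...) exercises whatever internal path pandas picks, so the test stays valid even if the low-level grouper is later removed.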

jbrockmendel · 2021-07-05T18:05:25Z

pandas/_libs/reduction.pyx

@@ -275,6 +272,11 @@ cdef class SeriesGrouper(_BaseGrouper):
                    islider.move(start, end)
                    vslider.move(start, end)

+                    if cached_index is None:


why can't this be done outside the loop?

Because it seems the first call of islider.move() and vslider.move() is needed before initializing the cache (which uses the slider's buf attribute); otherwise I get a Series where the size of the index doesn't match the size of the values (see the non-regression test).

I'm new to the pandas code and didn't go deep into this bug. There might be a better way to fix this, so expert eyes are welcome :) In particular, I don't yet understand why this only occurs with a DatetimeIndex using the UTC timezone (naive datetimes don't trigger the bug).

I'll take a closer look. This file is pretty tricky.
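The ordering argument above can be illustrated with a toy model. This is a pure-Python sketch under loud assumptions: `ToySlider` and `first_group_lengths` are hypothetical names, and the real Cython sliders in reduction.pyx work differently. It only shows how a cache built from a slider's buffer before the first move() can end up with a different length than the values seen after the move.

```python
import numpy as np


class ToySlider:
    """Hypothetical stand-in for a slider: `buf` is a window over `data`."""

    def __init__(self, data):
        self.data = data
        self.buf = data  # before the first move(), buf spans all the data

    def move(self, start, end):
        # Narrow the window to the current group.
        self.buf = self.data[start:end]


def first_group_lengths(init_cache_before_move):
    islider = ToySlider(np.arange(4))
    vslider = ToySlider(np.array([1.0, 2.0, 3.0, 4.0]))

    cached_index = None
    if init_cache_before_move:
        cached_index = islider.buf  # captures the full-length buffer

    islider.move(0, 2)
    vslider.move(0, 2)

    if cached_index is None:
        cached_index = islider.buf  # captures the group-sized window

    return len(cached_index), len(vslider.buf)


print(first_group_lengths(True))   # index and values lengths disagree
print(first_group_lengths(False))  # lengths agree
```

In this toy, initializing inside the loop on the first iteration is what keeps the cached index and the values the same length.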

jreback · 2021-07-07T00:12:15Z

pandas/tests/groupby/test_bin_groupby.py

+        return np.sum(series)
+
+    grouper = libreduction.SeriesGrouper(obj, agg, labels, 1)
+    result, counts = grouper.get_result()


more to the point also do a full on test of the original example (the .agg)

…v#42390)

This fixes a regression introduced in c355ed1 where cache is not initialized with correct state of islider and vslider. The first call of {v,i}slider.move() must be done before initializing the cache.

On Timestamp index this triggers a "ValueError Length of values does not match length of index"

Closes pandas-dev#42390

Signed-off-by: Philippe Pepiot <[email protected]>

jbrockmendel · 2021-07-26T18:02:03Z

OK, I think the underlying problem here is that we shouldn't be getting to libreduction at all when we have a dt64tz DTI. I think DatetimeIndex._has_complex_internals should be True for tzaware (possibly all of the datetimelike indexes, not totally sure)

simonjayhawkins · 2021-07-28T09:31:06Z

OK, I think the underlying problem here is that we shouldn't be getting to libreduction at all when we have a dt64tz DTI. I think DatetimeIndex._has_complex_internals should be True for tzaware (possibly all of the datetimelike indexes, not totally sure)

see #42390 (comment)

jreback · 2021-08-10T20:06:13Z

status here?

simonjayhawkins · 2021-08-11T10:32:36Z

needs a release note and IIUC @jbrockmendel prefers to fix the underlying issue #42391 (comment)

@jbrockmendel 1.3.2 is scheduled for the end of this week, can you take a look? (There are several open issues on 1.3.2, so I think I will schedule 1.3.3 for 3 weeks after 1.3.2; no pressure.)

jbrockmendel · 2021-08-12T23:39:16Z

The example in #42390 is fixed by changing DatetimeTimedeltaMixin._has_complex_internals to return True. This will be a less-invasive alternative to ripping out SeriesBinGrouper, which I expect we'll do for 1.4.
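As a rough illustration of the alternative described here, the sketch below shows a property being used to steer an index type away from a fast path. Everything in it is hypothetical (`ToyIndex`, `ToyDatetimeLikeIndex`, `choose_path`); it borrows only the property name from the comment above and does not reproduce how pandas actually dispatches.

```python
class ToyIndex:
    """Hypothetical base index that is safe for a low-level fast path."""

    @property
    def _has_complex_internals(self) -> bool:
        return False


class ToyDatetimeLikeIndex(ToyIndex):
    """Hypothetical datetime-like index that opts out of the fast path."""

    @property
    def _has_complex_internals(self) -> bool:
        return True


def choose_path(index) -> str:
    # Assumed dispatch rule for illustration: indexes with complex
    # internals are routed to a slower pure-Python implementation.
    return "python" if index._has_complex_internals else "fast"


print(choose_path(ToyIndex()))
print(choose_path(ToyDatetimeLikeIndex()))
```

The appeal of this kind of fix is that it touches one property instead of the grouper's caching logic.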

simonjayhawkins · 2021-08-15T09:17:52Z

closing this one as stale.

philpep force-pushed the fix-42390 branch from e09091c to b0dc5ac Compare July 5, 2021 17:08

jreback requested changes Jul 5, 2021

philpep force-pushed the fix-42390 branch from b0dc5ac to c571d73 Compare July 5, 2021 17:29

jbrockmendel reviewed Jul 5, 2021

philpep force-pushed the fix-42390 branch from c571d73 to 152f42f Compare July 6, 2021 08:30

jreback requested changes Jul 7, 2021

philpep force-pushed the fix-42390 branch from 152f42f to f30cf93 Compare July 7, 2021 09:10

philpep force-pushed the fix-42390 branch from f30cf93 to 4fb825f Compare July 7, 2021 11:50

jreback added the Groupby, Regression (functionality that used to work in a prior pandas version), and Datetime (datetime data dtype) labels Jul 28, 2021

jreback added this to the 1.3.2 milestone Jul 28, 2021

simonjayhawkins linked an issue Aug 9, 2021 that may be closed by this pull request

BUG: Regression on SeriesGrouper using Timestamp index with pandas 1.3.0 #42390

Closed

3 tasks

simonjayhawkins closed this Aug 15, 2021

simonjayhawkins mentioned this pull request Aug 15, 2021

BUG: Regression on SeriesGrouper using Timestamp index with pandas 1.3.0 #42390

Closed

3 tasks
