BUG: Fixes to FixedForwardWindowIndexer and GroupbyIndexer (#43267) #43291

dsm054 · 2021-08-30T01:02:42Z

closes BUG: crash in df.groupby.rolling.mean with forward window indexer #43267

This PR addresses three issues, two major and one minor:

(1) If you reused a FixedForwardWindowIndexer, the indexer would be mutated and a window_size of 0 would be used. This produced an empty end array, which unsurprisingly led to segfaults on the second run. Similar issues could be produced with other BaseIndexer subclasses; this one was simply the loudest.

(2) The end array generated by FixedForwardWindowIndexer.get_window_bounds was the wrong length, producing longer arrays than necessary. With a single end array this didn't matter much: the extra values were simply ignored. But in a groupby, these end arrays were concatenated, so end values for the first group could leak into the second, and so on.

(3) There were some typos for "indices".
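To make issue (2) concrete, here is a minimal pure-Python sketch. It is not the pandas implementation: the helper names `forward_bounds` and `concat_group_bounds` are hypothetical, group-local offsets are kept as-is for simplicity, and the over-long end array is built by hand to imitate the symptom described above.

```python
def forward_bounds(num_values, window_size):
    """Forward-looking bounds: row i covers [i, i + window_size), clipped."""
    start = list(range(num_values))
    end = [min(i + window_size, num_values) for i in start]
    return start, end


def concat_group_bounds(per_group_bounds):
    """Concatenate per-group (start, end) pairs, one group after another."""
    starts, ends = [], []
    for start, end in per_group_bounds:
        starts.extend(start)
        ends.extend(end)
    return starts, ends


# Healthy case: a 3-row group and a 2-row group, window_size=2.
good = [forward_bounds(3, 2), forward_bounds(2, 2)]
good_start, good_end = concat_group_bounds(good)

# Hand-built imitation of the bug: the first group's end array
# carries one extra trailing value.
bad = [([0, 1, 2], [2, 3, 3, 3]), forward_bounds(2, 2)]
bad_start, bad_end = concat_group_bounds(bad)

print(good_start, good_end)  # [0, 1, 2, 0, 1] [2, 3, 3, 2, 2]
print(bad_start, bad_end)    # [0, 1, 2, 0, 1] [2, 3, 3, 3, 2, 2]

# Position 3 is the first row of the second group. In the bad case it is
# paired with the leftover end value (3) from the first group instead of 2.
print(good_end[3], bad_end[3])  # 2 3
```

With a single group, the extra trailing value sits past the last start and is never read. Once groups are concatenated, it shifts every later end value by one position.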

dsm054 · 2021-08-30T01:11:34Z

@github-actions pre-commit

pep8speaks · 2021-08-30T01:49:19Z

Hello @dsm054! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-16 18:44:59 UTC

mroeschke · 2021-08-30T01:50:21Z

pandas/core/indexers/objects.py

@@ -93,7 +93,6 @@ def get_window_bounds(

        end = np.clip(end, 0, num_values)
        start = np.clip(start, 0, num_values)
-


Do you mind reverting these whitespace changes? They're somewhat distracting from the other changes in this PR.

mroeschke · 2021-08-30T01:51:16Z

pandas/core/window/rolling.py

                start, end = window_indexer.get_window_bounds(
                    num_values=len(x),
                    min_periods=min_periods,
                    center=self.center,
                    closed=self.closed,
                )
+
+                # From get_window_bounds, those two should be equal in length of array
+                assert len(start) == len(end)


Could this assertion be moved to GroupbyIndexer.get_window_bounds instead?

I'm fine with putting one there too -- I was just mirroring existing practice -- but I think an independent check here has merit.

If start/end are broken here, result = calc(values) is probably going to crash a few lines below.

Since there's logic in _apply which calculates min_periods which is passed as an arg to get_window_bounds, we can dream up scenarios where everything would be fine at that level but broken here.

Okay. Putting this assert here is fine as well then

mroeschke · 2021-08-30T01:51:30Z

pandas/core/window/rolling.py

@@ -1429,6 +1435,10 @@ def cov_func(x, y):
                center=self.center,
                closed=self.closed,
            )
+
+            # From get_window_bounds, those two should be equal in length of array
+            assert len(start) == len(end)


Same (as well as below)

mroeschke · 2021-08-30T01:52:49Z

pandas/tests/window/test_base_indexer.py

+@pytest.mark.parametrize(
+    "df",
+    [
+        DataFrame({"a": [1, 1], "b": [0, 1]}),


Could you make the DataFrame call in the test method instead?

mroeschke · 2021-08-30T01:53:42Z

pandas/tests/window/test_base_indexer.py

+    start, end = indexer.get_window_bounds(num_values=num_values)
+
+    tm.assert_equal(start, np.array(expected_start, dtype="int64"))
+    tm.assert_equal(end, np.array(expected_end, dtype="int64"))


tm.assert_numpy_array_equal I believe

mroeschke · 2021-08-30T01:54:20Z

pandas/tests/window/test_rolling.py

+        ),
+    ],
+)
+def test_rolling_groupby_with_fixed_forward_specific(df, window_size, expected):


You can move these tests in pandas/tests/window/test_base_indexer.py

mroeschke · 2021-08-30T01:58:44Z

doc/source/whatsnew/v1.4.0.rst

@@ -355,7 +355,7 @@ Groupby/resample/rolling
 - Bug in :meth:`pandas.DataFrame.ewm`, where non-float64 dtypes were silently failing (:issue:`42452`)
 - Bug in :meth:`pandas.DataFrame.rolling` operation along rows (``axis=1``) incorrectly omits columns containing ``float16`` and ``float32`` (:issue:`41779`)
 - Bug in :meth:`Resampler.aggregate` did not allow the use of Named Aggregation (:issue:`32803`)
-
+- Bug in :class:`GroupbyIndexer` and :class:`FixedForwardWindowIndexer` leading to segfaults and incorrect windows (:issue:`43267`)


:class:`pandas.api.indexers.FixedForwardWindowIndexer`

instead of mentioning GroupbyIndexer, mention the groupby.rolling operation

Could you be more specific with "incorrect windows"?

@mroeschke would these changes be okay for 1.3.3? #43267 (comment) or would we want to split this PR?

Sure, this would be okay for 1.3.3. @dsm054 could you move this note to 1.3.3?

dsm054 · 2021-08-30T12:34:36Z

@github-actions pre-commit

mroeschke · 2021-08-30T16:55:53Z

doc/source/whatsnew/v1.4.0.rst

@@ -355,7 +355,7 @@ Groupby/resample/rolling
 - Bug in :meth:`pandas.DataFrame.ewm`, where non-float64 dtypes were silently failing (:issue:`42452`)
 - Bug in :meth:`pandas.DataFrame.rolling` operation along rows (``axis=1``) incorrectly omits columns containing ``float16`` and ``float32`` (:issue:`41779`)
 - Bug in :meth:`Resampler.aggregate` did not allow the use of Named Aggregation (:issue:`32803`)
-
+- Bug in :meth:`pandas.DataFrame.groupby.rolling` and :class:`pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (:issue:`43267`)


Could you move this to the bug fix section in 1.3.3?

planning the 1.3.3 release tomorrow. changing milestone to 1.3.4. (1.3.4 release notes won't be added to master until after 1.3.3 is released)

Moved to 1.3.4 per request

mroeschke · 2021-08-30T16:56:49Z

pandas/core/indexers/objects.py

@@ -341,18 +342,20 @@ def get_window_bounds(
            )
            start = start.astype(np.int64)
            end = end.astype(np.int64)
-            # Cannot use groupby_indicies as they might not be monotonic with the object
+            # From get_window_bounds, those two should be equal in length of array
+            assert len(start) == len(end)


Could you move this assert to before the return?

We could, but that would hide a category of bugs: say that there were two keys, and the start,end pairs were (3,4) and (4,3). After concatenation we'd see 7==7 and be happy.

I'd be fine with adding another check before the return.
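The point about a late check hiding bugs can be sketched in a few lines of plain Python. The helper names `check_per_group` and `check_after_concat` are hypothetical, and the arrays are placeholders; only their lengths, (3, 4) and (4, 3), matter.

```python
def check_per_group(bounds):
    """True only if every group's start and end arrays match in length."""
    return all(len(start) == len(end) for start, end in bounds)


def check_after_concat(bounds):
    """True if the concatenated start and end arrays match in length."""
    total_start = sum(len(start) for start, _ in bounds)
    total_end = sum(len(end) for _, end in bounds)
    return total_start == total_end


# Two keys whose (start, end) lengths are (3, 4) and (4, 3).
bounds = [([0] * 3, [0] * 4), ([0] * 4, [0] * 3)]

print(check_per_group(bounds))     # False: each group is individually broken
print(check_after_concat(bounds))  # True: 3 + 4 == 4 + 3 masks the mismatch
```

A check before the return is still useful as a second line of defence, but it cannot replace the per-group one.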

jreback · 2021-09-09T19:04:32Z

@dsm054 a few comments if you can update (and merge master)

simonjayhawkins · 2021-09-11T21:19:24Z

doc/source/whatsnew/v1.4.0.rst

@@ -355,7 +355,7 @@ Groupby/resample/rolling
 - Bug in :meth:`pandas.DataFrame.ewm`, where non-float64 dtypes were silently failing (:issue:`42452`)
 - Bug in :meth:`pandas.DataFrame.rolling` operation along rows (``axis=1``) incorrectly omits columns containing ``float16`` and ``float32`` (:issue:`41779`)
 - Bug in :meth:`Resampler.aggregate` did not allow the use of Named Aggregation (:issue:`32803`)
-
+- Bug in :meth:`pandas.DataFrame.groupby.rolling` and :class:`pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (:issue:`43267`)


planning the 1.3.3 release tomorrow. changing milestone to 1.3.4. (1.3.4 release notes won't be added to master until after 1.3.3 is released)

dsm054 · 2021-09-12T02:36:37Z

I've booked time tomorrow to update to master and address outstanding comments.

dsm054 · 2021-09-13T01:44:21Z

@github-actions pre-commit

dsm054 · 2021-09-22T01:33:50Z

@jreback: regarding the type problems, I'm talking about the mypy errors the build is preventing at the moment. (So far, it's taken far longer to deal with the overhead of the process than it did to debug and fix the problem in the first place. :-)

mroeschke · 2021-09-22T03:34:35Z

pandas/core/window/rolling.py

        elif self._win_freq_i8 is not None:
            rolling_indexer = VariableWindowIndexer
            window = self._win_freq_i8
        else:
            rolling_indexer = FixedWindowIndexer
            window = self.window
+


Could you undo these whitespace changes? (Seems to be in other places as well)

jreback · 2021-09-22T13:05:25Z

@dsm054

(So far, it's taken far longer to deal with the overhead of the process than it did to debug and fix the problem in the first place. :-)

absolutely. code fixes are easy, tests hard, typing hard-est

jreback · 2021-09-29T13:28:40Z

@dsm054 can you fixup / merge master when you can

simonjayhawkins · 2021-10-13T10:49:07Z

@mroeschke 1.3.4 is scheduled for the end of the week. will you have time to push this over the line?

if not, would prefer not to push to 1.3.5, as it will hopefully be the last in 1.3.x and would require another 1.3.x release if any issues, so could either push to 1.4 or close as stale.

mroeschke · 2021-10-13T16:50:46Z

@simonjayhawkins I may be able to finish this on the weekend, so let's assume this will target 1.4

mroeschke

LGTM @simonjayhawkins should be ready to merge

jreback

@mroeschke minor comment otherwise lgtm and merge when ready

jreback · 2021-10-16T00:03:13Z

pandas/core/indexers/objects.py

@@ -279,8 +278,8 @@ class GroupbyIndexer(BaseIndexer):
    def __init__(
        self,
        index_array: np.ndarray | None = None,
-        window_size: int = 0,
-        groupby_indicies: dict | None = None,
+        window_size: int | type[BaseIndexer] = 0,


hmm u can only set the default if this is an int

Suggested change

-        window_size: int | type[BaseIndexer] = 0,
+        window_size: int | BaseIndexer = 0,

and can remove an ignore.
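The suggestion turns on the difference between `type[BaseIndexer]`, which describes the class object itself, and `BaseIndexer`, which describes an instance. Below is a small sketch of that distinction, assuming the argument is either an int or an indexer instance, as the suggested annotation implies. `BaseIndexerSketch`, `ForwardIndexerSketch`, and `describe` are hypothetical stand-ins, not pandas code.

```python
from __future__ import annotations


class BaseIndexerSketch:
    """Hypothetical stand-in for a window indexer base class."""

    def __init__(self, window_size: int = 0) -> None:
        self.window_size = window_size


class ForwardIndexerSketch(BaseIndexerSketch):
    """Hypothetical subclass, used only to have a concrete instance."""


def describe(window_size: int | BaseIndexerSketch = 0) -> str:
    # The default of 0 is an int, so it satisfies the annotation.
    if isinstance(window_size, BaseIndexerSketch):
        return "indexer instance"
    if isinstance(window_size, int):
        return "integer window"
    # A class object lands here; type[BaseIndexerSketch] would describe it.
    return "neither"


print(describe(5))                                    # integer window
print(describe(ForwardIndexerSketch(window_size=2)))  # indexer instance
print(describe(ForwardIndexerSketch))                 # neither
```

If callers pass instances, annotating with the instance type lets a type checker accept them without a suppression comment.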

simonjayhawkins · 2021-10-16T10:23:59Z

doc/source/whatsnew/v1.3.4.rst

@@ -33,6 +33,7 @@ Fixed regressions

 Bug fixes
 ~~~~~~~~~
+- Bug in :meth:`pandas.DataFrame.groupby.rolling` and :class:`pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (:issue:`43267`)


nit. for consistency

Suggested change

-- Bug in :meth:`pandas.DataFrame.groupby.rolling` and :class:`pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (:issue:`43267`)
+- Fixed bug in :meth:`pandas.DataFrame.groupby.rolling` and :class:`pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (:issue:`43267`)

simonjayhawkins · 2021-10-16T11:33:00Z

pandas/core/indexers/objects.py

@@ -279,8 +278,8 @@ class GroupbyIndexer(BaseIndexer):
    def __init__(
        self,
        index_array: np.ndarray | None = None,
-        window_size: int = 0,
-        groupby_indicies: dict | None = None,
+        window_size: int | type[BaseIndexer] = 0,


Suggested change

-        window_size: int | type[BaseIndexer] = 0,
+        window_size: int | BaseIndexer = 0,

and can remove an ignore.

dsm054 · 2021-10-16T18:54:19Z

@mroeschke, @simonjayhawkins: thanks for picking this up! (long story)

simonjayhawkins · 2021-10-16T20:27:08Z

Thanks @dsm054 and @mroeschke

dsm054 force-pushed the repair_fixedforwardwindowindexer branch 2 times, most recently from 2755500 to 5900e00 Compare August 30, 2021 01:49

mroeschke reviewed Aug 30, 2021

simonjayhawkins linked an issue Aug 30, 2021 that may be closed by this pull request

BUG: crash in df.groupby.rolling.mean with forward window indexer #43267

Closed

3 tasks

dsm054 force-pushed the repair_fixedforwardwindowindexer branch from 5900e00 to 9fc7317 Compare August 30, 2021 12:32

mroeschke reviewed Aug 30, 2021

simonjayhawkins added this to the 1.3.3 milestone Aug 30, 2021

simonjayhawkins added Groupby Segfault Non-Recoverable Error Window rolling, ewma, expanding labels Aug 30, 2021

jreback requested changes Sep 4, 2021

simonjayhawkins requested changes Sep 11, 2021

simonjayhawkins modified the milestones: 1.3.3, 1.3.4 Sep 11, 2021

dsm054 force-pushed the repair_fixedforwardwindowindexer branch from 53f4ac1 to 4b9a480 Compare September 13, 2021 01:40

BUG: Fixes to FixedForwardWindowIndexer and GroupbyIndexer (pandas-de…

928ba00

…v#43267)

jreback requested changes Sep 13, 2021

mroeschke reviewed Sep 22, 2021

mroeschke reviewed Sep 22, 2021

Merge branch 'master' into repair_fixedforwardwindowindexer

25dc95b

mroeschke added 6 commits October 13, 2021 14:54

Undo whitespace changes

dd6ccb2

Fix type annotation of GroupbyIndexer

f609785

Ignore typing check for window_size since BaseIndexer is public

2efdfb9

Update objects.py

acd9605

Merge remote-tracking branch 'upstream/master' into repair_fixedforwa…

eb01640

…rdwindowindexer

Fix typing

27c6c33

mroeschke approved these changes Oct 15, 2021

jreback approved these changes Oct 16, 2021

simonjayhawkins reviewed Oct 16, 2021

simonjayhawkins mentioned this pull request Oct 16, 2021

RLS: 1.3.4 #43531

Closed

mroeschke added 2 commits October 16, 2021 11:34

Merge remote-tracking branch 'upstream/master' into repair_fixedforwa…

6ef4b90

…rdwindowindexer

Update per comments

04851f5

simonjayhawkins merged commit 776329f into pandas-dev:master Oct 16, 2021

lumberbot-app bot added the Still Needs Manual Backport label Oct 16, 2021

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Oct 16, 2021

Backport PR pandas-dev#43291: BUG: Fixes to FixedForwardWindowIndexer…

11bc77d

… and GroupbyIndexer (pandas-dev#43267)

simonjayhawkins mentioned this pull request Oct 16, 2021

Backport PR #43291: BUG: Fixes to FixedForwardWindowIndexer and GroupbyIndexer (#43267) #44061

Merged

simonjayhawkins removed the Still Needs Manual Backport label Oct 16, 2021

simonjayhawkins added a commit that referenced this pull request Oct 16, 2021

Backport PR #43291: BUG: Fixes to FixedForwardWindowIndexer and Group…

a326408

…byIndexer (#43267) (#44061) Co-authored-by: DSM <[email protected]>
