Skip to content

PERF: Use Indexers to implement groupby rolling #34052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
May 25, 2020

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented May 7, 2020

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Currently, grouby.rolling is implemented essentially as groupby.apply(lambda x: x.rolling()) which can be potentially slow.

This PR implements groupby.rolling by calculating bounds with a GroupbyRollingIndxer and using the rolling aggregations in cython to compute the results.

@mroeschke
Copy link
Member Author

Here are preliminary benchmarks. The performance so far is fairly similar. I suspect that the fact that I have to reconstruct the resulting index is killing the performance

With this benchmark:

In [1]: n = 1000
   ...: df = pd.DataFrame({"A": [str(i) for i in range(n)] * 10, "B": list(range(n)) * 10})
   ...: g = df.groupby("A").rolling(window=2)
   ...: %timeit g.sum()
1.5 s ± 5.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <-- PR
1.52 s ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <-- master

@jreback
Copy link
Contributor

jreback commented May 13, 2020

try with a longer window as well
also profile this should be way faster

@mroeschke
Copy link
Member Author

I was accidentally still dispatching to the old implementation in this PR, here are the performance results with that removed

-- PR
In [2]: %timeit g.sum()
61.2 ms ± 4.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit g.sum()
71.9 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit g.sum()
57.9 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


-- master
In [2]: %timeit g.sum()
1.42 s ± 9.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit g.sum()
1.44 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit g.sum()
1.48 s ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@jreback jreback added Performance Memory or execution speed performance Window rolling, ewma, expanding labels May 13, 2020
@jreback jreback added this to the 1.1 milestone May 13, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you enhance the benchamarks for this (if we don't have?)

also if you can add a whatsnew note.

@pep8speaks
Copy link

pep8speaks commented May 14, 2020

Hello @mroeschke! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-21 04:47:41 UTC

@mroeschke
Copy link
Member Author

@jreback ready for another look

df = pd.DataFrame(
{"A": [str(i) for i in range(N)] * 10, "B": list(range(N)) * 10}
)
self.groupby_roll = df.groupby("A").rolling(window=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a timebased one as well

@@ -611,7 +611,7 @@ Performance improvements
and :meth:`~pandas.core.groupby.groupby.Groupby.last` (:issue:`34178`)
- Performance improvement in :func:`factorize` for nullable (integer and boolean) dtypes (:issue:`33064`).
- Performance improvement in reductions (sum, prod, min, max) for nullable (integer and boolean) dtypes (:issue:`30982`, :issue:`33261`, :issue:`33442`).

- Performance improvement in ``groupby(..).rolling(..)`` (:issue:`34052`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a way of hitting the api here?

np.concatenate(list(self._groupby.grouper.indices.values()))
)

# filter out the on from the object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to call super()._create_blocks(obj) (e.g. add obj as an optional arg that defaults to self._selected_obj)

@mroeschke
Copy link
Member Author

Final benchmarks:

# rolling over 1000 groups
In [1]: In [1]: n = 1000
   ...:    ...: df = pd.DataFrame({"A": [str(i) for i in range(n)] * 10, "B": list(range(n)) * 10})
   ...:    ...: g = df.groupby("A").rolling(window=2)

-- master
In [2]: %timeit g.sum()
1.38 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit g.sum()
1.4 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit g.sum()
1.37 s ± 6.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

--- PR
In [2]: %timeit g.sum()
63.1 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit g.sum()
70.1 ms ± 8.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit g.sum()
63.7 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some doc comment requests. ping on green.

center: Optional[bool] = None,
closed: Optional[str] = None,
) -> Tuple[np.ndarray, np.ndarray]:
start_arrays = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments here on what you are doing

@@ -147,12 +148,10 @@ def _validate_get_window_bounds_signature(window: BaseIndexer) -> None:
f"get_window_bounds"
)

def _create_blocks(self):
def _create_blocks(self, obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can type: Union[Series, DataFrame] (I think we have an annoation for that).

groupby_keys = [grouping.name for grouping in self._groupby.grouper._groupings]
result_index_names = groupby_keys + grouped_index_name

result_index_data = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can add some commments here on what you are doing

@property
def _constructor(self):
return Rolling

def _gotitem(self, key, ndim, subset=None):
def _create_blocks(self, obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can type

@mroeschke
Copy link
Member Author

@jreback Ping all green

@jreback jreback merged commit bad52a9 into pandas-dev:master May 25, 2020
@jreback
Copy link
Contributor

jreback commented May 25, 2020

thanks @mroeschke very nice!

@ddofer
Copy link

ddofer commented Aug 4, 2020

Does this affect also "Expanding"?

@mroeschke
Copy link
Member Author

No this currently doesn't apply to expanding. PR's to make it apply to expanding welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants