WIP: ENH: Add numba engine to groupby apply #35445

mroeschke · 2020-07-29T04:42:05Z

closes #31845 (ENH: Add engine keyword argument to groupby.apply to leverage Numba)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Some notes:

The passed function must be a reduction operation
The numba engine drops the grouping column by default
The numba engine can only operate on numeric data and returns float64
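
A rough illustration of the behavior these notes describe, written in plain NumPy and pandas rather than numba, and not taken from this PR's implementation. The helper name `grouped_reduce_float64` is hypothetical. It assumes the grouping column is dropped, the remaining columns are all numeric, and the function returns one scalar per group and column:

```python
import numpy as np
import pandas as pd


def grouped_reduce_float64(df, by, func):
    """Hypothetical helper: apply a reduction per group on float64 arrays.

    Not the PR's code; it only mimics the documented constraints.
    """
    # Drop the grouping column and cast the rest to float64
    # (this raises for non-numeric columns).
    values = df.drop(columns=[by]).to_numpy(dtype=np.float64)

    # Encode group keys as integer codes; keys come back sorted.
    codes, keys = pd.factorize(df[by], sort=True)

    # Sort rows so that each group occupies one contiguous slice.
    order = np.argsort(codes, kind="stable")
    sorted_codes = codes[order]
    sorted_values = values[order]

    # Slice boundaries for each group in the sorted data.
    group_ids = np.arange(len(keys))
    starts = np.searchsorted(sorted_codes, group_ids, side="left")
    ends = np.searchsorted(sorted_codes, group_ids, side="right")

    # One float64 result per (group, column): func must be a reduction.
    out = np.empty((len(keys), values.shape[1]), dtype=np.float64)
    for i in range(len(keys)):
        for j in range(values.shape[1]):
            out[i, j] = func(sorted_values[starts[i]:ends[i], j])

    return pd.DataFrame(out, index=keys)


def f(x):
    return np.sum(x) + 1


small = pd.DataFrame({"a": range(5), "b": range(5), "c": range(5)})
result = grouped_reduce_float64(small, "a", f)
print(result.dtypes)
```

The result has one row per group and one positionally labelled float64 column per non-grouping input column, which is the same shape as the numba output shown in the timing session below.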

Preliminary default timing:

(pandas-dev) matthewroeschke:pandas-mroeschke matthewroeschke$ ipython
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:21:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: df_g = pd.DataFrame({'a': range(10**4), 'b': range(10**4), 'c': range(10**4)})

In [2]: def f(x):
   ...:     return np.sum(x) + 1
   ...:

In [3]: df_g.groupby('a').apply(f)
Out[3]:
          a      b      c
a
0         1      1      1
1         2      2      2
2         3      3      3
3         4      4      4
4         5      5      5
...     ...    ...    ...
9995   9996   9996   9996
9996   9997   9997   9997
9997   9998   9998   9998
9998   9999   9999   9999
9999  10000  10000  10000

[10000 rows x 3 columns]

In [4]: %timeit df_g.groupby('a').apply(f)
3.07 s ± 57.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: df_g.groupby('a').apply(f, engine='numba', engine_kwargs={'parallel': True})
Out[5]:
            0        1
0         1.0      1.0
1         2.0      2.0
2         3.0      3.0
3         4.0      4.0
4         5.0      5.0
...       ...      ...
9995   9996.0   9996.0
9996   9997.0   9997.0
9997   9998.0   9998.0
9998   9999.0   9999.0
9999  10000.0  10000.0

[10000 rows x 2 columns]

In [6]: %timeit df_g.groupby('a').apply(f, engine='numba', engine_kwargs={'parallel': True})
510 ms ± 3.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

pep8speaks · 2020-07-29T04:42:09Z

Hello @mroeschke! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-05 18:13:49 UTC

Matt Roeschke added 2 commits July 28, 2020 21:35

Add numba engine for groupby apply

c66dcb1

Merge remote-tracking branch 'upstream/master' into numba_groupby_apply

4bc2b8e

Matt Roeschke added 6 commits August 1, 2020 19:35

Merge remote-tracking branch 'upstream/master' into numba_groupby_apply

06c9ac9

Black and wrap result

0963cf0

Merge remote-tracking branch 'upstream/master' into numba_groupby_apply

f6e17c8

Merge remote-tracking branch 'upstream/master' into numba_groupby_apply

7ed80e4

get *args to work

376ee8f

Merge remote-tracking branch 'upstream/master' into numba_groupby_apply

85fe9a0

mroeschke closed this Aug 16, 2020

mroeschke deleted the numba_groupby_apply branch November 12, 2020 05:15
