Skip to content

PERF: groupby iteration #51109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Lookups on ._constructor and finalize calls add up when iterating over many groups. This does a pre-iteration check to see if we can skip those lookups/calls.

Using the example from #46505

import string
import random

import pandas as pd
import numpy as np
from time import time

rows = 500000
temp = np.random.randint(10**6, size=(rows, 3))
symbols = string.ascii_uppercase + string.digits
string_col = [''.join(random.choices(symbols, k=16)) for _ in range(rows)]
res = np.concatenate((temp, np.array([string_col]).T), axis=1)

df = pd.DataFrame(res)

def _format(x):
    vals = x.values
    if len(vals) > 2:
        return '-'.join(map(str, vals[:-1]))
    return np.nan


gb = df.groupby(3, sort=False)
func = {"col1": (2, _format), "col2": (1, _format)}
%prun -s cumtime gb.agg(**func)

On main this takes 31.655s of which 21.520 are in _chop and 8.357 are in __finalize__. In this PR this takes 20.328s of which 11.237 are in _chop_fast. (still slower than 1.3.5 which was 6.553s

There's a reasonable argument to be made that having separate paths for subclasses is not ideal, which I'll leave to @jorisvandenbossche to make if he's interested.

@mroeschke mroeschke added Groupby Performance Memory or execution speed performance labels Feb 1, 2023
@WillAyd
Copy link
Member

WillAyd commented Feb 4, 2023

Without knowing the context of the regression I'm -.05 to doing this - I think it gets a little wonky over time to have overridden constructors like this scattered through the codebase

@jorisvandenbossche
Copy link
Member

Technically, this certainly looks fine (in theory, we might want to also checks flags in addition to attrs, i.e. that it has the default values). It's indeed mostly a trade-off about whether this performance improvement is worth the added complexity (personally I don't think this is that complex).

But for the timings, can you do those again with actual %timeit instead of %prun. Running with the profiler adds a generic overhead, and so can magnify differences. If #51784 turns out to be realistic, it might also be useful to see what this PR still gives in improvement on top of that.

@jbrockmendel
Copy link
Member Author

can you do those again with actual %timeit instead of %prun

In [2]: %timeit gb.agg(**func)
17.2 s ± 969 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <- main
9.93 s ± 268 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <- PR

we might want to also checks flags

Good point, will update.

Also can improve the Series case a little bit by pinning _name instead of passing name=ser.name, analogous to #51784.

@jorisvandenbossche
Copy link
Member

Also can improve the Series case a little bit by pinning _name instead of passing name=ser.name, analogous to #51784.

I don't think that's needed, the fastpath=True should already give the same effect IIRC.

@jbrockmendel
Copy link
Member Author

#51784 improved the performance on main on the order of 25%. Updated timings:

In [3]: %timeit gb.agg(**func)
13.5 s ± 472 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <- main
10.3 s ± 334 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <- PR

Makes this much less compelling.

@jbrockmendel
Copy link
Member Author

No traction here, closing.

@jbrockmendel jbrockmendel deleted the perf-dont-need-finalize branch April 11, 2023 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants