PERF: Regression in groupby ops from adding skipna #60870

Closed
rhshadrach opened this issue Feb 6, 2025 · 2 comments · Fixed by #60871
Labels
Groupby; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Performance (memory or execution speed performance); Regression (functionality that used to work in a prior pandas version)
Comments

@rhshadrach
Member

pandas-dev/asv-runner#42

Due to #60752 - cc @snitish

@rhshadrach added the Groupby, Missing-data, Performance, and Regression labels Feb 6, 2025
@rhshadrach added this to the 3.0 milestone Feb 6, 2025
@rhshadrach
Member Author

rhshadrach commented Feb 6, 2025

Some performance degradation is unavoidable, but some of these regressions seem larger than I had expected. It might be worth looking into, e.g.:

https://rhshadrach.github.io/asv-runner/#groupby.GroupByCythonAgg.time_frame_agg?p-dtype=%27float64%27&p-method=%27max%27

@snitish
Member

snitish commented Feb 6, 2025

Thanks for flagging @rhshadrach. I identified the cause of the performance degradation.

https://github.com/snitish/pandas/blob/0fc49df08fb81233750a3007bc8b5b2cd5b5e675/pandas/_libs/groupby.pyx#L1910-L1916

The `isna_result = ...` lines can be moved inside the `if not skipna` condition, since this flag is only used when `skipna` is `False`. That seems to improve the run time quite a bit. I will create a PR with the fix.
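
To illustrate the idea, here is a minimal pure-Python sketch of a grouped max with a `skipna` flag. It is not the Cython code from `groupby.pyx`; the function name `group_max`, its signature, and the use of NaN as the only missing-value marker are assumptions made for this example. The point it shows is that the "is this group's result already NA?" check is evaluated only on the `not skipna` branch, so the default `skipna=True` path does no extra work per row.

```python
import math


def group_max(values, labels, ngroups, skipna=True):
    """Hypothetical sketch: per-group max of floats, NaN marks missing."""
    out = [-math.inf] * ngroups  # running max per group
    nobs = [0] * ngroups         # non-missing observations per group

    for val, lab in zip(values, labels):
        if lab < 0:
            continue  # negative label: row belongs to no group

        if not skipna:
            # Computed only here, because the flag is only needed
            # when skipna is False.
            isna_result = math.isnan(out[lab])
            if isna_result:
                continue  # group already holds NA; nothing can change it

        if math.isnan(val):
            if not skipna:
                out[lab] = math.nan  # one missing value makes the group NA
            continue

        nobs[lab] += 1
        if val > out[lab]:
            out[lab] = val

    # Groups without any observation are reported as NaN.
    return [o if n > 0 else math.nan for o, n in zip(out, nobs)]
```

With `values = [1.0, nan, 5.0, 3.0, 2.0]` and `labels = [0, 0, 0, 1, 1]`, the default call skips the NaN and returns the plain maxima, while `skipna=False` returns NaN for group 0 and leaves group 1 unaffected.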
