
Performance regression in frame_methods.Apply.time_apply_ref_by_name #35047


Closed
TomAugspurger opened this issue Jun 29, 2020 · 2 comments · Fixed by #35166
Labels
Blocker (blocking issue or pull request for an upcoming release), Performance (memory or execution speed performance), Regression (functionality that used to work in a prior pandas version)

Comments

TomAugspurger added the Performance, Regression, and Blocker labels Jun 29, 2020
TomAugspurger added this to the 1.1 milestone Jun 29, 2020
@jbrockmendel
Member

I measured 91802a9 as perf-neutral locally. If that doesn't hold elsewhere, we might as well revert.

@TomAugspurger
Contributor Author

TomAugspurger commented Jul 2, 2020

I don't know if it's an artifact of asv / line-profiler, but it looks like a lot of time is spent in option_context.__enter__ and option_context.__exit__ (~ 40% of apply_series_generator together):

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   266                                               def apply_series_generator(self, partial_result=None) -> Tuple[ResType, "Index"]:
...
   296      1000      24406.0     24.4     34.5                  with option_context("mode.chained_assignment", None):

Do you have time to explore that? Either optimizing option_context, or making an internal version that skips expensive checks, or (if the contextmanager machinery itself is too expensive) a private setitem that skips chained_assignment checks?

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 7, 2020
Set the option once, rather than in the loop.

Closes pandas-dev#35047