Performance regression in frame_methods.Apply.time_apply_ref_by_name #35047

TomAugspurger · 2020-06-29T01:39:29Z

https://pandas.pydata.org/speed/pandas/index.html#frame_methods.Apply.time_apply_ref_by_name?commits=91802a9ae400830f9eaadd395f6a9b40cdd92ee5

91802a9 (cc @jbrockmendel)

Also affects other benchmarks that probably use apply: https://pandas.pydata.org/speed/pandas/index.html#frame_methods.Nunique.time_frame_nunique?commits=91802a9ae400830f9eaadd395f6a9b40cdd92ee5.


jbrockmendel · 2020-06-29T02:31:43Z

I measured 91802a9 as perf-neutral locally. If that doesn't hold elsewhere, might as well revert.

TomAugspurger · 2020-07-02T20:54:42Z

I don't know if it's an artifact of asv / line-profiler, but it looks like a lot of time is spent in option_context.__enter__ and option_context.__exit__ (~40% of apply_series_generator combined):

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   266                                               def apply_series_generator(self, partial_result=None) -> Tuple[ResType, "Index"]:
...
   296      1000      24406.0     24.4     34.5                  with option_context("mode.chained_assignment", None):

Do you have time to explore that? Options include optimizing option_context, making an internal version that skips expensive checks, or (if the contextmanager machinery itself is too expensive) adding a private setitem that skips the chained_assignment checks.


TomAugspurger added the Performance, Regression, and Blocker labels Jun 29, 2020

TomAugspurger added this to the 1.1 milestone Jun 29, 2020

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 7, 2020

Fixed Series.apply performance regression

fcc44d5

Set the option once, rather than in the loop. Closes pandas-dev#35047

TomAugspurger mentioned this issue Jul 7, 2020

Fixed Series.apply performance regression #35166

Merged

jreback closed this as completed in #35166 Jul 8, 2020
