Performance regression in DataFrame reduction ops #38592

jorisvandenbossche · 2020-12-20T17:00:48Z

The stat_ops.FrameOps benchmarks have generally been showing a slowdown for the past few days, see e.g. https://pandas.pydata.org/speed/pandas/#stat_ops.FrameOps.time_op?python=3.8&Cython=0.29.21&p-op='mean'&p-dtype='int'&p-axis=0&commits=8dbb593d-7043f8fa

The indicated range is 8dbb593...7043f8f, but I don't immediately see which commit might be the cause. cc @jbrockmendel


IngErnestoAlvarez · 2020-12-21T00:04:39Z

I think it's in this commit: 7043f8f.
There, the astype method of "pandas/core/indexes/numeric.py::NumericIndex" was deleted, and the "pandas/core/indexes/base.py::Index::astype" method was changed to use astype_nansafe.
Before, astype_nansafe was only called if this condition was true: is_integer_dtype(dtype) and not is_extension_array_dtype(dtype).
Now it is called every time.
That could be the issue. The "pandas/core/indexes/numeric.py::NumericIndex::astype" method knew when to call astype_nansafe and when to call super().astype directly. Now that it's deleted, astype_nansafe is always called.

jorisvandenbossche · 2020-12-28T09:20:30Z

That commit has been reverted, but that didn't seem to have solved the performance regression.

@jbrockmendel can you look into this?

jbrockmendel · 2020-12-28T16:01:00Z

> That commit has been reverted, but that didn't seem to have solved the performance regression.

Does this mean we can un-revert? It's a nice little cleanup.

> can you look into this?

Can't make any promises at the moment, but I'll keep a tab open for this so it doesn't fall off the radar screen entirely.

jorisvandenbossche · 2020-12-28T18:17:44Z

> does this mean we can un-revert?

No, it was reverted because of a behaviour regression -> #38607

> cant make any promises at the moment, but ill keep a tab open for this so it doesnt fall off the radar screen entirely.

If you can't look at it in the short term, I would propose reverting the change for now, until you have time (otherwise it might become much more difficult to do that later, if it turns out to be needed).
From quickly testing the different potential commits in the range, it seems #38507 is the cause of the slowdown.

jorisvandenbossche · 2020-12-28T18:42:26Z

Ah, I think the reason is that it is now doing a non-copying ravel, whereas before the copying ravel ensured we would have a good memory layout for a performant reduction (without that copy, the layout is not good when the DataFrame is created from a 2D numpy array).
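To make the layout point concrete, here is a minimal NumPy-only sketch. It does not reproduce the pandas code path touched by #38507; the names `data`, `block`, `copied` and `preserved` are made up for illustration, and treating the block as the transpose of the input array is an assumption about how the DataFrame stores a 2D array. It only shows that a default (copying) ravel followed by a reshape yields a C-contiguous result, while a layout-preserving ravel keeps the original F-contiguous layout.

```python
import numpy as np

# Assumption for illustration: a DataFrame built from a 2D array stores its
# block as the transpose of that array, so the block is F-contiguous.
data = np.arange(12, dtype="int64").reshape(4, 3)  # 4 rows, 3 columns, C order
block = data.T                                     # shape (3, 4), F-contiguous view

# Copying ravel: the default order="C" has to copy here because the block is
# not C-contiguous; reshaping the copy gives a fresh C-contiguous array.
copied = block.ravel().reshape(block.shape)

# Non-copying ravel: order="K" reads elements in memory order, so no copy is
# made; reshaping with order="F" restores the original F-contiguous layout.
preserved = block.ravel(order="K").reshape(block.shape, order="F")

print("copied:    C-contiguous =", copied.flags.c_contiguous)
print("preserved: F-contiguous =", preserved.flags.f_contiguous)
print("preserved shares memory with data:", np.shares_memory(preserved, data))
```

Both results hold the same values; only the memory layout differs, and that is what affects how fast a reduction along a given axis runs.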

jbrockmendel · 2020-12-28T19:08:06Z

Huh, and to think I made a non-copying ravel specifically with perf in mind. You learn something every day.

simonjayhawkins · 2021-06-11T13:02:42Z

@jbrockmendel @jorisvandenbossche Any idea what the status is here?

jbrockmendel · 2021-06-11T17:09:35Z

Fixed by #41911/#41924

simonjayhawkins · 2021-06-11T18:34:45Z

> Fixed by #41911/#41924

These are for the recent perf regression? This issue is quite old.

jbrockmendel · 2021-06-11T18:35:52Z

My mistake, I thought this was the one discussed on the call on Wednesday. No idea off the top of my head what the status is here.

simonjayhawkins · 2021-06-11T18:39:46Z

Yeah, no idea either. Will leave this on the 1.3 milestone for now, since this was a regression on master early on when we started 1.3. Needs further investigation.

jorisvandenbossche · 2021-06-25T14:43:55Z

This is still causing a regression in the reductions (for integer dtypes) compared to 1.2.x.
I just re-ran some benchmarks, and they show a 2.5x to 3.5x slowdown for sum and mean.

jorisvandenbossche · 2021-06-25T14:48:17Z

Now, based on my comment above, this might depend on the memory layout of the DataFrame, and thus on how the DataFrame was created.

So maybe we should remove some of the extra code added in #38507 to ensure the memory layout is preserved in the ravel() call, since that doesn't seem to be helpful (at least based on this single benchmark ..)
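One way to check whether layout matters on a given machine is a rough timing sketch like the one below. It is not the asv benchmark referenced in this issue: the array shape, the helper `best_time`, and the use of plain NumPy arrays in place of DataFrame blocks are all assumptions made for illustration, and the measured ratio will vary with hardware and NumPy version.

```python
import timeit

import numpy as np


def best_time(func, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


rng = np.random.default_rng(0)

# Hypothetical integer block of 4 columns by 100_000 rows, in two layouts.
c_block = rng.integers(0, 100, size=(4, 100_000))  # C-contiguous
f_block = np.asfortranarray(c_block)               # same values, F-contiguous

# Reduce along the long axis, which corresponds to a column-wise reduction
# if each row of the block holds one column of the DataFrame.
t_c = best_time(lambda: c_block.sum(axis=1))
t_f = best_time(lambda: f_block.sum(axis=1))

print(f"C layout: {t_c:.6f}s  F layout: {t_f:.6f}s  ratio F/C: {t_f / t_c:.2f}")
```

Both layouts give identical sums, so any difference in the printed times comes from memory access patterns rather than from the amount of work.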

jorisvandenbossche added the Performance, Regression, and Reduction Operations labels Dec 20, 2020

jorisvandenbossche added this to the 1.3 milestone Dec 20, 2020

simonjayhawkins mentioned this issue Dec 21, 2020

Revert "REF: use astype_nansafe in Index.astype" #38610

Merged

simonjayhawkins modified the milestones: 1.3, 1.3.1 Jun 30, 2021

simonjayhawkins mentioned this issue Jul 4, 2021

RLS: 1.3.1 #42343

Closed

This was referenced Jul 10, 2021

BUG: astype() on an integer DataFrame changes the order of data #42396

Closed

PERF/REGR: astype changing order of some 2d data #42475

Merged

jreback closed this as completed in #42475 Jul 13, 2021

ggold7046 mentioned this issue Aug 10, 2023

Modified doc/make.py to run sphinx-build -b linkcheck #54265

Merged

