Skip to content

PERF: use non-copying path for Groupby.skew #52104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Mar 31, 2023

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Mar 21, 2023

It wouldn't surprise me if there are contiguity-related lookup-optimizations that could be done in group_skew.

import pandas as pd
import numpy as np

nrows = 10**5
ngroups = 5
ncols = 4
nan_frac = 0.05

arr = np.random.randn(nrows, ncols)
arr[np.random.random(nrows) < nan_frac] = np.nan

df = pd.DataFrame(arr)
grps = np.random.randint(0, ngroups, size=nrows)
gb = df.groupby(grps)

%timeit result = gb.skew()
7.75 ms ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-main
1.78 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  # <- PR

[same with ngroups=50]
%timeit result = gb.skew()
23.9 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- main
1.81 ms ± 51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  # <- PR

@mroeschke mroeschke added Groupby Reduction Operations sum, mean, min, max, etc. labels Mar 21, 2023
@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Mar 21, 2023
@jbrockmendel
Copy link
Member Author

@WillAyd any guesses why npdev wouldn't like

out[i, j] = <float64_t>(
                        (ct * (ct - 1) ** 0.5 / (ct - 2)) * (M3[i, j] / M2[i, j]**1.5)
                    )

The error in the build log

pandas/_libs/groupby.c:37389:15: error: aggregate value used where a floating-point was expected
37389 |               *((__pyx_t_5numpy_float64_t *) ( /* dim=1 */ ((char *) (((__pyx_t_5numpy_float64_t *) ( /* dim=0 */ (__pyx_v_out.data + __pyx_t_18 * __pyx_v_out.strides[0]) )) + __pyx_t_19)) )) = ((__pyx_t_5numpy_float64_t)__Pyx_c_prod_npy_float64(__Pyx_c_quot_npy_float64(__Pyx_c_prod_npy_float64(__pyx_t_npy_float64_complex_from_parts(__pyx_v_ct, 0), __pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_t_20), __Pyx_CIMAG(__pyx_t_20))), __pyx_t_npy_float64_complex_from_parts((__pyx_v_ct - 2.0), 0)), __Pyx_c_quot_npy_float64(__pyx_t_npy_float64_complex_from_parts((*((__pyx_t_5numpy_float64_t *) ( /* dim=1 */ ((char *) (((__pyx_t_5numpy_float64_t *) ( /* dim=0 */ (__pyx_v_M3.data + __pyx_t_13 * __pyx_v_M3.strides[0]) )) + __pyx_t_17)) ))), 0), __pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_t_21), __Pyx_CIMAG(__pyx_t_21)))));

I'd guess the **1.5 is returning something with a complex type and the complaint is about casting it back to float, but no idea why that would only be an issue on npdev

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glancing at your error I think you are right. Looks like there are real and imaginary parts to the __pyx_t_20 and __pyx_t_21 variables; you can try looking back in the function at what they are to see.

The cast is a separate thing from the error; things are value at an intermediate expression before the cast would even take place. I would get rid of the cast altogether though; casts don't do anything magical in C. They are simply an indicator that "we know better" than the compiler, which is rarely true

@WillAyd
Copy link
Member

WillAyd commented Mar 23, 2023

It looks like Cython 3.0 has a new cpow directive that I think is responsible for this behavior. Might be ok to set that to True

@jbrockmendel
Copy link
Member Author

IIUC the cpow thing has to wait until we use cython3. anything else here?

elif M2[i, j] == 0:
out[i, j] = 0
else:
m2_15 = <float64_t>(M2[i, j] ** 1.5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the rules for casting a complex to a float value, but seems potentially dangerous to do this. Is there no explicit way to get the real part from the complex and make sure the imaginary is safely discardable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea, and google is no help. cc @seberg thoughts?

Copy link
Contributor

@seberg seberg Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, all of this is already typed as float64, no? So this uses pow which should already be float64 only. Isn't the cast unnecessary anyway?

Otherwise, yes .real should exist, or .real() if its C++ (which I don't think it is). The cast would be safe, but a bit odd. (not 100%, but then I am not sure it shouldn't be a compile error)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the cast we got build failures on the npdev build

pandas/_libs/groupby.c:37389:15: error: aggregate value used where a floating-point was expected
37389 |               *((__pyx_t_5numpy_float64_t *) ( /* dim=1 */ ((char *) (((__pyx_t_5numpy_float64_t *) ( /* dim=0 */ (__pyx_v_out.data + __pyx_t_18 * __pyx_v_out.strides[0]) )) + __pyx_t_19)) )) = ((__pyx_t_5numpy_float64_t)__Pyx_c_prod_npy_float64(__Pyx_c_quot_npy_float64(__Pyx_c_prod_npy_float64(__pyx_t_npy_float64_complex_from_parts(__pyx_v_ct, 0), __pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_t_20), __Pyx_CIMAG(__pyx_t_20))), __pyx_t_npy_float64_complex_from_parts((__pyx_v_ct - 2.0), 0)), __Pyx_c_quot_npy_float64(__pyx_t_npy_float64_complex_from_parts((*((__pyx_t_5numpy_float64_t *) ( /* dim=1 */ ((char *) (((__pyx_t_5numpy_float64_t *) ( /* dim=0 */ (__pyx_v_M3.data + __pyx_t_13 * __pyx_v_M3.strides[0]) )) + __pyx_t_17)) ))), 0), __pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_t_21), __Pyx_CIMAG(__pyx_t_21)))));
      |               ^

Copy link
Contributor

@seberg seberg Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I had checked this line, but the problem is only in the line below :).

This must have to do with cython 3 would be my guess? It uses the Python power definition which converts to complex, but you probably want a float power anyway.

Maybe try: <float64_t>0.5, maybe that makes it use C power (removing the need to cast). Otherwise, maybe we can tell cython3 to use C power?

EDIT: Nvm which line... it must be the Cython version only likely...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like @cython.cpow is available in cython 0.29.x, let's see if using that solves this..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that did it!

import pandas._testing as tm


def test_groupby_skew_equivalence():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do this for the other functions? I like the idea; seems like it would be good to consolidate this with others in a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupby tests are not super well-organized, would be nice to improve someday

@mroeschke mroeschke added this to the 2.1 milestone Mar 31, 2023
@mroeschke mroeschke merged commit 233bd83 into pandas-dev:main Mar 31, 2023
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the perf-skew branch March 31, 2023 18:33
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 1, 2023
* PERF: use non-copying path for Groupby.skew

* DFGB

* update tests

* troubleshoot 32 bit builds

* 32bit build

* troubleshoot npdev build

* troubleshoot npdev build

* troubleshoot

* troubleshoot victory

* troubleshoot

* group_skew in groupby.pyi

* cython.cpow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Performance Memory or execution speed performance Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants