Skip to content

PERF: use c-division in nancorr #47518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 3, 2022
Merged

Conversation

MarcoGorelli
Copy link
Member

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

There's three divisions in this function that can safely be done at the C-level:

    meanx += 1 / nobs * dx
    meany += 1 / nobs * dy

because nobs is at least 1 when we get here (it starts at 0, nobs+=1 is called a few lines
above, and nobs can't decrease), and

    result[xi, yi] = result[yi, xi] = covxy / divisor

because divisor is checked to not be equal to 0 in the preceding line.

This makes a noticeable difference, example:

arr1 = np.random.randn(100, 100)

On main:

%%timeit
nancorr(arr1)

5.39 ms ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On this branch:

%%timeit
nancorr(arr1)

3.86 ms ± 58.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I didn't find much difference in nancorr_spearman, I think because that one calls
the Python function rank_1d within the loop, so that might be a bigger bottleneck there

@MarcoGorelli MarcoGorelli added the Performance Memory or execution speed performance label Jun 27, 2022
@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Jun 27, 2022

nvm, looks like this changes results (failing tests) - not sure I see why, will look into it, but closing for now

EDIT ok might have fixed this now, just a matter of marking the numerator in the division as float

running the same timing test as above now gives:

main:

5.68 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

here:

4.97 ms ± 60.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@MarcoGorelli MarcoGorelli reopened this Jun 27, 2022
@jreback jreback added this to the 1.5 milestone Jul 3, 2022
@jreback jreback merged commit 1924be3 into pandas-dev:main Jul 3, 2022
@jreback
Copy link
Contributor

jreback commented Jul 3, 2022

thanks @MarcoGorelli

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants