PERF: contiguity, less gil in join algos #42057

Merged
merged 2 commits into from
Jun 17, 2021

mzeitlin11
Member

Broken off from a branch working towards #13745

This doesn't have a noticeable user-facing impact on its own, since it is only a small part of the merge operation. Some timings:

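import numpy as np
import pandas._libs.join as libjoin

# Fixed seed so both branches are timed on identical inputs
np.random.seed(0)
# 100,000 integer keys drawn from [0, 100) for each side of the join
arr1 = np.random.randint(0, 100, 100000)
arr2 = np.random.randint(0, 100, 100000)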

Master:

In [2]: %timeit libjoin.inner_join(arr1, arr2, 100)
1.19 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit libjoin.left_outer_join(arr1, arr2, 100)
1.22 s ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit libjoin.full_outer_join(arr1, arr2, 100)
1.26 s ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR:

In [2]: %timeit libjoin.inner_join(arr1, arr2, 100)
729 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit libjoin.left_outer_join(arr1, arr2, 100)
714 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit libjoin.full_outer_join(arr1, arr2, 100)
715 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
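
For readers unfamiliar with what these routines compute, here is a rough NumPy sketch of an inner join on integer group labels. It is not the pandas implementation: the function name `inner_join_indexers` is hypothetical, the third argument is assumed to be the number of distinct groups, and the order of the returned pairs may differ from what `libjoin.inner_join` produces. It only illustrates that the result contains one (left position, right position) pair for every pair of matching labels.

```python
import numpy as np


def inner_join_indexers(left, right, max_groups):
    """Reference sketch (hypothetical helper, not pandas code).

    Assumes every label lies in [0, max_groups). Returns two position
    arrays such that left[li] == right[ri] element-wise.
    """
    left = np.asarray(left, dtype=np.intp)
    right = np.asarray(right, dtype=np.intp)

    # Stable sorts place equal labels next to each other while keeping
    # the original order of rows within each group.
    left_sorter = np.argsort(left, kind="stable")
    right_sorter = np.argsort(right, kind="stable")

    # Number of rows per group on each side.
    left_counts = np.bincount(left, minlength=max_groups)
    right_counts = np.bincount(right, minlength=max_groups)

    # Offset of each group's first row inside the sorted order.
    left_starts = np.concatenate(([0], np.cumsum(left_counts)[:-1]))
    right_starts = np.concatenate(([0], np.cumsum(right_counts)[:-1]))

    left_parts, right_parts = [], []
    for g in range(max_groups):
        lc, rc = left_counts[g], right_counts[g]
        if lc == 0 or rc == 0:
            continue  # inner join: groups missing on either side are dropped
        lpos = left_sorter[left_starts[g]:left_starts[g] + lc]
        rpos = right_sorter[right_starts[g]:right_starts[g] + rc]
        # Cartesian product of the two groups' row positions.
        left_parts.append(np.repeat(lpos, rc))
        right_parts.append(np.tile(rpos, lc))

    if not left_parts:
        empty = np.empty(0, dtype=np.intp)
        return empty, empty.copy()
    return np.concatenate(left_parts), np.concatenate(right_parts)


def expected_pair_count(left, right, max_groups):
    """Size of the inner-join result: sum over groups of n_left * n_right."""
    lc = np.bincount(np.asarray(left, dtype=np.intp), minlength=max_groups)
    rc = np.bincount(np.asarray(right, dtype=np.intp), minlength=max_groups)
    return int((lc.astype(np.int64) * rc.astype(np.int64)).sum())
```

With 100,000 keys per side spread over 100 groups, as in the timings above, each group has about 1,000 rows on each side, so the result holds on the order of 100 × 1,000 × 1,000 = 10^8 pairs. That output size is a plausible reason the calls take hundreds of milliseconds even in compiled code.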

@mzeitlin11 mzeitlin11 added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 16, 2021
@jreback
Contributor

jreback commented Jun 16, 2021

wow!

Member

@jbrockmendel jbrockmendel left a comment

LGTM

@jreback jreback added this to the 1.3 milestone Jun 17, 2021
@jreback
Contributor

jreback commented Jun 17, 2021

might as well backport this

@jreback
Contributor

jreback commented Jun 17, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

lumberbot-app bot commented Jun 17, 2021

Something went wrong ... Please have a look at my logs.

jreback pushed a commit that referenced this pull request Jun 17, 2021
@mzeitlin11 mzeitlin11 deleted the get_result_indexer branch June 17, 2021 17:07
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Labels
Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode
3 participants