PERF: join empty frame #46015

Merged: 6 commits, Feb 22, 2022

Conversation

lukemanley
Member

@lukemanley lukemanley commented Feb 16, 2022

Similar to #45838, but for DataFrame.join.

DataFrame.join already had a fast path for joining with an empty frame but it only covered a limited set of cases. This PR makes the fast path a bit faster and covers additional cases.

import pandas as pd 
import numpy as np

N = 10_000_000

df = pd.DataFrame({'A': np.arange(N)})
df_empty = pd.DataFrame(columns=['B', 'C'], dtype='int64')

%timeit df.join(df_empty, how='inner')
932 ms ± 83.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)       <- main
285 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <- PR
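
To make the semantics behind these timings concrete, here is a small sketch (not part of the PR) of what `DataFrame.join` returns when the right frame is empty. The tiny `N` and the loop over `how` are assumptions chosen for illustration; only row counts and column labels are inspected.

```python
import numpy as np
import pandas as pd

N = 5  # small on purpose; the benchmark above uses 10_000_000

df = pd.DataFrame({"A": np.arange(N)})
df_empty = pd.DataFrame(columns=["B", "C"], dtype="int64")

# Number of rows produced by each join type when the right frame is empty.
row_counts = {
    how: len(df.join(df_empty, how=how))
    for how in ("left", "right", "inner", "outer")
}

# Column labels from both frames are kept even when no rows match.
inner_columns = list(df.join(df_empty, how="inner").columns)
```

Because the result in each case is determined by one side alone, none of these joins needs the general matching machinery, which is what makes a fast path possible.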

@lukemanley lukemanley added the Performance (Memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Feb 16, 2022
@jreback jreback added this to the 1.5 milestone Feb 16, 2022
Contributor

@jreback jreback left a comment

do we have sufficient asv's to cover this?

@@ -4544,12 +4544,22 @@ def join(

if len(other) == 0 and how in ("left", "outer"):
Contributor

can you make an `if len(other)` clause and an `if len(self)` clause, then sub-ifs here for the individual `how` cases?

Member Author

Done.
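
The branch layout requested in this review thread can be sketched as follows. This is a hypothetical helper, not the code merged in the PR: the name `empty_join_row_count` and its return convention are assumptions, and it only models how many rows an index-on-index join yields when one side is empty.

```python
from typing import Optional


def empty_join_row_count(n_self: int, n_other: int, how: str) -> Optional[int]:
    """Rows expected from an index-on-index join when one side is empty.

    Hypothetical helper for illustration only. Returns None when neither
    side is empty, i.e. when the general join path would be needed.
    """
    if how not in ("left", "right", "inner", "outer"):
        raise ValueError(f"unsupported join type: {how!r}")

    if n_other == 0:
        if how in ("left", "outer"):
            return n_self  # every row of self is kept
        return 0  # "right" / "inner": nothing to match against

    if n_self == 0:
        if how in ("right", "outer"):
            return n_other  # every row of other is kept
        return 0  # "left" / "inner": nothing to match against

    return None  # both sides non-empty: no shortcut applies
```

Grouping by which side is empty first, then by `how`, keeps each case on its own line and makes it easy to see that all four join types are handled for both sides.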

@lukemanley
Member Author

> do we have sufficient asv's to cover this?

Added an asv to cover this.
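
For readers unfamiliar with asv, a benchmark for this case might look roughly like the sketch below. It is not the benchmark added in this PR: the class name `JoinEmptySketch`, the method names, and the frame size are all assumptions. It relies only on the asv conventions of `params`, `param_names`, `setup`, and `time_*` methods.

```python
import numpy as np
import pandas as pd


class JoinEmptySketch:
    # asv runs every time_* method once per value in `params`.
    params = ["left", "right", "inner", "outer"]
    param_names = ["how"]

    def setup(self, how):
        n = 100_000  # assumed size, not taken from the PR
        self.df = pd.DataFrame({"A": np.arange(n)})
        self.df_empty = pd.DataFrame(columns=["B", "C"], dtype="int64")

    def time_join_with_empty_right(self, how):
        # Non-empty frame on the left, empty frame on the right.
        self.df.join(self.df_empty, how=how)

    def time_join_with_empty_left(self, how):
        # Empty frame on the left, non-empty frame on the right.
        self.df_empty.join(self.df, how=how)
```

Parametrizing over `how` exercises each join type against both an empty left and an empty right frame, so a regression in any single branch would show up as its own timing.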

Contributor

@jreback jreback left a comment

lgtm, ping on green

@lukemanley
Member Author

@jreback - greenish, errors look unrelated

@mroeschke
Member

Could you merge in main one more time? (failures look unrelated but good to be sure)

@lukemanley
Member Author

@mroeschke - merged main. greenish again, let me know if you think the error is related

@mroeschke mroeschke merged commit aafa7a9 into pandas-dev:main Feb 22, 2022
@mroeschke
Member

Thanks @lukemanley. The failure was unrelated

@lukemanley lukemanley deleted the join-empty-fastpath branch March 2, 2022 01:13
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* faster joins when left and/or right is empty

* whatsnew

* cleanup

* add asv for joining with empty frame

* asv