PERF/REF: concat #52672

Merged
merged 2 commits into pandas-dev:main from perf-join_unit
Apr 15, 2023

jbrockmendel
Member

Motivated by #50652, which is a near-worst-case for our current concat implementation. This gets about a 10-15% bump on the case I've been profiling (num_rows=100, num_cols=50_000, num_dfs=7). I'm exploring other approaches that I expect to get much bigger gains from.

The big upside here is the refactor. I find the current combine_concat_plans really hard to grok. Getting a single "plan" upfront rather than constructing multiple plans is much simpler.
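For anyone wanting to try a case of the same shape, here is a minimal timing sketch. Only the three sizes (num_rows=100, num_cols=50_000, num_dfs=7) come from the description above; the helper names `make_frames` and `time_concat`, the float64 data, the identical column labels, and the choice of `axis=0` are assumptions for illustration and may not match the frames in #50652 or the profiling setup actually used.

```python
import time

import numpy as np
import pandas as pd


def make_frames(num_rows, num_cols, num_dfs, seed=0):
    """Build num_dfs frames of shape (num_rows, num_cols).

    Assumption: float64 data and identical column labels in every frame.
    """
    rng = np.random.default_rng(seed)
    columns = [f"c{i}" for i in range(num_cols)]
    return [
        pd.DataFrame(rng.standard_normal((num_rows, num_cols)), columns=columns)
        for _ in range(num_dfs)
    ]


def time_concat(frames, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Assumption: row-wise concatenation (axis=0).
        pd.concat(frames, axis=0, ignore_index=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Sizes taken from the PR description; everything else is assumed.
    frames = make_frames(num_rows=100, num_cols=50_000, num_dfs=7)
    print(f"best of 3: {time_concat(frames):.3f}s")
```

Running the script on a checkout before and after this change should give a rough idea of the difference, though absolute timings depend on the machine and on how the frames are built.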

@mroeschke added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Performance (Memory or execution speed performance) labels Apr 15, 2023
@mroeschke added this to the 2.1 milestone Apr 15, 2023
@mroeschke merged commit b661313 into pandas-dev:main Apr 15, 2023
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel deleted the perf-join_unit branch April 15, 2023 01:02
Labels: Performance (Memory or execution speed performance), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)