-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: avoid internals in merge code #48082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: avoid internals in merge code #48082
Conversation
Thanks @jbrockmendel |
I haven't yet investigated why, but this has broken a lot of tests in geopandas' CI against pandas main (so if we want to do an RC tomorrow, the easiest might be to revert this for now, since it's only a refactor?) Will try to make a minimal reproducer |
I can't reproduce this without geopandas, but the issue boils down to that --- a/pandas/core/reshape/merge.py
+++ b/pandas/core/reshape/merge.py
@@ -763,9 +763,12 @@ class _MergeOperation:
right.index = join_index
from pandas import concat
+ left.columns = llabels
+ right.columns = rlabel
result = concat([left, right], axis=1, copy=copy)
- result.columns = llabels.append(rlabels)
return result
One reproducer with geopandas is:
|
I have no objections to the suggested patch. Can you open an issue in geopandas to figure out how they are relying on our internals and ween off of it (xref #40226) |
To be clear, geopandas is not directly relying on the internals (we are not calling an internal method), but we are indirectly relying on the logic here, because we disallow dataframes with duplicate geometry columns (and thus the fact whether the suffixes are applied before or after the concat step matters). Opened #48416 with the patch proposed above. |
It looks like this caused a performance regression in our merge benchmarks, https://asv-runner.github.io/asv-collection/pandas/#join_merge.JoinEmpty.time_inner_join_right_empty This is the most relevant commit in the timespan when the regression occurred |
Bummer. Not ideal, but won't be the end of the world if we have to revert. |
* REF: avoid internals in merge code * fix condition * use reindex_indexer
After this, concatenate_block_managers is only used in one place.