REF: avoid internals in merge code #48082

jbrockmendel · 2022-08-15T01:53:35Z

After this, concatenate_block_managers is only used in one place.

mroeschke · 2022-08-17T00:55:10Z

jorisvandenbossche · 2022-08-21T07:36:01Z

I haven't yet investigated why, but this has broken a lot of tests in geopandas' CI against pandas main (so if we want to do an RC tomorrow, the easiest might be to revert this for now, since it's only a refactor?)

Will try to make a minimal reproducer

jorisvandenbossche · 2022-08-21T08:33:49Z

I can't reproduce this without geopandas, but the issue boils down to merge now calling pd.concat (and thus _constructor / __finalize__) before adding suffixes for duplicated column names.
So this might be sufficient:

--- a/pandas/core/reshape/merge.py
+++ b/pandas/core/reshape/merge.py
@@ -763,9 +763,12 @@ class _MergeOperation:
         right.index = join_index
 
         from pandas import concat

+        left.columns = llabels
+        right.columns = rlabels
         result = concat([left, right], axis=1, copy=copy)
-        result.columns = llabels.append(rlabels)
         return result

One reproducer with geopandas is:

import geopandas
df1 = geopandas.GeoDataFrame({"key": [1, 2, 3]}, geometry=geopandas.points_from_xy([1, 2, 3], [1, 2, 3]))
df2 = geopandas.GeoDataFrame({"key": [2, 3, 4]}, geometry=geopandas.points_from_xy([1, 2, 3], [1, 2, 3]))

df1.merge(df2, on="key")
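
The ordering difference described above can also be sketched with plain pandas. This is only an illustration, not a reproducer of the geopandas failure: the frames, the column names, and the `_x` / `_y` labels are assumptions chosen to mimic what merge would produce with its default suffixes.

```python
import pandas as pd

# Hypothetical frames that share a non-key column name ("geometry").
left = pd.DataFrame({"a": [1, 2], "geometry": ["p1", "p2"]})
right = pd.DataFrame({"b": [3, 4], "geometry": ["q1", "q2"]})

# Labels as they would look after applying the default merge suffixes
# (written out by hand here; merge computes them internally).
llabels = pd.Index(["a", "geometry_x"])
rlabels = pd.Index(["b", "geometry_y"])

# Order 1: concat first, rename afterwards.
# The intermediate frame briefly carries two "geometry" columns.
intermediate = pd.concat([left, right], axis=1)
had_duplicates = bool(intermediate.columns.duplicated().any())
intermediate.columns = llabels.append(rlabels)

# Order 2: rename first, then concat.
# No frame with duplicated column names is ever constructed.
left_renamed = left.copy()
right_renamed = right.copy()
left_renamed.columns = llabels
right_renamed.columns = rlabels
result = pd.concat([left_renamed, right_renamed], axis=1)

print(had_duplicates)
print(list(result.columns))
```

Both orders end with the same column labels; only the first one passes through a frame with duplicated names, which is the step a subclass hooking into _constructor / __finalize__ would see.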

jbrockmendel · 2022-08-21T14:39:01Z

I have no objections to the suggested patch. Can you open an issue in geopandas to figure out how they are relying on our internals and wean off of it (xref #40226)?

jorisvandenbossche · 2022-09-06T18:06:45Z

> Can you open an issue in geopandas to figure out how they are relying on our internals and wean off of it (xref #40226)

To be clear, geopandas is not directly relying on the internals (we are not calling an internal method), but we are indirectly relying on the logic here, because we disallow dataframes with duplicate geometry columns (and thus it matters whether the suffixes are applied before or after the concat step).

Opened #48416 with the patch proposed above.

phofl · 2022-09-17T11:30:08Z

It looks like this caused a performance regression in our merge benchmarks, https://asv-runner.github.io/asv-collection/pandas/#join_merge.JoinEmpty.time_inner_join_right_empty

This is the most relevant commit in the timespan when the regression occurred.
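
For a quick local check of the case that benchmark name refers to (an inner join where the right-hand frame is empty), something like the following rough sketch could be timed on two pandas versions. The frame size, column names, and repeat count are assumptions and may not match the actual ASV setup.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000  # assumed size, not necessarily what the ASV benchmark uses

df = pd.DataFrame({"A": np.arange(N)})
# Empty right-hand frame with two integer columns (hypothetical names).
df_empty = pd.DataFrame(columns=["B", "C"], dtype="int64")


def inner_join_right_empty():
    # Inner join against an empty frame: the result has no rows.
    return df.join(df_empty, how="inner")


result = inner_join_right_empty()

repeats = 20
elapsed = timeit.timeit(inner_join_right_empty, number=repeats)
print(f"{elapsed / repeats * 1e3:.3f} ms per call")
```

Running the same script before and after commit 5d115bb would give a rough comparison, though ASV controls for noise far better than a single timeit call.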

jbrockmendel · 2022-09-17T18:56:44Z

Bummer. Not ideal, but won't be the end of the world if we have to revert.

* REF: avoid internals in merge code * fix condition * use reindex_indexer

jbrockmendel added 3 commits August 14, 2022 18:52

REF: avoid internals in merge code

e75f870

fix condition

f2b219b

use reindex_indexer

a6d1847

mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Internals Related to non-user accessible pandas implementation labels Aug 16, 2022

Merge branch 'main' into ref-concat-internals-ax0

6932fa9

mroeschke approved these changes Aug 17, 2022

mroeschke added this to the 1.5 milestone Aug 17, 2022

mroeschke merged commit 5d115bb into pandas-dev:main Aug 17, 2022

jbrockmendel deleted the ref-concat-internals-ax0 branch August 17, 2022 00:59

jorisvandenbossche mentioned this pull request Aug 23, 2022

REGR: preserve reindexed array object (instead of creating new array) for concat with all-NA array #47762

Merged

m-richards mentioned this pull request Sep 4, 2022

BUG: overlay union raises ValueError with pandas 1.5.0rc0 geopandas/geopandas#2544

Closed

3 tasks

zaneselvans mentioned this pull request Sep 4, 2022

Make PUDL compatible with Pandas 1.5 catalyst-cooperative/pudl#1901

Closed

9 tasks

jorisvandenbossche mentioned this pull request Sep 6, 2022

REF: ensure to apply suffixes before concat step in merge code #48416

Merged

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

REF: avoid internals in merge code (pandas-dev#48082)

96a0266

* REF: avoid internals in merge code * fix condition * use reindex_indexer
