BUG: Merge with str/StringDtype keys and multiindex #43785

benoit9126 · 2021-09-28T17:50:58Z

- closes #43734 (BUG: Merge with EA and MultiIndex)
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry

jreback · 2021-09-29T23:21:01Z

pandas/tests/reshape/merge/test_merge.py

@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right):
        )
        tm.assert_series_equal(result, expected)

+        # GH 43734 Avoid the use of `assign` with multiindex


Please make a new test.

I created a new test and parametrized it over the Python string dtype (as you proposed) and the join type (as in the test where I originally inserted this code).

jreback · 2021-09-29T23:22:57Z

pandas/tests/reshape/merge/test_merge.py

@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right):
        )
        tm.assert_series_equal(result, expected)

+        # GH 43734 Avoid the use of `assign` with multiindex
+        right.columns = MultiIndex.from_tuples([("lvl0", x) for x in right.columns])
+        left.columns = MultiIndex.from_tuples([("lvl0", x) for x in left.columns])


This doesn't appear to actually test the case from the OP, i.e. where the dtype is str vs StringDtype(). Can you parametrize and show the DataFrame construction?
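For readers following the request above, here is a minimal sketch of what "show the DataFrame construction" could look like: two frames with two-level columns, where the left key is always `StringDtype()` and the right key dtype varies. The helper names `make_frames` and `merge_on_multiindex_key` are hypothetical, the data is made up for illustration, and a pandas version that includes this fix is assumed. It is not the test that was merged.

```python
import pandas as pd

# Key column addressed by its full two-level label.
KEY = ("lvl0", "lvl1-a")


def make_frames(right_dtype):
    """Build two small frames with MultiIndex columns (hypothetical helper).

    Tuple dict keys become a two-level column index. The left frame always
    uses StringDtype; only the right frame's dtype varies.
    """
    left = pd.DataFrame(
        {KEY: ["1", "2", "3"], ("lvl0", "lvl1-b"): ["4", "5", "6"]},
        dtype=pd.StringDtype(),
    )
    right = pd.DataFrame(
        {KEY: ["1", "2", "3"], ("lvl0", "lvl1-c"): ["7", "8", "9"]},
        dtype=right_dtype,
    )
    return left, right


def merge_on_multiindex_key(right_dtype, how="inner"):
    """Merge the two frames on the tuple-labelled key column."""
    left, right = make_frames(right_dtype)
    return pd.merge(left, right, on=[KEY], how=how)


# Exercise both combinations: StringDtype vs object, StringDtype vs StringDtype.
for right_dtype in (object, pd.StringDtype()):
    merged = merge_on_multiindex_key(right_dtype)
    print(right_dtype, merged.shape)
```

In a real test the loop would be replaced by pytest parametrization over the dtype and the join type.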

phofl · 2021-10-03T20:57:23Z

doc/source/whatsnew/v1.4.0.rst

@@ -502,6 +502,7 @@ Reshaping
 - Bug in :func:`concat` of ``bool`` and ``boolean`` dtypes resulting in ``object`` dtype instead of ``boolean`` dtype (:issue:`42800`)
 - Bug in :func:`crosstab` when inputs are are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. It is now correctly reported as 0 (:issue:`43505`)
 - Bug in :func:`concat` would fail when the ``objs`` argument all had the same index and the ``keys`` argument contained duplicates (:issue:`43595`)
+- Fixed bug in :meth:`merge` with multi-index as column index for the ``on`` argument returning an error when assigning a column internally (:issue:`43734`)


We use :func: for merge, not :meth:.

Also use :class:`MultiIndex`.

pandas/core/reshape/merge.py

pandas/tests/reshape/merge/test_merge.py

phofl · 2021-10-04T21:13:03Z

pandas/tests/reshape/merge/test_merge.py

+    assert (df2.dtypes == np.dtype("O")).all()
+
+    # Check the expected types for the merged data frame
+    result = merged.dtypes.sort_index()


Don't use sort; just define expected correctly.

I took this approach from the class TestMergeCategorical (in the same file). Done.
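As an illustration of the suggestion above, the expected dtypes can be listed in the merged frame's own column order, so no `sort_index()` is needed on the result. This is a sketch with made-up columns (`key`, `b`, `a`), not the data from this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; the merged column order is key, b (left), a (right).
left = pd.DataFrame({"key": [1, 2], "b": [0.5, 1.5]})
right = pd.DataFrame({"key": [1, 2], "a": [True, False]})
merged = pd.merge(left, right, on="key")

# Define expected in the same order as merged.columns instead of sorting.
expected = pd.Series(
    [np.dtype("int64"), np.dtype("float64"), np.dtype("bool")],
    index=["key", "b", "a"],
)
pd.testing.assert_series_equal(merged.dtypes, expected)
```

Keeping the order explicit also documents which column comes from which side of the merge.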

phofl · 2021-10-04T21:13:14Z

pandas/tests/reshape/merge/test_merge.py

+    merged = merge(left=df1, right=df2, on=[("lvl0", "lvl1-a")], how=join_type)
+
+    # No change in df1 and df2 types
+    assert (df1.dtypes == pd.StringDtype()).all()


Check whole DataFrames
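One way to read "check whole DataFrames" is to copy the inputs before the merge and compare them in full afterwards, rather than asserting on dtypes only. The sketch below assumes that reading; the helper name `merge_and_check_inputs` and the data are hypothetical.

```python
import pandas as pd

KEY = ("lvl0", "lvl1-a")


def merge_and_check_inputs(how="inner"):
    """Merge two frames and verify that neither input was modified."""
    df1 = pd.DataFrame(
        {KEY: ["1", "2"], ("lvl0", "lvl1-b"): ["4", "5"]},
        dtype=pd.StringDtype(),
    )
    df2 = pd.DataFrame(
        {KEY: ["1", "2"], ("lvl0", "lvl1-c"): ["7", "8"]},
        dtype=object,
    )
    df1_before, df2_before = df1.copy(), df2.copy()

    merged = pd.merge(left=df1, right=df2, on=[KEY], how=how)

    # assert_frame_equal compares values, dtypes, index and columns at once,
    # so it also catches changes that a dtype-only check would miss.
    pd.testing.assert_frame_equal(df1, df1_before)
    pd.testing.assert_frame_equal(df2, df2_before)
    return merged


merged = merge_and_check_inputs()
```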

jreback · 2021-10-11T22:28:39Z

pandas/tests/reshape/merge/test_merge.py

@@ -2598,3 +2598,38 @@ def test_merge_outer_with_NaN(dtype):
        dtype=dtype,
    )
    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("string_dtype", tm.STRING_DTYPES)


string_dtype is already a fixture, so remove this line (the test will still work).

Sorry, I did not see it. I will simplify the code.
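To show why the decorator is redundant: a pytest fixture declared with `params` already runs each requesting test once per value. The sketch below defines a stand-in fixture locally; its parameter values are an assumption for illustration, and in the real test suite the fixture would come from a shared conftest rather than the test module.

```python
import pytest


# Hypothetical stand-in for a shared, parametrized fixture.
# The parameter values here are assumed for illustration only.
@pytest.fixture(params=[str, "str", "U"])
def string_dtype(request):
    return request.param


# No @pytest.mark.parametrize needed: requesting the fixture by name is
# enough, and pytest runs this test once for every value in `params`.
def test_uses_string_dtype_fixture(string_dtype):
    assert string_dtype in (str, "str", "U")
```

Running the file with `pytest -v` would list one test ID per fixture parameter.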

pandas/tests/reshape/merge/test_merge.py

jreback · 2021-10-16T15:37:55Z

thanks @benoit9126 very nice

benoit9126 changed the title ~~ENH: Merge with str/StringDtype keys and multiindex~~ BUG: Merge with str/StringDtype keys and multiindex Sep 28, 2021

jreback requested changes Sep 29, 2021

jreback added the Dtype Conversions (unexpected or buggy dtype conversions) and Reshaping (concat, merge/join, stack/unstack, explode) labels Sep 29, 2021

phofl reviewed Oct 3, 2021

pandas/core/reshape/merge.py (resolved)

phofl reviewed Oct 3, 2021

pandas/tests/reshape/merge/test_merge.py (outdated, resolved)

phofl reviewed Oct 4, 2021

benoit9126 requested review from jreback and phofl October 11, 2021 12:02

jreback added this to the 1.4 milestone Oct 11, 2021

jreback requested changes Oct 11, 2021

BUG: Merge with str/StringDtype keys and multiindex (#43734)

620c8a4

benoit9126 requested a review from jreback October 15, 2021 17:37

jreback approved these changes Oct 16, 2021

jreback merged commit 6c35a62 into pandas-dev:master Oct 16, 2021

benoit9126 deleted the merge_ea_multiindex branch October 18, 2021 11:52

benoit9126 commented Sep 28, 2021

jreback Sep 29, 2021

benoit9126 Sep 30, 2021

jreback Sep 29, 2021

phofl Oct 3, 2021

phofl Oct 3, 2021

phofl Oct 4, 2021

benoit9126 Oct 5, 2021

phofl Oct 4, 2021

benoit9126 Oct 5, 2021

jreback Oct 11, 2021

benoit9126 Oct 13, 2021

jreback commented Oct 16, 2021
