Skip to content

BUG: Merge with str/StringDtype keys and multiindex #43785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 16, 2021
Merged

BUG: Merge with str/StringDtype keys and multiindex #43785

merged 1 commit into from
Oct 16, 2021

Conversation

benoit9126
Copy link
Contributor

@benoit9126 benoit9126 changed the title ENH: Merge with str/StringDtype keys and multiindex BUG: Merge with str/StringDtype keys and multiindex Sep 28, 2021
@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right):
)
tm.assert_series_equal(result, expected)

# GH 43734 Avoid the use of `assign` with multiindex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls make a new test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a new test and I parametrized the Python string data type (as you proposed) and the join type (as it was in the test I originally inserted this fragment of code)

@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right):
)
tm.assert_series_equal(result, expected)

# GH 43734 Avoid the use of `assign` with multiindex
right.columns = MultiIndex.from_tuples([("lvl0", x) for x in right.columns])
left.columns = MultiIndex.from_tuples([("lvl0", x) for x in left.columns])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't appear to actually be testing the OP, e.g. that's where the dtype is str vs StringDtype(). can you parameterize and show the dataframe construction.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 29, 2021
@@ -502,6 +502,7 @@ Reshaping
- Bug in :func:`concat` of ``bool`` and ``boolean`` dtypes resulting in ``object`` dtype instead of ``boolean`` dtype (:issue:`42800`)
- Bug in :func:`crosstab` when inputs are are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. It is now correctly reported as 0 (:issue:`43505`)
- Bug in :func:`concat` would fail when the ``objs`` argument all had the same index and the ``keys`` argument contained duplicates (:issue:`43595`)
- Fixed bug in :meth:`merge` with multi-index as column index for the ``on`` argument returning an error when assigning a column internally (:issue:`43734`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use func for merge

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also :class:MultiIndex

assert (df2.dtypes == np.dtype("O")).all()

# Check the expected types for the merged data frame
result = merged.dtypes.sort_index()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use sort, just define expected correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked this way of doing in the class TestMergeCategorical (in the same file). Done.

merged = merge(left=df1, right=df2, on=[("lvl0", "lvl1-a")], how=join_type)

# No change in df1 and df2 types
assert (df1.dtypes == pd.StringDtype()).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check whole DataFrames

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@benoit9126 benoit9126 requested review from jreback and phofl October 11, 2021 12:02
@jreback jreback added this to the 1.4 milestone Oct 11, 2021
@@ -2598,3 +2598,38 @@ def test_merge_outer_with_NaN(dtype):
dtype=dtype,
)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("string_dtype", tm.STRING_DTYPES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string_dtype is already a fixture so kill this line (and will still work).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not see it. I will simplify the code.

@benoit9126 benoit9126 requested a review from jreback October 15, 2021 17:37
@jreback jreback merged commit 6c35a62 into pandas-dev:master Oct 16, 2021
@jreback
Copy link
Contributor

jreback commented Oct 16, 2021

thanks @benoit9126 very nice

@benoit9126 benoit9126 deleted the merge_ea_multiindex branch October 18, 2021 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Merge with EA and MultiIndex
3 participants