-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Merge with str/StringDtype keys and multiindex #43785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Merge with str/StringDtype keys and multiindex #43785
Conversation
benoit9126
commented
Sep 28, 2021
- closes BUG: Merge with EA and MultiIndex #43734
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right): | |||
) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
# GH 43734 Avoid the use of `assign` with multiindex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make a new test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a new test and I parametrized the Python string data type (as you proposed) and the join type (as it was in the test I originally inserted this fragment of code)
@@ -1885,6 +1885,17 @@ def test_dtype_on_merged_different(self, change, join_type, left, right): | |||
) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
# GH 43734 Avoid the use of `assign` with multiindex | |||
right.columns = MultiIndex.from_tuples([("lvl0", x) for x in right.columns]) | |||
left.columns = MultiIndex.from_tuples([("lvl0", x) for x in left.columns]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't appear to actually be testing the OP, e.g. that's where the dtype is str vs StringDtype(). can you parameterize and show the dataframe construction.
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -502,6 +502,7 @@ Reshaping | |||
- Bug in :func:`concat` of ``bool`` and ``boolean`` dtypes resulting in ``object`` dtype instead of ``boolean`` dtype (:issue:`42800`) | |||
- Bug in :func:`crosstab` when inputs are are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. It is now correctly reported as 0 (:issue:`43505`) | |||
- Bug in :func:`concat` would fail when the ``objs`` argument all had the same index and the ``keys`` argument contained duplicates (:issue:`43595`) | |||
- Fixed bug in :meth:`merge` with multi-index as column index for the ``on`` argument returning an error when assigning a column internally (:issue:`43734`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use func for merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also :class:MultiIndex
assert (df2.dtypes == np.dtype("O")).all() | ||
|
||
# Check the expected types for the merged data frame | ||
result = merged.dtypes.sort_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use sort, just define expected correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I picked this way of doing in the class TestMergeCategorical
(in the same file). Done.
merged = merge(left=df1, right=df2, on=[("lvl0", "lvl1-a")], how=join_type) | ||
|
||
# No change in df1 and df2 types | ||
assert (df1.dtypes == pd.StringDtype()).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check whole DataFrames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2598,3 +2598,38 @@ def test_merge_outer_with_NaN(dtype): | |||
dtype=dtype, | |||
) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("string_dtype", tm.STRING_DTYPES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string_dtype is already a fixture so kill this line (and will still work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did not see it. I will simplify the code.
thanks @benoit9126 very nice |