Skip to content

BUG: Outer/right merge with EA dtypes cast to object #43152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 3, 2021

Conversation

debnathshoham
Copy link
Member

@jreback jreback added Bug ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 21, 2021
@jreback jreback added this to the 1.3.3 milestone Aug 21, 2021
@jreback
Copy link
Contributor

jreback commented Aug 21, 2021

https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=64682&view=logs&j=bf526160-cc38-5046-7b49-5141c64a483a&t=6a92c4fe-2b0b-5acd-cfba-e7901042d9bd is failing. maybe restrict the test to only EA types e.g. any_float_dtype and any_int_ea_dtype

@debnathshoham
Copy link
Member Author

debnathshoham commented Aug 21, 2021

Hi @jreback - The failure is somehow in TestMerge.test_handle_join_key_pass_array. And not the ones I added. Surprisingly, its not failing in my local

@mzeitlin11
Copy link
Member

Hi @jreback - The failure is somehow in TestMerge.test_handle_join_key_pass_array. And not the ones I added. Surprisingly, its not failing in my local

Since these are windows and 32-bit failures, would guess it has to do with these platforms defaulting to i32 instead of i64

@@ -17,7 +17,7 @@ Fixed regressions
- Fixed regression in :class:`DataFrame` constructor failing to broadcast for defined :class:`Index` and len one list of :class:`Timestamp` (:issue:`42810`)
- Performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
- Fixed regression in :meth:`.GroupBy.agg` incorrectly raising in some cases (:issue:`42390`)
-
- Fixed regression in :meth:`merge` where columns with ``ExtensionDtype`` was cast to ``object`` in ``left`` and ``outer`` merge (:issue:`40073`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe specify that the cast only occurs when those columns were being merged on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC what happened here was that left and outer were always broken, so fixing those could be moved down into the bug fix section of the whatsnew.

Then the regression portion could just focus on right

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, this bug was pertaining to right(regression) and outer (had mentioned in the PR subject), also discussed in the bug comments.
I will keep right in fixed regression, and move outer down to bug fix.

@debnathshoham
Copy link
Member Author

Since these are windows and 32-bit failures, would guess it has to do with these platforms defaulting to i32 instead of i64

If these are defaulting to int32 instead of int64, shouldn't the assertion pass? (the way both defaults to int64 in my local).

@@ -364,6 +369,9 @@ def test_merge_join_key_dtype_cast(self):
df = merge(df1, df2, left_on=lkey, right_on=rkey, how="outer")
assert df["key_0"].dtype == "int64"

@pytest.mark.xfail(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? what is this for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not able to figure why this test is failing in windows and 32bit in azure pipelines here.

Was just wondering if xfailing is an option..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note generally, is the comparison failing? construct the initial dataframe with int64 (rather than int) should help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that worked. Thanks!

Let me know if this looks fine..

@jreback
Copy link
Contributor

jreback commented Sep 1, 2021

@mzeitlin11 if you can give a quick relook

Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one optional whatsnew nit

@jreback jreback merged commit 3e3ffef into pandas-dev:master Sep 3, 2021
@jreback
Copy link
Contributor

jreback commented Sep 3, 2021

thanks @debnathshoham

@jreback
Copy link
Contributor

jreback commented Sep 3, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app
Copy link

lumberbot-app bot commented Sep 3, 2021

Something went wrong ... Please have a look at my logs.

@debnathshoham debnathshoham deleted the gh40073 branch September 4, 2021 02:34
jreback pushed a commit that referenced this pull request Sep 4, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
debnathshoham added a commit to debnathshoham/pandas that referenced this pull request Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Outer merge casts EA dtype to object
3 participants