Skip to content

REF: Simplify Index.union #41773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 2, 2021
Merged

REF: Simplify Index.union #41773

merged 4 commits into from
Jun 2, 2021

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jun 2, 2021

This is the Index.union part of #41153. This helps simplify that PR.

The special casing in Index.union is currently active if both are numeric. After #41153 it should only be special cased if one dtype is uint64 and other a signed int. So after this and #41153:

  • int8 & uint32 -> int64
  • [u]int64 & float64 -> float64
  • int64 & uint64 -> object
  • int8 & uint64 -> object

The first and second case is handled correctly by find_common_type, but the others aren't currently.

This PR changes no functionality itself, but prepares for the changes in #41153, where we want e.g. NumericIndex[int8] .union(NumericIndex[uint32]) to give NumericIndex[int64] and not Index[object].

"""
Find a common data type among the given dtypes.

Parameters
----------
types : list of dtypes
strict_uint64 : if True, object dtype is returned if uint64 and signed int present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we not always do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW ive determined that this is the fix for #41775, though the place for the fix is in Index._find_common_type_compat

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses labels Jun 2, 2021
# Now it's:
# * float | [u]int -> float
# * uint64 | signed int -> object
# We may change union(float | [u]int) to go to object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree this logic doesn't belong here, but suggest putting it in Index._find_common_type_compat instead of higher up, to keep the Index casting logic in one place

@topper-123
Copy link
Contributor Author

I’ve updated to find_common_type_compat.

@jreback jreback added this to the 1.3 milestone Jun 2, 2021
@jreback
Copy link
Contributor

jreback commented Jun 2, 2021

looks fine can you rebase

if self.dtype == "uint64" or target_dtype == "uint64":
if is_signed_integer_dtype(self.dtype) or is_signed_integer_dtype(
target_dtype
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in principle we could check signed_one.min() and if its non-negative could avoid the object-dtype cast. not for this PR.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2021

@topper-123 ping when ready here (then @jbrockmendel can rebase other)

@topper-123
Copy link
Contributor Author

I’ve rebased.

@jreback jreback merged commit dc4fddc into pandas-dev:master Jun 2, 2021
@jreback
Copy link
Contributor

jreback commented Jun 2, 2021

thanks @topper-123

@topper-123 topper-123 deleted the cln_union branch June 2, 2021 22:01
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants