-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: Index.union preserve numerical index for range/float and int/float combinations #26778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yeah, I'll look into it tonight |
this follows our general rule which is that dissimilar cast to object are you proposing a rule that we cast to float if float and numbering ? |
That is not the "general" rule, in case of numerical indices. Eg (released pandas):
Or at least, was not the general rule, as I see that the above example also changed on master. So @topper-123 it is not specific to RangeIndex (I was thinking it might have been related to some of the changes to it you have been doing recently).
For me, that seems very logical that int + float = float (for set ops), and this is also what we have been doing up to recently. |
Possibly related to #23538 |
If this has been done by me, it has not been on purpose. I agrre It is more logical and more efficient that unions of numerical indices return a numerical index rather than an object index |
http://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html#incompatible-index-type-unions not averse to coercing to float if we have a float for one |
Yes, that's the PR I was pointing at. |
I've bisected this and the cause of the change shown by @jorisvandenbossche is indeed #23538 (commit 20d0ad1). I also agree this is a blocker a s this should not return a object dtype, as that will likely cause slowness and cause memory-usage to increase. |
cc ArtinSarraf would you be interested to look into this? |
Int64Index.union(Float64Index) -> float64 introduces a slight disagreement with how we decided to handle union(int, uint) in that PR. I think we have a choice between
We can pick two. With With Is that what we want? With index labels, perhaps full precision is more important? |
Personally, I think the easiest logic to explain would be to use the "normal" dtype promotion rules that we also use in eg Index/Series.append or concat. A user can always cast to object to retain full information. I understand the argument that for index the exact value is more important. But shouldn't that then be the same for |
Yeah, it's fuzzy, especially when we consider backwards compatibility. My preference long-term would for append, union, etc. to preserve labels. You wouldn't normally expect an In [34]: a = pd.UInt64Index([2 ** 64 - 1])
In [35]: a.append(pd.Index([0.1]))
Out[35]: Float64Index([1.8446744073709552e+19, 0.1], dtype='float64') But, for now we care about backwards compatibility. To match 0.24.x, we should have
I'm worried that's an unsafe default, though if we've gotten this far, maybe it's not such a big deal. |
OK here's my proposal.
AFAICT the changes from 0.24.x are that
I think that's fine for 0.25. But I would be OK with revisiting things to make them consistent across dtypes. |
This restores the 0.24.x behavior of Index.union(other) between Float and (U)Int indexes. These are now floating dtype. left | right | output of left.union(right) ----- | ----- | ------ int |float | float64 int |uint | object float | uint | float64 pandas-dev#26778 (comment) Closes pandas-dev#26778
Recent change in master:
the dtype should be float.
cc @topper-123
The text was updated successfully, but these errors were encountered: