-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: NumericIndex.union determine casting dtype using concat logic #37258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This logic lives in For the float/int combo, that already does the same as the "special case" mentioned above. Not sure if there would actually be differences in behaviour right now when switching to use this logic. |
Looks like we have 7 tests in tests.indexes.test_setops that fail after this change. The tests expect object dtype but instead get float64. I'll need to look more carefully to check that these make sense. If perf impact weren't an issue, the id want to do min/max checks to ensure that the casting being done is non-lossy. i guess the advantage of the current logic is that it is always non-lossy. |
Can you give an example of such a failing test where it was before object dtype and now float? For which dtypes does this happen? (Doing a union of int index with float index already gives float index on master) |
The affected tests are almost all in tests.indexes.test_setops. A simple example:
|
Ah, so the int/uint combo ((u)int/float combo already gives float). We discussed this before, specifically for indices, see eg #26778 (comment) and comments below. So the current behaviour of So if we decide to keep this difference, then we will need to special case this in the Index set ops code, and it won't be able to fully use the |
IIRC from #26778, my preference is for setops between our label containers
to not lose precision. So for
```
In [5]: pd.Index([2 ** 63 - 1]) | pd.Index([1.0])
Out[5]: Float64Index([1.0, 9.223372036854776e+18], dtype='float64')
```
I would prefer object dtype. Likewise for Int64Index([]) | UInt64Index([]).
…On Tue, Nov 3, 2020 at 8:25 AM Joris Van den Bossche < ***@***.***> wrote:
Ah, so the int/uint combo ((u)int/float combo already gives float).
We discussed this before, specifically for indices, see eg #26778
(comment)
<#26778 (comment)>
and comments below. So the current behaviour of int + uint -> object *for
index* is intentionally different from the behaviour for similar dtypes
with Series.
So if we decide to keep this difference, then we will need to special case
this in the Index set ops code, and it won't be able to *fully* use the
find_common_type logic.
cc @TomAugspurger <https://github.com/TomAugspurger>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#37258 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOIVA6KO5IKP6LWP5NTTSOAHERANCNFSM4SWOG3FQ>
.
|
Thats reasonable. Any reason why this this preference would only apply to setops or only for Index? |
These casting values do not seem correct. import pandas as pd
import numpy as np
a = pd.DataFrame([[1],[2]],index=pd.NumericIndex(np.arange(2),dtype="i4"))
b = pd.DataFrame([[3],[4]],index=pd.NumericIndex(np.arange(2,4),dtype="i4"))
pd.concat([a,b],axis=0).index
|
ATM NumericIndex._union has special casting rules for what dtype it casts to when unioning with another NumericIndex of a different dtype.
Instead, we should re-use code (not sure where this lives, cc @jorisvandenbossche ?) from concat (maybe even Block.putmask?) to determing the appropriate upcast dtype.
The text was updated successfully, but these errors were encountered: