REGR: Index.union preserve numerical index for range/float and int/float combinations #26778

jorisvandenbossche · 2019-06-11T08:39:16Z

Recent change in master:

In [11]: index = pd.RangeIndex(0, 10)

In [12]: index.union(pd.Index([0.5, 1.5]))
Out[12]: Index([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], dtype='object')

the dtype should be float.

cc @topper-123

The text was updated successfully, but these errors were encountered:

topper-123 · 2019-06-11T09:58:58Z

Yeah, I'll look into it tonight

jreback · 2019-06-11T10:07:17Z

this follows our general rule which is that dissimilar cast to object

are you proposing a rule that we cast to float if float and numbering ?

jorisvandenbossche · 2019-06-11T10:12:51Z

this follows our general rule which is that dissimilar cast to object

That is not the "general" rule, in case of numerical indices. Eg (released pandas):

In [1]: pd.Index([1, 2, 3]).union(pd.Index([.5, 1.5]))
Out[1]: Float64Index([0.5, 1.0, 1.5, 2.0, 3.0], dtype='float64')

Or at least, was not the general rule, as I see that the above example also changed on master. So @topper-123 it is not specific to RangeIndex (I was thinking it might have been related to some of the changes to it you have been doing recently).

are you proposing a rule that we cast to float if float and numbering ?

For me, that seems very logical that int + float = float (for set ops), and this is also what we have been doing up to recently.

jorisvandenbossche · 2019-06-11T10:14:35Z

Possibly related to #23538

topper-123 · 2019-06-11T10:22:45Z

If this has been done by me, it has not been on purpose. I agrre It is more logical and more efficient that unions of numerical indices return a numerical index rather than an object index

jreback · 2019-06-11T10:40:40Z

http://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html#incompatible-index-type-unions

not averse to coercing to float if we have a float for one
we already coerce between range / int indices

jorisvandenbossche · 2019-06-11T11:09:43Z

Yes, that's the PR I was pointing at.
Updating the issue to reflect the discussion.

topper-123 · 2019-06-11T15:45:57Z

I've bisected this and the cause of the change shown by @jorisvandenbossche is indeed #23538 (commit 20d0ad1).

I also agree this is a blocker a s this should not return a object dtype, as that will likely cause slowness and cause memory-usage to increase.

jorisvandenbossche · 2019-06-14T06:49:02Z

cc ArtinSarraf would you be interested to look into this?

TomAugspurger · 2019-06-24T14:46:11Z

Int64Index.union(Float64Index) -> float64 introduces a slight disagreement with how we decided to handle union(int, uint) in that PR. I think we have a choice between

dtype stability of the output
full precision
speed

We can pick two.

With union(int64, uint64) -> object we chose 1 and 2, sacrificing speed.

With union(int64, float64) -> float64 we're choosing 1 and 3, sacrificing precision of integers that cannot be represented as floating point values.

Is that what we want? With index labels, perhaps full precision is more important?

jorisvandenbossche · 2019-06-24T16:00:15Z

Personally, I think the easiest logic to explain would be to use the "normal" dtype promotion rules that we also use in eg Index/Series.append or concat.

A user can always cast to object to retain full information.

I understand the argument that for index the exact value is more important. But shouldn't that then be the same for Index.append as well? And where to draw the line? Also object for int + float?

TomAugspurger · 2019-06-24T16:43:58Z

Yeah, it's fuzzy, especially when we consider backwards compatibility. My preference long-term would for append, union, etc. to preserve labels.

You wouldn't normally expect an append to change other values in a container:

In [34]: a = pd.UInt64Index([2 ** 64 - 1])

In [35]: a.append(pd.Index([0.1]))
Out[35]: Float64Index([1.8446744073709552e+19, 0.1], dtype='float64')

But, for now we care about backwards compatibility. To match 0.24.x, we should have

union(int, float) -> float
union(unit, int) -> object
- In 0.24.x, this was uint64 if possible to safely cast, object otherwise. Just doing object for now.

A user can always cast to object to retain full information.

I'm worried that's an unsafe default, though if we've gotten this far, maybe it's not such a big deal.

TomAugspurger · 2019-06-24T17:38:52Z

OK here's my proposal.

left	right	output of left.union(right)
int	float	float64
int	uint	object
float	uint	float64

AFAICT the changes from 0.24.x are that

Int64Index | UInt64Index -> Index[object] always, rather than if there's overflow
(left | right).dtype == (right | left).dtype (the original PR)

I think that's fine for 0.25. But I would be OK with revisiting things to make them consistent across dtypes.

This restores the 0.24.x behavior of Index.union(other) between Float and (U)Int indexes. These are now floating dtype. left | right | output of left.union(right) ----- | ----- | ------ int |float | float64 int |uint | object float | uint | float64 pandas-dev#26778 (comment) Closes pandas-dev#26778

jorisvandenbossche added this to the 0.25.0 milestone Jun 11, 2019

jorisvandenbossche changed the title ~~REGR: RangeIndex.union converts to object dtype instead of float~~ REGR: Index.union preserve numerical index for range/float int/float combinations Jun 11, 2019

jorisvandenbossche changed the title ~~REGR: Index.union preserve numerical index for range/float int/float combinations~~ REGR: Index.union preserve numerical index for range/float and int/float combinations Jun 11, 2019

jorisvandenbossche added Blocker Blocking issue or pull request for an upcoming release Index Related to the Index class or subclasses labels Jun 11, 2019

jorisvandenbossche mentioned this issue Jun 14, 2019

DOC: remove temporary okwarning/okexcept #26843

Closed

3 tasks

TomAugspurger mentioned this issue Jun 25, 2019

API/REGR: Convert to float for index union #27034

Merged

jreback closed this as completed in #27034 Jun 27, 2019

zogzog mentioned this issue Aug 20, 2019

REGR: Index.union should pick correct dtype for combinations of tstamps with different tzones #28034

Closed

jorisvandenbossche mentioned this issue Nov 3, 2020

REF: NumericIndex.union determine casting dtype using concat logic #37258

Open

yuri-stepanov mentioned this issue Jan 12, 2022

REF: de-duplicate Index._find_common_type_compat and dtypes.cast.find_result_type #45149

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REGR: Index.union preserve numerical index for range/float and int/float combinations #26778

REGR: Index.union preserve numerical index for range/float and int/float combinations #26778

jorisvandenbossche commented Jun 11, 2019

topper-123 commented Jun 11, 2019

jreback commented Jun 11, 2019

jorisvandenbossche commented Jun 11, 2019 •

edited

Loading

jorisvandenbossche commented Jun 11, 2019 •

edited

Loading

topper-123 commented Jun 11, 2019

jreback commented Jun 11, 2019

jorisvandenbossche commented Jun 11, 2019

topper-123 commented Jun 11, 2019

jorisvandenbossche commented Jun 14, 2019

TomAugspurger commented Jun 24, 2019

jorisvandenbossche commented Jun 24, 2019

TomAugspurger commented Jun 24, 2019

TomAugspurger commented Jun 24, 2019

REGR: Index.union preserve numerical index for range/float and int/float combinations #26778

REGR: Index.union preserve numerical index for range/float and int/float combinations #26778

Comments

jorisvandenbossche commented Jun 11, 2019

topper-123 commented Jun 11, 2019

jreback commented Jun 11, 2019

jorisvandenbossche commented Jun 11, 2019 • edited Loading

jorisvandenbossche commented Jun 11, 2019 • edited Loading

topper-123 commented Jun 11, 2019

jreback commented Jun 11, 2019

jorisvandenbossche commented Jun 11, 2019

topper-123 commented Jun 11, 2019

jorisvandenbossche commented Jun 14, 2019

TomAugspurger commented Jun 24, 2019

jorisvandenbossche commented Jun 24, 2019

TomAugspurger commented Jun 24, 2019

TomAugspurger commented Jun 24, 2019

jorisvandenbossche commented Jun 11, 2019 •

edited

Loading

jorisvandenbossche commented Jun 11, 2019 •

edited

Loading