Skip to content

Diff low precision ints #45609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

lukemanley
Copy link
Member

@lukemanley lukemanley commented Jan 25, 2022

Submitting this PR as a potential option. This changes the diff method to upcast int8/int16 to float64 similar to shift and other methods. It avoids the int8/int16 > float64 cython issue referenced in the code by doing an initial cast from int8/int16 > int32. Note the tests were updated to now expect float64 in a few cases.

If the existing behavior is preferred, I'm happy to close this. I thought the consistency and fewer special cases elsewhere might be worth the change here.

@lukemanley lukemanley mentioned this pull request Jan 25, 2022
4 tasks
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Groupby labels Jan 28, 2022
@jreback
Copy link
Contributor

jreback commented Jan 28, 2022

@lukemanley can you add a whatsnew note in 1.5

cc @jbrockmendel @rhshadrach any objections?

@jbrockmendel
Copy link
Member

im ambivalent. being consistent with shift would be nice, but a) i dont like the extra copy this makes and b) users using small-itemsize dtypes probably want small-itemsize results

@rhshadrach
Copy link
Member

Everything being equal, I'd rather see shift support lower precision dtypes, but perhaps that isn't currently feasible? I haven't looked.

@lukemanley
Copy link
Member Author

im ambivalent. being consistent with shift would be nice, but a) i dont like the extra copy this makes and b) users using small-itemsize dtypes probably want small-itemsize results

Agreed, the extra copy is unfortunate.

Everything being equal, I'd rather see shift support lower precision dtypes, but perhaps that isn't currently feasible? I haven't looked.

I think there are a number of examples beyond shift. When upcasting, most methods seem to upcast low precision ints to float64 even if a lower precision like float32 could suffice. It seems diff stands alone in this aspect. Here are two other examples where low precision ints are upcast to float64:

>>> pd.Series([1, 2, 3], dtype='int8').reindex(range(4))
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64

>>> pd.Series([1, 2, 3], dtype='int8').rolling(1).max()
0    1.0
1    2.0
2    3.0
dtype: float64

@jreback jreback added this to the 1.5 milestone Jan 31, 2022
@jreback
Copy link
Contributor

jreback commented Jan 31, 2022

can you merge master here

@lukemanley
Copy link
Member Author

Merged main, errors look unrelated

@rhshadrach
Copy link
Member

I think there are a number of examples beyond shift. When upcasting, most methods seem to upcast low precision ints to float64 even if a lower precision like float32 could suffice. It seems diff stands alone in this aspect.

I think the consensus is that we'd like to see better support for lower precision ops. We aren't going to get there all at once, so there will be inconsistencies in support for quite some time. And we certainly aren't going to get there by taking ops that do support lower precision and removing that support for the purpose of consistency. For that reason, I'm -1 here.

@jreback
Copy link
Contributor

jreback commented Jan 31, 2022

I think there are a number of examples beyond shift. When upcasting, most methods seem to upcast low precision ints to float64 even if a lower precision like float32 could suffice. It seems diff stands alone in this aspect.

I think the consensus is that we'd like to see better support for lower precision ops. We aren't going to get there all at once, so there will be inconsistencies in support for quite some time. And we certainly aren't going to get there by taking ops that do support lower precision and removing that support for the purpose of consistency. For that reason, I'm -1 here.

so generally I agree with you. but this little inconsistency is troubling. wouldn't it be better to make a systematic change to support this rather than doing it piecemeal?

@rhshadrach
Copy link
Member

wouldn't it be better to make a systematic change to support this rather than doing it piecemeal?

If this were possible to do all in one go, I'm all for it. But I don't think that is likely - there are a ton of ops, including indexing, that need changed.

@lukemanley
Copy link
Member Author

I'm happy to close this one if the preference is to keep the current behavior.

If the current behavior is preferred, diff could treat uint8 and uint16 similarly and upcast to float32 rather than float64:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    t: np.arange(3, dtype=t) 
    for t in ['int8', 'int16', 'uint8', 'uint16']
})

df.diff(1).dtypes
int8      float32
int16     float32
uint8     float64
uint16    float64
dtype: object

@jreback
Copy link
Contributor

jreback commented Feb 26, 2022

ok i guess let's close this and work on making more functions preserve dtypes.

@jreback jreback closed this Feb 26, 2022
@lukemanley lukemanley deleted the diff-low-precision-ints branch March 20, 2022 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: inconsistent upcasting dtypes between diff and shift for int8/int16
4 participants