BUG: inconsistent upcasting dtypes between diff and shift for int8/int16 #45562

Open
3 tasks done
lukemanley opened this issue Jan 23, 2022 · 3 comments
Labels: Bug, Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@lukemanley
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.Series([1, 2, 3], dtype="int8").agg(["diff", "shift"]).dtypes

Issue Description

It appears diff and shift have inconsistent upcasting of types when the input is either int8 or int16:

diff     float32
shift    float64
dtype: object

It's possible there is a good reason for this, but I couldn't find it documented or explained anywhere.

Expected Behavior

More consistency in upcasting between diff and shift.

Installed Versions

INSTALLED VERSIONS

commit : a221491
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17134
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.0.dev0+133.ga221491f99.dirty
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 60.5.0
Cython : 0.29.26
pytest : 6.2.5
hypothesis : 6.35.0
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fastparquet : 0.7.2
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.5.1
numba : 0.55.0
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : 1.1.4
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.3
sqlalchemy : 1.4.29
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

@lukemanley added the Bug and Needs Triage labels Jan 23, 2022
@rhshadrach added the Dtype Conversions label Jan 23, 2022
@lukemanley
Member Author

It looks like the upcasting to float32 in diff is done here:

    elif is_integer_dtype(dtype):
        # We have to cast in order to be able to hold np.nan
        # int8, int16 are incompatible with float64,
        # see https://github.com/cython/cython/issues/2646
        if arr.dtype.name in ["int8", "int16"]:
            dtype = np.float32
        else:
            dtype = np.float64

and the docstring for that function actually mentions diff and shift:

def diff(arr, n: int, axis: int = 0):
    """
    difference of n between self,
    analogous to s-s.shift(n)
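
The analogy in that docstring can be checked directly. Below is a minimal sketch, assuming a reasonably recent pandas and a NumPy-backed int8 Series; the variable names are only illustrative. The values should match, while the result dtypes may differ as described in this issue, so the dtypes are printed rather than asserted.

```python
import pandas as pd

s = pd.Series([1, 2, 3], dtype="int8")
n = 1

via_diff = s.diff(n)        # built-in diff
via_shift = s - s.shift(n)  # the form the docstring calls analogous

# Cast both to float64 before comparing so that only the values matter.
# Series.equals treats NaNs in the same position as equal.
same_values = via_diff.astype("float64").equals(via_shift.astype("float64"))

print("same values:", same_values)
print("diff dtype:", via_diff.dtype)
print("s - s.shift(n) dtype:", via_shift.dtype)
```

If the two dtypes printed differ, that is the inconsistency reported above, seen without going through agg.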

@jorisvandenbossche
Member

Sidenote: "diff" and "shift" are not aggregations, and so calling them in agg is a bit strange (IMO we should actually disallow this?)

In [80]: pd.Series([1, 2, 3], dtype="int8")
Out[80]: 
0    1
1    2
2    3
dtype: int8

In [81]: pd.Series([1, 2, 3], dtype="int8").agg(["diff", "shift"])
Out[81]: 
   diff  shift
0   NaN    NaN
1   1.0    1.0
2   1.0    2.0
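
For reference, the dtype comparison does not need agg at all. Here is a small sketch, assuming NumPy-backed integer dtypes, that calls diff and shift directly for several integer widths and prints the resulting dtypes. The exact dtypes may depend on the pandas version, so the only thing it relies on is that both results are floating point (needed to hold NaN).

```python
import pandas as pd

results = {}
for name in ["int8", "int16", "int32", "int64"]:
    s = pd.Series([1, 2, 3], dtype=name)
    # Call the methods directly instead of passing their names to agg.
    results[name] = (s.diff().dtype, s.shift().dtype)
    diff_dtype, shift_dtype = results[name]
    print(f"{name:>5}: diff -> {diff_dtype}, shift -> {shift_dtype}")
```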

@rhshadrach
Member

@jorisvandenbossche

(IMO we should actually disallow this?)

#35725 - trouble is that apply/agg code paths are intertwined in various places. This is one of the main motivations behind #41112.

@jreback added this to the 1.5 milestone Jan 31, 2022
@jreback removed the Needs Triage label Jan 31, 2022
@mroeschke removed this from the 1.5 milestone Aug 15, 2022

5 participants