
BUG: Setting the index changes dtype #34304


Closed
2 of 3 tasks
dmirecki opened this issue May 22, 2020 · 2 comments
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Duplicate Report (Duplicate issue or pull request), Index (Related to the Index class or subclasses)

Comments

@dmirecki

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1,2,3], 'B': [1,10,100]}).astype(np.uint16)
print(df.dtypes)
# A    uint16
# B    uint16
# dtype: object

df = df \
  .set_index('B') \
  .reset_index()
print(df.dtypes)
# B    uint64
# A    uint16
# dtype: object

Problem description

In the example above, the dtype of column B changes from uint16 to uint64 after the round trip through the index. This can cause memory problems, so the expected behavior is that dtypes are not changed implicitly.
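Until this is resolved, one possible workaround (a sketch, not an official pandas recommendation) is to record the dtypes before the round trip and cast back afterwards with DataFrame.astype, which accepts a column-to-dtype mapping. Casting back is safe here only because the values originally fit in uint16. The memory cost follows from the item sizes: 2 bytes per uint16 value versus 8 bytes per uint64 value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 10, 100]}).astype(np.uint16)
original_dtypes = df.dtypes.to_dict()  # {'A': uint16, 'B': uint16}

out = df.set_index('B').reset_index()
# Cast every column back to its pre-round-trip dtype (a no-op if nothing widened).
out = out.astype(original_dtypes)

# uint16 uses 2 bytes per value and uint64 uses 8, so widening quadruples the column size.
print(np.dtype(np.uint16).itemsize, np.dtype(np.uint64).itemsize)  # 2 8
print(out.dtypes)
```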

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.4
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 46.3.0
Cython : 0.29.18
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: 0.8.1
bs4 : 4.6.3
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.2.6
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pytest : 3.6.4
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None
numba : 0.48.0

dmirecki added the Bug and Needs Triage labels on May 22, 2020
@phofl
Member

phofl commented May 23, 2020

I think I have figured out where things go wrong here. Can someone help me figure out whether this behavior is intentional? The dtype is changed at

https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/numeric.py#L68

The comparison happens in the function is_dtype_equal, which in this case checks whether the source dtype equals np.uint64:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/common.py#L608:L643

Since the two dtypes are obviously not equal, np.uint16 is cast to np.uint64. Is it intentional that np.uint16 and np.uint32 are cast to np.uint64?
If not, we could instead check whether both dtypes belong to the same category (np.unsignedinteger in this case).
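A rough sketch of the proposed check, using numpy's abstract type hierarchy through np.issubdtype. The helpers exact_match and same_category are hypothetical and not part of pandas; exact_match only stands in for the exact-equality comparison described above, and the category list is an assumption for illustration.

```python
import numpy as np

def exact_match(source, target):
    # Hypothetical stand-in for an exact-equality check: uint16 vs uint64 -> False.
    return np.dtype(source) == np.dtype(target)

def same_category(source, target,
                  categories=(np.signedinteger, np.unsignedinteger, np.floating)):
    # Hypothetical proposal: treat dtypes as compatible when both fall under
    # the same abstract numpy category, e.g. np.unsignedinteger.
    return any(np.issubdtype(source, c) and np.issubdtype(target, c)
               for c in categories)

for src in (np.uint16, np.uint32, np.int16):
    print(np.dtype(src).name, exact_match(src, np.uint64), same_category(src, np.uint64))
```

Under this check uint16 and uint32 would be accepted as unsigned integers, while a signed type such as int16 would still be rejected.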

@jreback
Contributor

jreback commented May 23, 2020

See #30517 and the linked PR (closed) with 2 more issues: #27370.

You are welcome to wade through that discussion and the potential resolution.

Closing this as a duplicate.

jreback added the Dtype Conversions and Index labels and removed the Bug and Needs Triage labels on May 23, 2020
@jreback jreback added this to the No action milestone May 23, 2020
@jreback jreback closed this as completed May 23, 2020
@jreback jreback added the Duplicate Report Duplicate issue or pull request label May 23, 2020

3 participants