Column dtype change on write of improper value #26049

jp68138743541 · 2019-04-11T08:01:13Z

Code Sample

import pandas as pd
df = pd.DataFrame(columns=['col1'], data=[1,2,3,4], dtype='uint8')
print('original dtypes:')
print(df.dtypes)
print()
print('original data frame:')
print(df)
print()
df.loc[2,'col1']=300
print('dtypes after write operation:')
print(df.dtypes)
print()
print('data frameafter write:')
print(df)

output:

original dtypes:
col1    uint8
dtype: object

original data frame:
   col1
0     1
1     2
2     3
3     4

dtypes after write operation:
col1    int64
dtype: object

data frameafter write:
   col1
0     1
1     2
2    44
3     4

Problem description

When writing a value that is too large for the column, e.g. 300 to an 8-bit unsigned integer column, the written value is cast to uint8 (wrapping around to 44) and the data type of the column is changed to int64.

Expected Output

I would expect that either the value is cast and the data type is retained, or the data type is changed and the value is retained.

original dtypes:
col1    uint8
dtype: object

original data frame:
   col1
0     1
1     2
2     3
3     4

dtypes after write operation:
col1    uint8
dtype: object

data frameafter write:
   col1
0     1
1     2
2    44
3     4

or

original dtypes:
col1    uint8
dtype: object

original data frame:
   col1
0     1
1     2
2     3
3     4

dtypes after write operation:
col1    int64 [or even better uint16]
dtype: object

data frameafter write:
   col1
0     1
1     2
2   300
3     4
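
For anyone who needs the second outcome regardless of pandas version, one possible workaround is to widen the column explicitly before writing. This is only a sketch: `set_with_explicit_upcast` is a hypothetical helper (not a pandas API), and it assumes the column has a NumPy integer dtype and that `value` is a Python int.

```python
import numpy as np
import pandas as pd


def set_with_explicit_upcast(df, row, col, value):
    """Hypothetical helper: widen an integer column before writing, if needed."""
    dtype = df[col].dtype
    info = np.iinfo(dtype)  # assumption: the column has a NumPy integer dtype
    if not (info.min <= value <= info.max):
        # Smallest integer dtype that holds both the existing data and the new value.
        wider = np.promote_types(dtype, np.min_scalar_type(value))
        df[col] = df[col].astype(wider)
    df.loc[row, col] = value
    return df


df = pd.DataFrame(columns=['col1'], data=[1, 2, 3, 4], dtype='uint8')
df = set_with_explicit_upcast(df, 2, 'col1', 300)
print(df.dtypes)
print(df)
```

With this approach the dtype change is a visible, deliberate step, so the stored value and the column dtype cannot silently disagree.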

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.125-linuxkit
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: 0.11.1
xarray: 0.11.3
IPython: 7.1.1
sphinx: 2.0.0
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.5
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: None
sqlalchemy: 1.3.1
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-04-11T15:17:59Z

That is strange. Option 2 is what we would want here - investigation and PRs to fix would certainly be welcome

bpieper26 · 2019-04-25T22:06:49Z

Behavior is consistent with numpy.
The value appears to overflow the dtype and wrap around (modulo 2**8).

import numpy as np

value = np.array([300], dtype=np.uint8)[0]

print(f'value = {value}, value should = 300')

print(f'{300 - 2**8} = 300 - 2**8')

Output

value = 44, value should = 300
44 = 300 - 2**8

bpieper26 · 2019-04-26T14:13:21Z

My apologies, I failed to see the dtype change in the original example. NumPy's behavior is not exactly consistent with pandas', as it doesn't cast the dtype to int64.

import numpy as np

array = np.array([300], dtype=np.uint8)
print(f'value = {array[0]}, value should = 300')
print(f'array.dtype = {array.dtype}')

Output

value = 44, value should = 300
array.dtype = uint8

Will continue investigating on the pandas side.

phofl · 2020-11-07T21:32:06Z

@WillAyd The current behavior on master is

original dtypes:
col1    uint8
dtype: object

original data frame:
   col1
0     1
1     2
2     3
3     4

dtypes after write operation:
col1    uint8
dtype: object

data frameafter write:
   col1
0     1
1     2
2    44
3     4

Is int64 and 300 still the expected behavior?

WillAyd · 2020-11-10T17:50:45Z

Hmm OK. This is a pretty ambiguous set of actions so I think matching numpy is the best we can offer. Even there, this probably just falls back to the C standard for wrap around of unsigned integers
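
To illustrate the wrap-around mentioned above: the 44 seen in the outputs is simply 300 reduced modulo 2**8. The sketch below uses only the standard library; `ctypes.c_uint8` stands in for a C `uint8_t` and does no overflow checking. It is an illustration of the arithmetic, not of what pandas or NumPy do internally.

```python
import ctypes

value = 300
bits = 8

# Unsigned wrap-around is arithmetic modulo 2**bits.
wrapped_by_modulo = value % 2**bits

# ctypes performs no overflow checking, so the value is truncated to 8 bits.
wrapped_by_ctypes = ctypes.c_uint8(value).value

print(wrapped_by_modulo, wrapped_by_ctypes)  # 44 44
```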

WillAyd added the Bug and Indexing (Related to indexing on series/frames, not to indexes themselves) labels Apr 11, 2019

WillAyd added this to the Contributions Welcome milestone Apr 11, 2019

WillAyd added the Dtype Conversions (Unexpected or buggy dtype conversions) label Apr 11, 2019

jbrockmendel mentioned this issue Jan 8, 2022

BUG: can_hold_element size checks on ints/floats #45273

Merged

4 tasks

jreback modified the milestones: Contributions Welcome, 1.5 Jan 10, 2022

jreback closed this as completed in #45273 Jan 10, 2022
