BUG: Float values get corrupted with df.astype(), for values with no overflow error #34618

ketanhdoshi · 2020-06-06T13:44:32Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1270., 570., 14130., 620., 29910.]})
# Downcast column 'b' from float64 to float16
df['b'] = df['b'].astype('float16')
df

Problem description

Converting a 'float64' column to 'float16' should return the original values, as long as they don't cause any overflow. In the example above, two values (14130. and 29910.) get corrupted to 14128. and 29904. respectively. Both of those values fit into the 'float16' range without overflow, which can be confirmed via np.finfo(np.float16).min < 29910. < np.finfo(np.float16).max
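A minimal sketch, using only NumPy, that sets the range check above beside a round-trip check on the same values. The variable names (`values`, `in_range`, `exact`) are illustrative and do not come from pandas.

```python
import numpy as np

# Values from column 'b' in the example above
values = np.array([1270.0, 570.0, 14130.0, 620.0, 29910.0])

# Range check: is each value between the float16 minimum and maximum?
info = np.finfo(np.float16)
in_range = (values > info.min) & (values < info.max)

# Round-trip check: does each value come back unchanged after float16?
roundtrip = values.astype(np.float16).astype(np.float64)
exact = roundtrip == values

print(in_range)
print(roundtrip)
print(exact)
```

Every value passes the range check, but only values that are exactly representable in 'float16' pass the round-trip check.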

Expected Output

The values (14130. and 29910.) should be preserved.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.4
numpy : 1.18.4
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 47.1.1
Cython : 0.29.19
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: 0.8.1
bs4 : 4.6.3
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.2.6
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pytest : 3.6.4
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None
numba : 0.48.0


ketanhdoshi · 2020-06-07T13:33:04Z

Here is a similar but slightly different example

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [4., 4., 2., 3., 3.]})
df['c'] = df['b'].astype('float16')  # float16 copy of 'b'
# Apply the same scaling to the float64 column and the float16 column
df['b'] = (df['b'] - 3.52) / 1.86
df['c'] = (df['c'] - 3.52) / 1.86
df

Problem description

In this case, data doesn't get corrupted in the .astype('float16') call, but the arithmetic operations after that produce different results for the 'float16' column 'c' versus the 'float64' column 'b'. Since all the numbers fit comfortably into the 'float16' numeric range, they should produce identical results.

jreback · 2020-06-07T13:59:13Z

pandas has almost 0 support for float16

you are welcome to contribute patches

miccoli · 2020-06-07T21:55:40Z

This is definitely not a bug.

Half-precision floating-point format has only 11 bits of significand precision. This means that integers between 8192 and 16384 round to a multiple of 8: 14128 and 14130 round to the same number in half precision. Likewise, integers between 16384 and 32768 round to a multiple of 16, so 29910 rounds to 29904.
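The spacing argument can be checked with a short sketch. The helper `half_spacing` is hypothetical (written for this illustration, not a NumPy or pandas function) and assumes a positive value in the normal float16 range; `np.spacing` is used only to cross-check it.

```python
import math

import numpy as np

SIGNIFICAND_BITS = 11  # 10 stored bits plus the implicit leading bit


def half_spacing(x):
    """Gap between adjacent float16 values around x (hypothetical helper).

    Assumes x is positive and within the normal float16 range.
    """
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1
    _, exponent = math.frexp(x)
    return 2.0 ** (exponent - SIGNIFICAND_BITS)


for value in (14130.0, 29910.0):
    gap = half_spacing(value)
    # Round to the nearest multiple of the gap (ties go to even)
    nearest = round(value / gap) * gap
    # Cross-check against NumPy's own float16 conversion
    assert gap == float(np.spacing(np.float16(value)))
    assert nearest == float(np.float16(value))
    print(value, gap, nearest)
```

With 11 significand bits, a value written as m * 2**e with 0.5 <= m < 1 sits on a grid of step 2**(e - 11), which gives 8 for 14130 and 16 for 29910.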

And of course, regarding the second example, it is expected that arithmetic with half precision gives different results from arithmetic in double precision.
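A minimal sketch of the second example using plain NumPy arrays instead of DataFrame columns. The constants are cast to `np.float16` explicitly so that the whole computation stays in half precision regardless of NumPy's scalar promotion rules; that cast is an assumption of this sketch, not something the original code does.

```python
import numpy as np

raw = [4.0, 4.0, 2.0, 3.0, 3.0]
b = np.array(raw, dtype=np.float64)
c = np.array(raw, dtype=np.float16)

# Same scaling in double precision and in half precision
b_scaled = (b - 3.52) / 1.86
c_scaled = (c - np.float16(3.52)) / np.float16(1.86)

# Largest absolute difference between the two results
max_diff = np.abs(b_scaled - c_scaled.astype(np.float64)).max()

print(c_scaled.dtype)
print(max_diff)
```

The inputs 4, 2 and 3 are exact in both formats, but 3.52 and 1.86 are not exactly representable in half precision, and each intermediate result is rounded again, so the two columns agree only approximately.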

The only thing to note here is that numpy and pandas display half precision floating point numbers in a different way:

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series([14128, 14130]).astype('float16')
0    14128.0
1    14128.0
dtype: float16
>>> np.array([14128, 14130], dtype='float16')
array([14130., 14130.], dtype=float16)

But rest assured, there is no corruption here, only correct rounding:

>>> np.float16(14130) == np.float16(14128)
True
>>> pd.Series([14128, 14130]).astype('float16') == np.array([14128, 14130], dtype='float16')
0    True
1    True
dtype: bool
>>> pd.Series([14128, 14130]).astype('float16') == np.float16(14128)
0    True
1    True
dtype: bool
>>> pd.Series([14128, 14130]).astype('float16') == np.float16(14130)
0    True
1    True
dtype: bool
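Another way to see that only the display differs is to compare the stored bit patterns. This is a minimal sketch that reinterprets the 16-bit floats as unsigned integers with `ndarray.view`; the variable names are illustrative.

```python
import numpy as np

halves = np.array([14128, 14130], dtype=np.float16)

# Reinterpret the same 2 bytes per element as unsigned 16-bit integers
bits = halves.view(np.uint16)

# Converting to Python floats gives the exact stored values
stored = [float(h) for h in halves]

print(bits[0] == bits[1])
print(stored)
```

Both elements hold the same 16 bits, so any difference in how they are printed comes from the formatting routine rather than from the data.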

Please see also numpy/numpy#12613 regarding the display of half-precision values.

jreback · 2020-06-08T00:09:04Z

duplicates of #9220 and #22841

jreback · 2020-06-08T00:09:34Z

closing as a duplicate

ketanhdoshi added the Bug and Needs Triage labels Jun 6, 2020

jreback added the Dtype Conversions and Duplicate Report labels and removed the Bug and Needs Triage labels Jun 8, 2020

jreback added this to the No action milestone Jun 8, 2020

jreback closed this as completed Jun 8, 2020
