BUG: Rounding on float16 type not working #35124

Closed
2 of 3 tasks
Genarito opened this issue Jul 4, 2020 · 15 comments
Labels
Dtype Conversions, Usage Question

Comments

@Genarito

Genarito commented Jul 4, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Problem description

I'm trying to round a column to 4 decimal places before storing it in the DB. I have to use the np.float16 type because I need the smallest possible precision, but rounding doesn't work (see the output section). If I don't cast the values and leave them as np.float64, it works correctly.
I think it's a bug in pandas, because if I run:

np.array([0.37324, 0.12321, 0.23]).astype(np.float16).round(4)

it works as expected, so it doesn't seem to be a NumPy problem.

Code Sample

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([0.37324, 0.12321, 0.23]), columns=['A'])
df.astype(np.float16).round(4)

Running:

pd.DataFrame(np.array([0.37324, 0.12321, 0.23]).astype(np.float16), columns=['A']).round({'A': 4})

doesn't work either; it produces the same output.

Current Output

          A
0  0.373291
1  0.123230
2  0.229980

Expected Output

          A
0  0.3732
1  0.1232
2  0.23

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : es_AR.UTF-8
LOCALE : es_AR.UTF-8
pandas : 1.0.5
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 18.1
setuptools : 40.8.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Genarito added the Bug and Needs Triage labels Jul 4, 2020
@jreback
Contributor

jreback commented Jul 4, 2020

we have very little support for float16

furthermore i suspect this is really just a display issue; convert the series to numpy and see

@Genarito
Author

Genarito commented Jul 4, 2020

Hello!
First of all, thanks for responding!
Secondly, when I convert to a NumPy array, the rounding works; I confirmed that above in the issue description. However, it would be great to have a built-in way to handle this kind of rounding with float16.
I'm inserting the DataFrame into the DB with PyMongo and I need the whole DataFrame structure, but every time I put the rounded data in it, the values return to their original representation and are inserted with more than 4 decimals. So I'm not sure it's just a display problem.

@erfannariman
Member

erfannariman commented Jul 5, 2020

furthermore i suspect this is really just a display issue; convert the series to numpy and see

This is actually the case. Converting it back to NumPy gives the correct result, so the underlying values are actually rounded correctly:

df = pd.DataFrame(np.array([0.37324, 0.12321, 0.23]), columns=['A'])
df.astype(np.float16).round(4).to_numpy()

array([[0.3733],
       [0.1232],
       [0.23  ]], dtype=float16)

pd.DataFrame(np.array([0.37324, 0.12321, 0.23]).astype(np.float16), columns=['A']).round({'A': 4}).to_numpy()

array([[0.3733],
       [0.1232],
       [0.23  ]], dtype=float16)
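
A possible reason the display and the stored values disagree: float16 holds only about 3 reliable decimal digits, so a value with exactly 4 decimals usually has no exact float16 representation. The sketch below assumes only NumPy and reuses the sample values from this thread. It checks whether rounding to 4 decimals changes the stored float16 values at all; the names `raw`, `rounded`, `digits`, and `unchanged` are illustrative.

```python
import numpy as np

# Sample values from the issue, stored as float16.
raw = np.array([0.37324, 0.12321, 0.23]).astype(np.float16)
rounded = raw.round(4)

# Approximate number of decimal digits float16 represents reliably.
digits = np.finfo(np.float16).precision

# Compare the stored values before and after rounding.
unchanged = bool(np.array_equal(raw, rounded))

print(digits, unchanged)
```

If `unchanged` is true, the rounded result is simply the nearest float16 to each 4-decimal value, which for these inputs is the value that was already stored.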

jreback added the Algos, Dtype Conversions, and Usage Question labels and removed the Bug, Needs Triage, and Algos labels Jul 5, 2020
jreback added this to the No action milestone Jul 5, 2020
jreback closed this as completed Jul 5, 2020
@Genarito
Author

Genarito commented Jul 5, 2020

Thank you! When I save in CSV format, the values are truncated to 4 decimals as expected. But now I have found a use case where it's not working:

df = pd.DataFrame(np.array([0.37324, 0.12321, 0.23]), columns=['A'])
df.astype(np.float16).round(4).to_dict('records')

Outputs [{'A': 0.373291015625}, {'A': 0.12322998046875}, {'A': 0.22998046875}]. Shouldn't the truncation be preserved during the conversion?

Also important: the same problem occurs with float32. The float64 and float128 types don't present the problem.

Please consider reopening the issue, as the examples I gave are a common way to work with pandas and PyMongo. We could iterate over the whole dict, but that's not the idea.

Thanks in advance
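
One workaround that seems to fit this use case, sketched under the assumption that widening the column is acceptable just before export: cast to float64 first and round afterwards, so the Python floats placed in the records are the closest float64 values to the 4-decimal numbers. The variable names (`df16`, `df32`, `records16`, `records32`) are illustrative only.

```python
import numpy as np
import pandas as pd

values = np.array([0.37324, 0.12321, 0.23])
df16 = pd.DataFrame(values, columns=['A']).astype(np.float16)
df32 = pd.DataFrame(values, columns=['A']).astype(np.float32)

# Widen to float64 first, then round, then export to plain Python records.
records16 = df16.astype(np.float64).round(4).to_dict('records')
records32 = df32.astype(np.float64).round(4).to_dict('records')

print(records16)
print(records32)
```

Note that the float16 column still yields 0.3733 for the first value, matching the to_numpy() output shown earlier in the thread, because the extra digits of 0.37324 were already lost when the data was cast to float16. The widened copy also gives up the memory saving of the smaller dtype, so it is best created only at export time.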

@erfannariman
Member

Can confirm the problem stated above occurs on master as well.

@jreback would that be sufficient to re-open this issue?

@erfannariman
Member

Maybe important to note that to_numpy still gives the correct result, to_dict does not:

>>> df.astype(np.float16).round(4).to_numpy()
array([[0.3733],
       [0.1232],
       [0.23  ]], dtype=float16)
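
One way to see the difference between the two outputs, as a sketch assuming only NumPy: the array display prints just enough digits to identify each float16 value, while converting the same values to plain Python floats exposes the full expansion of the stored binary value. Whether to_dict takes exactly this conversion path internally is an assumption here, not something confirmed in this thread.

```python
import numpy as np

# Same rounded float16 data as in the examples above.
arr = np.array([0.37324, 0.12321, 0.23]).astype(np.float16).round(4)

# tolist() converts each float16 element to a Python float (float64).
as_python = arr.tolist()

print(arr)        # short, float16-aware display
print(as_python)  # full digits of the stored values
```

The values in `as_python` match the to_dict('records') output reported above, which suggests the stored data is the same in both cases and only the conversion to Python floats changes how many digits are shown.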

@Genarito
Author

Genarito commented Jul 7, 2020

@jreback Is there any update on this? If you like, I could try making a PR when I have some free time.

@jreback
Contributor

jreback commented Jul 7, 2020

you can make a PR but not really sure what you are attempting to solve

@Genarito
Author

Genarito commented Jul 7, 2020

Ok, I'll try in a few weeks

@mickaelandrieu

Hi,

The issue is still present on the latest release of pandas with float32.

Regards,

@jreback
Contributor

jreback commented Feb 22, 2022

you would have to open a new issue with a reproducible example

@mickaelandrieu

Hi Jeff,

I don't know! This issue was reproducible, clear, and ... closed with no action.

The best I can do here is to inform you, then cast my series to float64 and move on.

No hard feelings here, I'm glad this bug was reported because I know why it's not working as I expected.

Regards,

@jreback
Contributor

jreback commented Feb 22, 2022

this is about float16

@Genarito
Author

Genarito commented Feb 22, 2022

Now it's about float16 and float32 :'D

@ghost

ghost commented Dec 8, 2022

I found the root cause: #50125 (comment)


4 participants