BUG: Precision loss when casting 19-digit integer to float #43979

Open
2 of 3 tasks
ch3rn0v opened this issue Oct 11, 2021 · 4 comments
Labels
Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@ch3rn0v

ch3rn0v commented Oct 11, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

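import numpy as np
import pandas as pd

ls = [
    1234567890123456789,
    np.nan,
    1234567890123337789,
]

# np.float128 is not available on every platform (for example, not on Windows).
pd.Series(ls, dtype=np.float128)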

Issue Description

When a series is created from a list that contains np.nan values, its type is float.
And if the list contains 19-digit integers the precision of the last few digits is lost.
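
A minimal sketch of the effect described above, assuming the default float64 path (no explicit dtype) rather than the float128 call in the report. The variable names are illustrative and not part of the original example:

```python
import numpy as np
import pandas as pd

values = [1234567890123456789, np.nan, 1234567890123337789]

# With a NaN in the list and no explicit dtype, pandas infers float64.
s = pd.Series(values)
print(s.dtype)

# float64 has a 53-bit significand, so not every integer above 2**53 is
# representable; np.spacing reports the gap between adjacent floats here.
print(np.spacing(np.float64(values[0])))

# Round-tripping the first value through float64 changes its low digits.
round_tripped = int(s.iloc[0])
print(values[0], round_tripped, round_tripped == values[0])
```

The gap reported by np.spacing bounds how far a stored value can drift from the original integer.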

A similar issue seems to have been fixed in numpy a couple of years ago: numpy/numpy#9006.

Would you please suggest any workaround until this is resolved? Having a series with integer values and nans would work for me in this particular case. Thank you in advance.

Expected Behavior

The expected behaviour would be to keep the precision exactly.

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.8.5.final.0

pandas : 1.3.3
numpy : 1.20.2

ch3rn0v added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Oct 11, 2021
@mzeitlin11
Member

Thanks for the report @ch3rn0v! My guess would be that some downcast to float64 occurs somewhere along the way (since there's not much explicit float128 support). Contributions welcome to find and fix!

For a workaround, holding nan and integers is what the new nullable integer type is designed for (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). Note, however, that there are still some open issues where operations require casts to float, which may give a similar precision drop.
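
A minimal sketch of that workaround, assuming the missing entry is written as pd.NA rather than np.nan so the input never has to pass through a float array. Whether a list containing np.nan would keep full precision here may depend on the pandas version and is not checked in this sketch:

```python
import pandas as pd

# pd.NA marks the missing entry; "Int64" (capital I) is the nullable integer dtype.
arr = pd.array(
    [1234567890123456789, pd.NA, 1234567890123337789],
    dtype="Int64",
)
s = pd.Series(arr)

print(s.dtype)
print(s.tolist())
```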

mzeitlin11 added the Dtype Conversions (Unexpected or buggy dtype conversions) label Oct 11, 2021
@ch3rn0v
Author

ch3rn0v commented Oct 11, 2021

Thank you for the rapid reply!
I'll look into the nullable integer type. Would you mind pointing out those open issues, please? (The docs mention it's experimental, but that's about it.) Or rather, what operations with such arrays are considered to be safe? For instance, I'm interested in making it a part of a dataframe, sorting, filling nans in one way or another, comparisons, using it as a mask or a part of a mask, perhaps math operations.

@mzeitlin11
Member

The cases you mention should generally be fine (though this is hard to guarantee, since preserving int values which don't cast losslessly to float is not thoroughly tested for all operations). An example issue would be #37493: in this case there is no support for the mask in the operation, so a cast to float is necessary to hold missing values. Another would be #30268. Filtering by the MaskedArrays label might show some others.
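
A rough way to spot-check the operations asked about (DataFrame column, sorting, filling, comparison, masking) is to run them on a nullable integer column and compare the results against the original integers. This is only a sketch with illustrative names and an arbitrary threshold; it does not cover every operation or pandas version:

```python
import pandas as pd

s = pd.Series(
    pd.array([1234567890123456789, pd.NA, 1234567890123337789], dtype="Int64"),
    name="id",
)
df = pd.DataFrame({"id": s})

# Sorting inside a DataFrame; missing values go last by default.
ordered = df.sort_values("id")["id"]

# Filling missing values with an integer keeps the Int64 dtype.
filled = s.fillna(0)

# A comparison yields a nullable boolean result; fill NA before using it as a mask.
mask = (s > 1234567890123400000).fillna(False)
selected = s[mask]

for name, result in [("ordered", ordered), ("filled", filled), ("selected", selected)]:
    print(name, result.dtype, result.tolist())
```

If any printed value differs from the input integers in its last digits, that operation went through a float cast somewhere.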

mroeschke added the Constructors (Series/DataFrame/Index/pd.array Constructors) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Oct 16, 2021
@Ark-kun

Ark-kun commented Aug 20, 2022

"holding nan and integers is what the new nullable integer type is designed for"

pandas.read_csv does not seem to support this. It reads nullable integer columns as float64 columns.
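
For comparison, a minimal sketch of read_csv with and without an explicit dtype request. The CSV text and column names are made up for illustration. The sketch only inspects the resulting dtypes; whether 19-digit values survive exactly on the Int64 path may depend on the pandas version, so the printed values are worth comparing against the raw text:

```python
import io

import pandas as pd

# Hypothetical CSV with one missing id.
csv_text = "id,label\n1234567890123456789,a\n,b\n1234567890123337789,c\n"

# Default inference: an integer column with a missing value becomes float64.
default = pd.read_csv(io.StringIO(csv_text))
print(default["id"].dtype)

# Requesting the nullable integer dtype explicitly for that column.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"id": "Int64"})
print(nullable["id"].dtype)
print(nullable["id"].tolist())
```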
