BUG: Precision loss when casting 19-digit integer to float #43979

ch3rn0v · 2021-10-11T20:43:28Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

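import numpy as np
import pandas as pd

ls = [
    1234567890123456789,
    np.nan,
    1234567890123337789,
]

# np.float128 is not available on every platform (e.g. Windows)
pd.Series(ls, dtype=np.float128)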

Issue Description

When a Series is created from a list that contains np.nan, its dtype is float.
If the list also contains 19-digit integers, the last few digits lose precision, even when dtype=np.float128 is requested.
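One plausible explanation, assuming an intermediate float64 cast happens somewhere in the constructor: float64 has a 53-bit significand, so only integers up to 2**53 (16 digits) are guaranteed to round-trip exactly. A quick check with the first value from the example:

```python
import numpy as np

n = 1234567890123456789  # first value from the reproducible example

# float64 has a 53-bit significand: every integer up to 2**53 is exact,
# but larger integers are rounded to the nearest representable value.
limit = 2 ** 53
print(limit, len(str(limit)))  # 9007199254740992 16

as_f64 = int(np.float64(n))
print(as_f64 == n)  # False
print(n - as_f64)   # 21, since float64 values near 1.2e18 are spaced 256 apart
```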

A similar issue seems to have been fixed in numpy a couple of years ago: numpy/numpy#9006.

Could you suggest a workaround until this is resolved? A Series holding integer values and NaNs would work for me in this particular case. Thank you in advance.

Expected Behavior

The values should be preserved exactly.

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.8.5.final.0

pandas : 1.3.3
numpy : 1.20.2


mzeitlin11 · 2021-10-11T21:11:43Z

Thanks for the report @ch3rn0v! My guess would be that some downcast to float64 occurs somewhere along the way (since there's not much explicit float128 support). Contributions welcome to find and fix!

For a workaround, holding NaN alongside integers is what the new nullable integer type is designed for (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). Note, however, that there are still some open issues where operations require a cast to float, which may cause a similar loss of precision.
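A minimal sketch of that workaround, assuming the values start out as the Python list from the report. It builds the nullable array from an explicit int64 array plus a boolean mask through the public `pd.arrays.IntegerArray` constructor, so no float intermediate is involved. Passing the list straight to the constructor may not be enough: NumPy turns a mixed int/NaN list into a float64 array, and whether pandas goes through that path depends on the version.

```python
import numpy as np
import pandas as pd

ls = [1234567890123456789, np.nan, 1234567890123337789]

# Split the input into int64 values and a missing-value mask; the placeholder 0
# at masked positions is never exposed because the mask marks it as <NA>.
mask = np.array([bool(pd.isna(v)) for v in ls])
values = np.array([0 if m else v for v, m in zip(ls, mask)], dtype=np.int64)

s = pd.Series(pd.arrays.IntegerArray(values, mask))
print(s.dtype)  # Int64
print(s)
```

The mask-based constructor is used here because it makes the "no float on the way in" property explicit; it is one option among others, not the only supported route.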

ch3rn0v · 2021-10-11T21:31:05Z

Thank you for the rapid reply!
I'll look into the nullable integer type. Would you mind pointing out those open issues? (The docs mention it's experimental, but that's about it.) Or rather, which operations on such arrays are considered safe? For instance, I'm interested in making it part of a DataFrame, sorting, filling NaNs in one way or another, comparisons, using it as a mask or part of a mask, and perhaps math operations.

mzeitlin11 · 2021-10-11T21:53:52Z

The cases you mention should generally work (though this is hard to guarantee, since preserving integer values that don't cast losslessly to float is not thoroughly tested for all operations). One example is #37493: the operation has no support for the mask, so a cast to float is needed to hold missing values. Another is #30268. Filtering issues by the MaskedArrays label may turn up others.
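A rough way to spot-check the operations asked about above on values that don't fit in float64. This only shows that these particular calls keep every digit on the pandas version it is run with; as noted, it is not a guarantee for every operation.

```python
import numpy as np
import pandas as pd

a = 1234567890123456789
b = 1234567890123337789
s = pd.Series(pd.arrays.IntegerArray(
    np.array([a, 0, b], dtype=np.int64),
    np.array([False, True, False]),  # True marks the missing entry
))

sorted_s = s.sort_values()       # missing values go last by default
filled = s.fillna(0)             # stays Int64
is_a = (s == a).fillna(False)    # comparison gives a nullable boolean; NA -> False
selected = s[is_a]               # use the comparison result as a mask
shifted = s + 1                  # arithmetic on the underlying int64 data

print(sorted_s.tolist())
print(filled.tolist())
print(selected.tolist())
print(shifted.tolist())
```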

Ark-kun · 2022-08-20T07:28:42Z

> holding nan and integers is what the new nullable integer type is designed for

pandas.read_csv does not seem to support this: by default, it reads integer columns that contain missing values as float64.
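A hedged workaround sketch, assuming a hypothetical CSV with an `id` column: read the column as text so the parser never produces float64, then build the Int64 column from Python ints and a mask.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV: the "id" column has a missing value in the second row.
data = io.StringIO(
    "id,name\n"
    "1234567890123456789,a\n"
    ",b\n"
    "1234567890123337789,c\n"
)

# Reading "id" as str keeps the digits intact; the empty field becomes missing.
df = pd.read_csv(data, dtype={"id": str})

raw = df["id"]
mask = raw.isna().to_numpy()
values = np.array([0 if m else int(v) for v, m in zip(raw, mask)], dtype=np.int64)
df["id"] = pd.arrays.IntegerArray(values, mask)

print(df.dtypes)
print(df)
```

Converting each string with Python's `int` is exact for any value that fits in int64; values outside that range would raise when building the int64 array rather than silently losing digits.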

ch3rn0v added the Bug and Needs Triage labels Oct 11, 2021

mzeitlin11 added the Dtype Conversions label Oct 11, 2021

mroeschke added the Constructors label and removed the Needs Triage label Oct 16, 2021
