FIX: Solving Int64 precision loss when read_csv(StringIO(data), dtype… #56253

thiago0003 · 2023-11-30T07:51:54Z

…={"a": "Int64"}, engine=pyarrow)

- closes #56135 (BUG: read_csv losing precision when reading Int64[pyarrow] data with N/A values) and #56136 (BUG: read_csv loses precision when engine='pyarrow' and dtype Int64)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.

This change sets integer_object_nulls=True in the to_pandas call in "arrow_parser_wrapper.py":
table.to_pandas(types_mapper=self.kwds["dtype"].get, integer_object_nulls=True)
The option matters when read_csv is called with engine="pyarrow" and a column is requested as Int64. Without it, an integer column that contains nulls is converted to float64, because a NumPy integer array cannot represent missing values, and integers larger than 2**53 then lose precision. With integer_object_nulls enabled, such a column is returned as Python objects instead, so the integer values are preserved exactly.
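The precision loss described above can be reproduced without pandas or pyarrow. The following is a minimal standard-library sketch, not the code from this PR: the helper names (via_float64, via_object, back_to_int) are hypothetical, and plain Python lists stand in for columns. It only illustrates why routing a nullable integer column through float64 is lossy while keeping the values as objects is not.

```python
import math


def via_float64(column):
    """Mimic a nullable integer column stored as float64: None becomes NaN."""
    return [math.nan if v is None else float(v) for v in column]


def via_object(column):
    """Mimic an object column: Python ints and None are kept unchanged."""
    return list(column)


def back_to_int(column):
    """Convert back to nullable integers, mapping None/NaN to None."""
    result = []
    for v in column:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            result.append(None)
        else:
            result.append(int(v))
    return result


# 2**53 + 1 is the smallest positive integer that float64 cannot represent.
original = [2**53 + 1, None, 1]

lossy = back_to_int(via_float64(original))  # float64 round trip
exact = back_to_int(via_object(original))   # object round trip

print(original[0], lossy[0], exact[0])
```

The object route keeps every value exact, but each element is then a separate Python object rather than a packed 64-bit value, which is relevant to the performance concern raised in the review below.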

phofl

Please change relevant files only

This fix has huge performance implications, I'd rather not do that

mroeschke · 2023-11-30T17:32:27Z

Thanks for the PR, but it appears this is already being worked on in #56251, so closing in favor of that PR.

thiago0003 added 2 commits November 30, 2023 04:48

FIX: Solving Int64 precision loss when read_csv(StringIO(data), dtype…

1dc673d

…={"a": "Int64"}, engine=pyarrow)

FIX: Solving Int64 precision loss when read_csv(StringIO(data), dtype…

1fb65d0

…={"a": "Int64"}, engine=pyarrow)

phofl requested changes Nov 30, 2023

mroeschke closed this Nov 30, 2023
