
FIX: Solving Int64 precision loss when read_csv(StringIO(data), dtype={a:Int64}, engine=pyarrow) #56251



Closed
vitorcf10 wants to merge 3 commits

Conversation


vitorcf10 commented Nov 30, 2023


This PR enables the `integer_object_nulls` option by setting it to `True` in the following call in `arrow_parser_wrapper.py`:

```python
table.to_pandas(types_mapper=self.kwds["dtype"].get, integer_object_nulls=True)
```

The adjustment matters when `read_csv` is called with `engine="pyarrow"`: it ensures that Int64 columns containing null elements keep their precision when the Arrow table is converted to a DataFrame. Without it, the conversion falls back to NumPy, which has no native integer NA and therefore casts the whole column to float; integers larger than 2**53 cannot be represented exactly as 64-bit floats, so their values silently change. Enabling `integer_object_nulls` keeps the integers as Python objects instead, preserving their exact values.
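A minimal reproduction of the reported failure mode (a sketch; the exact behavior depends on the pandas and pyarrow versions involved). The value 2**53 + 1 cannot be represented exactly as a 64-bit float, so a float cast triggered by the missing value silently changes it:

```python
from io import StringIO

import pandas as pd

# Column "a" holds 2**53 + 1 followed by a missing value; the NA is what
# forces the lossy float fallback on affected versions.
data = "a,b\n9007199254740993,x\n,y\n"
df = pd.read_csv(StringIO(data), dtype={"a": "Int64"}, engine="pyarrow")
print(df["a"][0])  # affected versions can print 9007199254740992
```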

phofl (Member) commented Nov 30, 2023

Doing what you are describing here has huge performance implications; I don't want to do this here.
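For context (an illustrative note, not part of the discussion): with `integer_object_nulls=True`, pyarrow boxes every element of the column as a Python object instead of producing a contiguous NumPy buffer, which is typically far slower and more memory-hungry. A rough timing sketch, assuming pyarrow is installed:

```python
import time

import pyarrow as pa

tbl = pa.table({"a": pa.array(list(range(1_000_000)) + [None], type=pa.int64())})

t0 = time.perf_counter()
tbl.to_pandas()  # default path: nulls trigger a vectorized cast to float64
t1 = time.perf_counter()
tbl.to_pandas(integer_object_nulls=True)  # boxes every value as a Python int
t2 = time.perf_counter()

print(f"default float fallback: {t1 - t0:.4f}s")
print(f"integer_object_nulls:   {t2 - t1:.4f}s")
```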

mroeschke (Member) commented

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mroeschke closed this Dec 27, 2023
Development

Successfully merging this pull request may close these issues.

BUG: read_csv losing precision when reading Int64[pyarrow] data with N/A values