Commit c1ce0a7

BUG: read_csv overflowing for ea int with nulls (#50847)
1 parent 0e484ed commit c1ce0a7

3 files changed, +23 -1 lines changed

doc/source/whatsnew/v2.0.0.rst

+1
@@ -1023,6 +1023,7 @@ I/O
 - Bug in :meth:`DataFrame.to_string` ignoring float formatter for extension arrays (:issue:`39336`)
 - Fixed memory leak which stemmed from the initialization of the internal JSON module (:issue:`49222`)
 - Fixed issue where :func:`json_normalize` would incorrectly remove leading characters from column names that matched the ``sep`` argument (:issue:`49861`)
+- Bug in :func:`read_csv` unnecessarily overflowing for extension array dtype when containing ``NA`` (:issue:`32134`)
 - Bug in :meth:`DataFrame.to_dict` not converting ``NA`` to ``None`` (:issue:`50795`)
 - Bug in :meth:`DataFrame.to_json` where it would segfault when failing to encode a string (:issue:`50307`)
 - Bug in :func:`read_xml` where file-like objects failed when iterparse is used (:issue:`50641`)

pandas/core/arrays/numeric.py

+1 -1
@@ -285,7 +285,7 @@ def _from_sequence_of_strings(
     ) -> T:
         from pandas.core.tools.numeric import to_numeric
 
-        scalars = to_numeric(strings, errors="raise")
+        scalars = to_numeric(strings, errors="raise", use_nullable_dtypes=True)
         return cls._from_sequence(scalars, dtype=dtype, copy=copy)
 
     _HANDLED_TYPES = (np.ndarray, numbers.Number)
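
The commit itself does not explain why the extra argument matters. One plausible reading, offered here as an assumption rather than as the author's stated rationale, is that without nullable dtypes a column containing missing values is routed through float64, which cannot hold every large integer exactly. The standard-library sketch below only illustrates that general float64 limitation, using the integer from the test case in this commit; it does not call pandas.

```python
# Integer taken from the test case added in this commit.
big = 1582218195625938945

# float64 has a 53-bit significand, so not every integer above 2**53
# can be represented exactly.
as_float = float(big)
round_tripped = int(as_float)

print(big > 2**53)           # True: outside the exactly representable range
print(round_tripped == big)  # False: the float round trip changed the value
```

Keeping the values in an integer-backed nullable array, with missing entries tracked in a separate mask, sidesteps this round trip.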

pandas/tests/io/parser/dtypes/test_dtypes_basic.py

+21
@@ -18,6 +18,7 @@
 import pandas._testing as tm
 from pandas.core.arrays import (
     ArrowStringArray,
+    IntegerArray,
     StringArray,
 )
 
@@ -527,3 +528,23 @@ def test_use_nullable_dtypes_pyarrow_backend(all_parsers, request):
         }
     )
     tm.assert_frame_equal(result, expected)
+
+
+def test_ea_int_avoid_overflow(all_parsers):
+    # GH#32134
+    parser = all_parsers
+    data = """a,b
+1,1
+,1
+1582218195625938945,1
+"""
+    result = parser.read_csv(StringIO(data), dtype={"a": "Int64"})
+    expected = DataFrame(
+        {
+            "a": IntegerArray(
+                np.array([1, 1, 1582218195625938945]), np.array([False, True, False])
+            ),
+            "b": 1,
+        }
+    )
+    tm.assert_frame_equal(result, expected)
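
To try the scenario covered by the new test outside the pandas test suite, a minimal sketch follows. It assumes a pandas release that contains this fix (2.0 or later) and calls the public `pd.read_csv` directly instead of the `all_parsers` fixture, so it exercises only the default parser engine.

```python
from io import StringIO

import pandas as pd

# Same CSV payload as the test in this commit: a missing value in column "a"
# alongside an integer too large to survive a float64 round trip.
data = "a,b\n1,1\n,1\n1582218195625938945,1\n"

# Assumption: pandas >= 2.0, where the fix from this commit is present.
result = pd.read_csv(StringIO(data), dtype={"a": "Int64"})

print(result["a"].dtype)
print(result["a"].isna().tolist())
print(int(result["a"].iloc[2]) == 1582218195625938945)
```

With the fix in place, the column should come back as `Int64`, with the second row missing and the large value preserved exactly.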
