Commit c676844

BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' while infer types #44079 (#44442)
1 parent 63ebf77 commit c676844

3 files changed: +26 -1 lines changed

doc/source/whatsnew/v1.4.0.rst

+1
@@ -646,6 +646,7 @@ I/O
 - Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`)
 - Bug in dumping/loading a :class:`DataFrame` with ``yaml.dump(frame)`` (:issue:`42748`)
 - Bug in :func:`read_csv` raising ``ValueError`` when ``parse_dates`` was used with ``MultiIndex`` columns (:issue:`8991`)
+- Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from an nullable integer type (:issue:`44079`)
 - :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`)
 
 Period

pandas/io/parsers/base_parser.py

+1 -1
@@ -705,7 +705,7 @@ def _infer_types(self, values, na_values, try_num_bool=True):
             # error: Argument 2 to "isin" has incompatible type "List[Any]"; expected
             # "Union[Union[ExtensionArray, ndarray], Index, Series]"
             mask = algorithms.isin(values, list(na_values))  # type: ignore[arg-type]
-            na_count = mask.sum()
+            na_count = mask.astype("uint8", copy=False).sum()
             if na_count > 0:
                 if is_integer_dtype(values):
                     values = values.astype(np.float64)
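
The counting step changed in this hunk can be sketched with plain NumPy. This is only an illustration under stated assumptions: `np.isin` stands in for the internal `algorithms.isin`, and the sample `values` and `na_values` are hypothetical. On a plain ndarray, `mask.sum()` and the cast form give the same count; the cast matters for mask types that, per the commit title, have no `sum` method.

```python
import numpy as np

# Hypothetical stand-ins for the parser's `values` and `na_values`.
values = np.array([0, 1, 99, 1, 99])
na_values = {99}

# Boolean mask marking the entries that match an NA sentinel.
# np.isin is used here in place of pandas' internal algorithms.isin.
mask = np.isin(values, list(na_values))

# Cast the mask to uint8 before summing, mirroring the patched line.
na_count = mask.astype("uint8", copy=False).sum()

print(mask.tolist())
print(int(na_count))
```

Summing the uint8 view counts the `True` entries, so `na_count > 0` still signals that at least one NA sentinel was found.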

pandas/tests/io/parser/test_index_col.py

+24
@@ -297,3 +297,27 @@ def test_multiindex_columns_index_col_with_data(all_parsers):
         index=Index(["data"]),
     )
     tm.assert_frame_equal(result, expected)
+
+
+@skip_pyarrow
+def test_infer_types_boolean_sum(all_parsers):
+    # GH#44079
+    parser = all_parsers
+    result = parser.read_csv(
+        StringIO("0,1"),
+        names=["a", "b"],
+        index_col=["a"],
+        dtype={"a": "UInt8"},
+    )
+    expected = DataFrame(
+        data={
+            "a": [
+                0,
+            ],
+            "b": [1],
+        }
+    ).set_index("a")
+    # Not checking index type now, because the C parser will return a
+    # index column of dtype 'object', and the Python parser will return a
+    # index column of dtype 'int64'.
+    tm.assert_frame_equal(result, expected, check_index_type=False)
