BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' while infer types #44079

zhangxiaoxing · 2021-10-18T06:49:10Z

Line 694 in the pandas/io/parsers/base_parser.py file,

        if issubclass(values.dtype.type, (np.number, np.bool_)):
            # error: Argument 2 to "isin" has incompatible type "List[Any]"; expected
            # "Union[Union[ExtensionArray, ndarray], Index, Series]"
            mask = algorithms.isin(values, list(na_values))  # type: ignore[arg-type]
            na_count = mask.sum()
            ^^^^^^^^^^^^^THIS
            if na_count > 0:
                if is_integer_dtype(values):
                    values = values.astype(np.float64)
                np.putmask(values, mask, np.nan)
            return values, na_count

This raised "AttributeError: 'BooleanArray' object has no attribute 'sum'" after a global Anaconda update. Changing the line to
na_count = mask.astype('uint8').sum()

fixed the error in my case.
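
The workaround above can be tried outside the parser. This is a minimal sketch, assuming a nullable boolean mask with no missing values, built by hand with pd.array; the variable names are illustrative and not taken from base_parser.py. It counts the True entries by converting to a NumPy integer array first, so it does not depend on BooleanArray having a sum method.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the mask returned by algorithms.isin in the report.
mask = pd.array([True, False, True], dtype="boolean")

# Convert to a plain NumPy array before reducing (the reporter's workaround).
# Note: this conversion raises if the mask contains pd.NA.
as_uint8 = mask.astype("uint8")
na_count = as_uint8.sum()

print(na_count)
```

Because the conversion fails on missing values, this is probably only suitable where the mask is known to be fully populated, as it is for the result of an isin check.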

Edit by rhshadrach:

Minimal example

from io import StringIO

import pandas as pd

df = pd.read_csv(
     StringIO("0,1"),
     names=['a', 'b'],
     index_col=['a'],
     dtype={'a': 'UInt8'},
)
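
To check whether an installed pandas is affected, the minimal example can be wrapped in a small helper. This is only a sketch: the function name index_col_bug_present is hypothetical, and it assumes the bug always surfaces as the AttributeError quoted in this issue.

```python
from io import StringIO

import pandas as pd


def index_col_bug_present() -> bool:
    """Return True if the minimal example raises the reported AttributeError."""
    try:
        pd.read_csv(
            StringIO("0,1"),
            names=["a", "b"],
            index_col=["a"],
            dtype={"a": "UInt8"},
        )
    except AttributeError:
        return True
    return False


# Report the pandas version alongside the result, as requested in the thread.
print(pd.__version__, index_col_bug_present())
```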


rhshadrach · 2021-10-18T16:10:33Z

Can you post a reproducible example along with the version of pandas you're using?

LawrenceJGD · 2021-11-12T01:11:51Z

I'm having the same problem, but triggered from a different place: it happens when I use read_csv to read a CSV file with a multi-column index whose index columns use the UInt8 dtype. It also happens with UInt32; I don't know whether the other numeric dtypes are affected.

This is the code that I'm using:

>>> import pandas as pd
>>> df = pd.read_csv(
...     'geo20210715_pp.txt',
...     names=[
...         'cod_est', 'cod_mun', 'cod_par',
...         'nombre_est', 'nombre_mun', 'nombre_par'
...     ],
...     index_col=['cod_est', 'cod_mun', 'cod_par'],
...     dtype={
...         'cod_est': 'UInt8', 'cod_mun': 'UInt8', 'cod_par': 'UInt8',
...         'nombre_est': 'string', 'nombre_mun': 'string',
...         'nombre_par': 'string'
...     },
...     engine='c',
...     encoding='cp1252'
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 309, in read
    index, names = self._make_index(data, alldata, names)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 416, in _make_index
    index = self._agg_index(index)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 512, in _agg_index
    arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 695, in _infer_types
    na_count = mask.sum()
AttributeError: 'BooleanArray' object has no attribute 'sum'

And this is the CSV file (don't worry, it is public information): geo20210715_pp.txt.

My workaround is this:

>>> import pandas as pd
>>> df = pd.read_csv(
...     '/home/galaxyljgd/Documentos/otros/registro_electoral_2021/geo20210715_pp.txt',
...     names=[
...         'cod_est', 'cod_mun', 'cod_par',
...         'nombre_est', 'nombre_mun', 'nombre_par'
...     ],
...     dtype={
...         'cod_est': 'UInt8', 'cod_mun': 'UInt8', 'cod_par': 'UInt8',
...         'nombre_est': 'string', 'nombre_mun': 'string',
...         'nombre_par': 'string'
...     },
...     engine='c',
...     encoding='cp1252'
... )
>>> df.set_index(['cod_est', 'cod_mun', 'cod_par'], inplace=True)
>>> df
                           nombre_est      nombre_mun                 nombre_par
cod_est cod_mun cod_par                                                         
21      1       3          EDO. ZULIA      MP. BARALT   PQ. MANUEL GUANIPA MATOS
                4          EDO. ZULIA      MP. BARALT      PQ. MARCELINO BRICEÑO
                5          EDO. ZULIA      MP. BARALT            PQ. SAN TIMOTEO
                6          EDO. ZULIA      MP. BARALT           PQ. PUEBLO NUEVO
        2       1          EDO. ZULIA  MP. SANTA RITA  PQ. PEDRO LUCAS URRIBARRI
...                               ...             ...                        ...
6       1       11       EDO. BOLIVAR      MP. CARONI             PQ. 5 DE JULIO
99      13      4            EMBAJADA          CANADA                  VANCOUVER
        97      1            EMBAJADA        JORDANIA                      AMMAN
        98      1            EMBAJADA          CHIPRE                    NICOSIA
        99      1            EMBAJADA          SERBIA                   BELGRADO

[1394 rows x 3 columns]
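
The same workaround can be applied to the minimal example from the top of this issue. This is a sketch under the assumption that the error only occurs when index_col is combined with a nullable integer dtype: the file is read without index_col, and the index is set afterwards.

```python
from io import StringIO

import pandas as pd

# Read without index_col so the parser never runs type inference on the index.
df = pd.read_csv(
    StringIO("0,1"),
    names=["a", "b"],
    dtype={"a": "UInt8"},
)

# Build the index after parsing instead of inside read_csv.
df = df.set_index("a")

print(df)
```

The dtype of the resulting index may differ between pandas versions, so code that depends on it should check df.index.dtype explicitly.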

rhshadrach · 2021-11-12T22:44:52Z

Thanks @LawrenceJGD, I've updated the OP with a minimal example and confirmed this exists on master. Investigations and PRs to fix are most welcome!

zhangxiaoxing · 2021-11-13T03:31:10Z

take


rhshadrach added the Needs Info (Clarification about behavior needed to assess issue) label Oct 18, 2021

mroeschke added the Bug label Oct 30, 2021

rhshadrach added this to the Contributions Welcome milestone Nov 12, 2021

rhshadrach added the IO CSV (read_csv, to_csv) label and removed the Needs Info (Clarification about behavior needed to assess issue) label Nov 12, 2021

github-actions bot assigned zhangxiaoxing Nov 13, 2021

This was referenced Nov 13, 2021

Bug gh44079 #44422

Closed

BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' while infer types #44423

Closed

BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' while infer types #44079 #44442

Merged

jreback added the NA - MaskedArrays (Related to pd.NA and nullable extension arrays) label Nov 14, 2021

jreback modified the milestones: Contributions Welcome, 1.4 Nov 14, 2021

jreback closed this as completed in #44442 Nov 20, 2021

jreback pushed a commit that referenced this issue Nov 20, 2021

BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' while infer types #44079 (#44442)

c676844

rhshadrach mentioned this issue Nov 20, 2021

API: Dispatch mechanism for EA reductions #33790

Closed

phofl mentioned this issue Dec 10, 2021

BUG: Int64 index_col raises AttributeError: 'BooleanArray' object has no attribute 'sum' #44835

Closed

3 tasks
