BUG: Int64 index_col raises AttributeError: 'BooleanArray' object has no attribute 'sum' #44835

Closed
2 of 3 tasks
5j9 opened this issue Dec 10, 2021 · 2 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@5j9
Contributor

5j9 commented Dec 10, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

from pandas import read_csv
from io import StringIO
df = read_csv(
    StringIO('1,2,3;4,5,6;7,8,9'), 
    lineterminator=';', names=('1', '2', '3'),
    dtype='Int64', index_col=('1', '2'))
print(df)

Issue Description

See the example above. Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\a\AppData\Roaming\JetBrains\PyCharmCE2021.3\scratches\scratch_3.py", line 3, in <module>
    df = read_csv(
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 488, in _read
    return parser.read(nrows)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 309, in read
    index, names = self._make_index(data, alldata, names)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\base_parser.py", line 416, in _make_index
    index = self._agg_index(index)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\base_parser.py", line 512, in _agg_index
    arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
  File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\base_parser.py", line 695, in _infer_types
    na_count = mask.sum()
AttributeError: 'BooleanArray' object has no attribute 'sum'

Process finished with exit code 1

Replace Int64 with int64 and it will work.
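
For reference, a minimal sketch of that workaround, reusing the data from the reproducible example. The first call is the int64 substitution described above. The second call is an assumption that does not come from this report: it parses the columns as Int64 without index_col and builds the index afterwards with set_index, which should avoid the _make_index path shown in the traceback. The names DATA, NAMES, df_int64 and df_nullable are illustrative only, and lists are used in place of the tuples from the original example.

```python
from io import StringIO

from pandas import read_csv

DATA = '1,2,3;4,5,6;7,8,9'
NAMES = ['1', '2', '3']

# Workaround from the report: NumPy int64 instead of the nullable Int64.
df_int64 = read_csv(
    StringIO(DATA), lineterminator=';', names=NAMES,
    dtype='int64', index_col=['1', '2'])

# Assumed alternative (not from the report): parse as Int64 with no
# index_col, then set the index after parsing.
df_nullable = read_csv(
    StringIO(DATA), lineterminator=';', names=NAMES, dtype='Int64',
).set_index(['1', '2'])

print(df_int64)
print(df_nullable)
```

The second variant keeps the nullable dtype on the remaining data column, which may matter if the real data contains missing values.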

Expected Behavior

I expected Int64 to work just like int64. If that is not supported, a more helpful error message would be welcome.

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.3.3
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.1.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.0
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

5j9 added the Bug and Needs Triage labels Dec 10, 2021
@jreback
Contributor

jreback commented Dec 10, 2021

Please try this on master.

@phofl
Member

phofl commented Dec 10, 2021

Duplicate of #44079 and works on master

phofl closed this as completed Dec 10, 2021
5j9 added a commit to 5j9/tsetmc that referenced this issue Dec 17, 2021
index_col does not work with Int64 datatype, see:
pandas-dev/pandas#44835