BUG: list subclass objects don't work as indexers for data frames in 1.3.x #42433

Closed
@pwwang

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

from pandas import DataFrame
df = DataFrame(dict(x=[1,2,3]))
class MyList(list):
    ...

key = MyList(['x'])
print(df[key])
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    df[key]
  File "/.../python3.7/site-packages/pandas/core/frame.py", line 3445, in __getitem__
    if com.is_bool_indexer(key):
  File "/.../python3.7/site-packages/pandas/core/common.py", line 146, in is_bool_indexer
    return len(key) > 0 and lib.is_bool_list(key)
TypeError: Argument 'obj' has incorrect type (expected list, got MyList)

Problem description

This works in v1.2.*.

This is because:

pandas/pandas/core/frame.py

Lines 3445 to 3446 in f00ed8f

        if com.is_bool_indexer(key):
            return self._getitem_bool_array(key)

raises for list subclasses instead of returning False, so MyList objects never reach the code path that treats them as a collection of keys.

If we step into com.is_bool_indexer in 1.3.0:

    elif isinstance(key, list):
        # check if np.array(key).dtype would be bool
        return len(key) > 0 and lib.is_bool_list(key)
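
To illustrate why this branch is entered and then fails, here is a minimal pure-Python sketch. `is_bool_list_standin` is a hypothetical stand-in for `lib.is_bool_list`, written on the assumption (suggested by the traceback) that the compiled function accepts only objects whose type is exactly `list`. It is not the pandas implementation.

```python
class MyList(list):
    """Trivial list subclass, as in the report."""


def is_bool_list_standin(obj):
    # Hypothetical stand-in for lib.is_bool_list: rejects anything whose
    # type is not exactly `list`, mirroring the error in the traceback.
    if type(obj) is not list:
        raise TypeError(
            "Argument 'obj' has incorrect type "
            f"(expected list, got {type(obj).__name__})"
        )
    return all(isinstance(item, bool) for item in obj)


key = MyList(["x"])

# The Python-level guard accepts subclasses ...
print(isinstance(key, list))  # True
# ... but an exact-type check does not.
print(type(key) is list)  # False

try:
    is_bool_list_standin(key)
except TypeError as exc:
    print(exc)
```

The `isinstance` guard lets `MyList` through, and the exact-type check then raises, which matches the `TypeError` shown above.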

Compared to v1.2.5:

    elif isinstance(key, list):
        try:
            arr = np.asarray(key)
            return arr.dtype == np.bool_ and len(arr) == len(key)
        except TypeError:  # pragma: no cover
            return False

A possible solution would be:

    elif isinstance(key, list):
        # check if np.array(key).dtype would be bool
+       try:
-       return len(key) > 0 and lib.is_bool_list(key)
+           return len(key) > 0 and lib.is_bool_list(key)
+       except TypeError:
+           return False
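
The proposed change can be tried out without pandas. The sketch below covers only the list branch and uses a hypothetical stand-in for `lib.is_bool_list`, assumed to reject list subclasses. It also shows one alternative, copying the subclass into a plain `list` before the call. Both function names are made up for illustration, and neither is claimed to be the fix pandas adopted.

```python
class MyList(list):
    """Trivial list subclass, as in the report."""


def is_bool_list_standin(obj):
    # Hypothetical stand-in for lib.is_bool_list (assumed exact-type check).
    if type(obj) is not list:
        raise TypeError("expected list, got " + type(obj).__name__)
    return all(isinstance(item, bool) for item in obj)


def is_bool_indexer_try_except(key):
    # List branch only, with the try/except proposed above.
    if isinstance(key, list):
        try:
            return len(key) > 0 and is_bool_list_standin(key)
        except TypeError:
            return False
    return False


def is_bool_indexer_plain_copy(key):
    # Alternative sketch: normalise subclasses to a plain list first.
    if isinstance(key, list):
        if type(key) is not list:
            key = list(key)
        return len(key) > 0 and is_bool_list_standin(key)
    return False


print(is_bool_indexer_try_except(MyList(["x"])))          # False
print(is_bool_indexer_plain_copy(MyList(["x"])))          # False
print(is_bool_indexer_try_except(MyList([True, False])))  # False
print(is_bool_indexer_plain_copy(MyList([True, False])))  # True
```

Both variants let `MyList(['x'])` fall through to label-based selection. They differ for a subclass holding booleans: the try/except version reports it as not a boolean indexer, whereas the v1.2.5 code would have, since `np.asarray` on such a key yields a bool array.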

Expected Output

df[key] selects column x and the data frame df is printed, as in v1.2.*.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f00ed8f
python : 3.7.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-19041-Microsoft
Version : #488-Microsoft Mon Sep 01 13:43:00 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.21.0
pytz : 2020.4
dateutil : 2.8.1
pip : 19.3.1
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.3.20
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
numba : None

Labels: Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Regression (functionality that used to work in a prior pandas version)
