
BUG: NamedTuples do not match tuples in pandas.Index #57922

Closed
2 of 3 tasks
Apteryx0 opened this issue Mar 19, 2024 · 6 comments · Fixed by #59286
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@Apteryx0

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

$ ipython
Python 3.11.8 (main, Mar 19 2024, 17:46:15) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas

In [2]: import collections

In [3]: MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")

In [4]: first = MyNamedTuple('identity','1234')

In [5]: idx = pandas.Index([('identity','1234')])

In [6]: idx
Out[6]: 
MultiIndex([('identity', '1234')],
           )

In [7]: idx2 = idx.to_flat_index()

In [8]: idx2
Out[8]: Index([('identity', '1234')], dtype='object')

In [9]: first in idx
Out[9]: True

In [10]: first in idx2
Out[10]: False

In [11]: first in idx2.to_list()
Out[11]: True

In [12]: first == idx2[0]
Out[12]: True

In [13]: pandas.__version__
Out[13]: '2.2.1'

In [14]: idx.get_loc(first)
Out[14]: 0

In [15]: idx2.get_loc(first)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: MyNamedTuple(id='identity', sub_id='1234')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[15], line 1
----> 1 idx2.get_loc(first)

File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: MyNamedTuple(id='identity', sub_id='1234')

In [16]:

Issue Description

Upgraded from pandas 1.2.5 to pandas 1.3.5 and noticed that I could no longer select columns of a DataFrame whose column labels are tuples by passing an equivalent NamedTuple: the lookup raised a KeyError. I grabbed the latest pandas and reduced the issue to pandas.Index.get_loc, though the lookup works if I leave the Index as a MultiIndex.

Note: I have seen the code succeed in about 25% of runs, so if it works for you, please try again.
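Until the lookup is fixed, one workaround that should sidestep the mismatch is to convert the key to a plain tuple before looking it up, e.g. idx2.get_loc(tuple(first)). The sketch below assumes the index stores plain tuples (as to_flat_index() produces in the session above); the helper name as_plain_key is hypothetical and also converts nested tuple subclasses.

```python
from collections import namedtuple
from typing import Any


def as_plain_key(key: Any) -> Any:
    """Hypothetical helper: return key with every tuple subclass
    (such as a namedtuple) replaced by a plain tuple, recursively."""
    if isinstance(key, tuple):
        return tuple(as_plain_key(item) for item in key)
    return key


MyNamedTuple = namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")

plain = as_plain_key(first)
print(type(plain).__name__, plain)  # tuple ('identity', '1234')
```

With pandas installed, idx2.get_loc(as_plain_key(first)) then looks up the plain tuple ('identity', '1234'), which is exactly what idx2 holds, so it should return 0 in the session above.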

Expected Behavior

NamedTuples should match regular tuples, as they do elsewhere in Python (as illustrated by In [11], where first in idx2.to_list() returns True).
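The standard-library behaviour this expectation relies on can be checked without pandas: a namedtuple is a tuple subclass that inherits tuple's equality and hashing, so it compares equal to a plain tuple with the same items, has the same hash, and is found by Python's own dict and set lookups.

```python
from collections import namedtuple

MyNamedTuple = namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")
plain = ("identity", "1234")

# namedtuple inherits __eq__ and __hash__ from tuple.
print(isinstance(first, tuple))    # True
print(first == plain)              # True
print(hash(first) == hash(plain))  # True

# Hash-based containers built from the plain tuple find the namedtuple.
positions = {plain: 0}
print(positions[first])            # 0
print(first in {plain})            # True
```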

Installed Versions

In [16]: pandas.show_versions()
/home/russellm/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : bdc79c1
python : 3.11.8.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-21-generic
Version : #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

In [17]:

@Apteryx0 Apteryx0 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 19, 2024
@Apteryx0
Author

FWIW

In [20]: idx2._engine
Out[20]: <pandas._libs.index.ObjectEngine at 0x7f071ddcab00>

In [21]: idx2._engine.values
Out[21]: array([('identity', '1234')], dtype=object)

In [22]: first in idx2._engine.values
Out[22]: False

In [23]: first == idx2._engine.values[0]
Out[23]: True

In [24]: hash(first)
Out[24]: 5766037510587733218

In [25]: hash(idx2._engine.values[0])
Out[25]: 5766037510587733218

In [26]: 
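The session shows equal Python hashes yet a failed engine lookup. One way this can happen is sketched below as a toy model, not pandas' actual code. It assumes (not confirmed in this thread) that the object hash table gives exact tuple instances a custom hash and falls back to hash() for everything else, including tuple subclasses such as a namedtuple. Under that assumption the namedtuple is usually sent to a different bucket than the stored tuple and never compared with it; the lookup only succeeds when the two hashes happen to land in the same bucket. Because str hashes are randomized per process, that outcome would also change from run to run. The names toy_hash and ToyTable are hypothetical.

```python
from collections import namedtuple


def toy_hash(key):
    """Toy hash: exact tuples get a custom hash, everything else uses hash()."""
    if type(key) is tuple:  # exact check, so tuple subclasses are excluded
        return hash(("toy-tuple",) + tuple(toy_hash(item) for item in key))
    return hash(key)


class ToyTable:
    """Minimal chained hash table mapping keys to their positions."""

    def __init__(self, keys, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]
        for pos, key in enumerate(keys):
            self._bucket(key).append((key, pos))

    def _bucket(self, key):
        return self.buckets[toy_hash(key) % len(self.buckets)]

    def get_loc(self, key):
        # Only the bucket chosen by toy_hash is searched, so an equal key
        # with a different toy hash is usually never compared.
        for stored, pos in self._bucket(key):
            if stored == key:
                return pos
        raise KeyError(key)


MyNamedTuple = namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")
plain = ("identity", "1234")

table = ToyTable([plain])
print(table.get_loc(plain))  # 0
try:
    print(table.get_loc(first))  # found only if the buckets happen to match
except KeyError:
    print("KeyError")
```

Running the script several times (so that str hash randomization picks new seeds) shows the namedtuple lookup alternating between success and KeyError, while the plain tuple is always found.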

@dontgoto
Contributor

Happens on main as well, including the non-deterministic success in a minority of attempts.

@rhshadrach
Member

Thanks for the report. In general you will find very little support for containers as elements of an index or columns.

Related: #57004 (comment)

@rhshadrach rhshadrach added Indexing Related to indexing on series/frames, not to indexes themselves and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 20, 2024
@20revsined

Hi, my partner (GitHub: mwanink) and I would like to work on this issue.

@20revsined

take

@20revsined 20revsined removed their assignment May 5, 2024
@matiaslindgren
Contributor

take
