
Mixed dtypes of column index for data frames raises an error #47382


Closed
ChristopheBernard opened this issue Jun 16, 2022 · 5 comments · Fixed by #55720
Labels
ExtensionArray: Extending pandas with custom dtypes or arrays.
Index: Related to the Index class or subclasses.
Needs Tests: Unit test(s) needed to prevent regressions.
Numeric Operations: Arithmetic, Comparison, and Logical operations.

Comments

@ChristopheBernard

ChristopheBernard commented Jun 16, 2022

Here is a code snippet summarizing the issue (pandas 1.4.2, various versions of Python).

import pandas as pd


d = pd.DataFrame(columns=list('abc'), data=0.0, index=[0])
e = pd.DataFrame(columns=list('abc'), data=0.0, index=[0])
e.columns = e.columns.astype('string')

d+e ## raises an unexpected exception

Simply adding two DataFrames of apparently the same shape raises the following exception:

ValueError: Location based indexing can only have [integer, integer slice 
(START point is INCLUDED, END point is EXCLUDED), listlike of integers, 
boolean array] types

The dtypes of the column indexes of d and e are respectively:

>>> d.columns.dtype
dtype('O')

>>> e.columns.dtype
string[python]

and that mismatch seems to cause the issue.

I would expect d+e to work just like d+d.
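
Until the two column dtypes are handled transparently, one possible workaround is to cast one frame's column index to the other's dtype before adding. The sketch below is only an illustration: the helper name `with_matching_column_dtype` is made up for this example and is not part of pandas.

```python
import pandas as pd


def with_matching_column_dtype(left, right):
    """Return a copy of `right` whose column Index has the dtype of `left.columns`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = right.copy()
    out.columns = out.columns.astype(left.columns.dtype)
    return out


d = pd.DataFrame(columns=list("abc"), data=0.0, index=[0])
e = pd.DataFrame(columns=list("abc"), data=0.0, index=[0])
e.columns = e.columns.astype("string")  # column dtype now differs from d's

# Both operands now share one column dtype, so label alignment is unambiguous.
result = d + with_matching_column_dtype(d, e)
```

This only changes the dtype of the column labels, not the data, so it should be safe whenever the labels themselves are equal.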

@ChristopheBernard
Author

This used to work with earlier pandas versions.

@phofl
Member

phofl commented Jun 16, 2022

Hi, thanks for your report.

Could you reduce your example a bit? The pivots are not necessary to reproduce the bug :) see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@phofl changed the title from "Mixed column dtypes for data frames raises an error" to "Mixed dtypes of column index for data frames raises an error" on Jun 16, 2022
@ChristopheBernard
Author

ChristopheBernard commented Jun 17, 2022

OK. Here is a reduced code snippet exhibiting the same issue:

import pandas as pd

d = pd.DataFrame(columns=list('abc'), data=0.0, index=[0])
e = pd.DataFrame(columns=list('abc'), data=0.0, index=[0])
e.columns = e.columns.astype('string')

d+e ## raises an unexpected exception

Fails with pandas 1.4.2, works fine with pandas 1.2.4

Traceback:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/indexing.py", line 769, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/indexing.py", line 1378, in _validate_key
    raise ValueError(f"Can only index by location with a [{self._valid_types}]")
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/Documents/DEV/SO/ISSUE1/example2.py", line 7, in <module>
    d+e
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/ops/common.py", line 70, in new_method
    return method(self, other)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/arraylike.py", line 100, in __add__
    return self._arith_method(other, operator.add)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/frame.py", line 6939, in _arith_method
    return ops.frame_arith_method_with_reindex(self, other, op)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/ops/__init__.py", line 364, in frame_arith_method_with_reindex
    new_left = left.iloc[:, lcols]
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/indexing.py", line 1458, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "/home/user/anaconda3/envs/python39/lib/python3.9/site-packages/pandas/core/indexing.py", line 771, in _validate_tuple_indexer
    raise ValueError(
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Fails with pandas 1.4.2, i.e.:

INSTALLED VERSIONS
------------------
commit           : 4bfe3d07b4858144c219b9346329027024102ab6
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-48-generic
Version          : #54~20.04.1-Ubuntu SMP Thu Jun 2 23:37:17 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.2
numpy            : 1.22.3

Works fine with pandas 1.2.4, i.e.:

INSTALLED VERSIONS
------------------
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.8.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-48-generic
Version          : #54~20.04.1-Ubuntu SMP Thu Jun 2 23:37:17 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.4
numpy            : 1.20.1

@phofl
Member

phofl commented Jun 17, 2022

Thx, this is great. I've edited the initial post. This is caused by #33940; depending on the resolution there, this might or might not go away.

@phofl added the Numeric Operations, ExtensionArray, and Index labels on Jun 17, 2022
@jbrockmendel
Member

This works on main, could use a test.
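
A regression test for this behaviour could look roughly like the sketch below. It is not the test that was eventually merged; the function name is made up, and it assumes a pandas build in which the addition already works (it would fail on 1.4.2). It compares labels and values only, since the dtype of the resulting column index is not asserted here.

```python
import pandas as pd


def test_add_frames_with_mixed_column_index_dtypes():
    # Hypothetical test name; a sketch, not the test added to pandas.
    left = pd.DataFrame(columns=list("abc"), data=1.0, index=[0])
    right = pd.DataFrame(columns=list("abc"), data=0.0, index=[0])
    right.columns = right.columns.astype("string")  # mixed column dtypes

    result = left + right

    # Labels and values should match what adding two same-dtype frames would give.
    assert list(result.columns) == ["a", "b", "c"]
    assert result.to_numpy().tolist() == [[1.0, 1.0, 1.0]]
```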

@jbrockmendel added the Needs Tests label on Jul 27, 2023
mliu08 added a commit to mliu08/pandas that referenced this issue Oct 27, 2023
mroeschke pushed a commit that referenced this issue Oct 30, 2023
* add test for mixed dtypes of df col index #47382

* modified test to use result and expected dfs