Fatal error with astype if duplicate columns are supplied for categorical #24704

Closed
thomasfrederikhoeck opened this issue Jan 10, 2019 · 7 comments
Labels: Astype; Bug; ExtensionArray (Extending pandas with custom dtypes or arrays.); Indexing (Related to indexing on series/frames, not to indexes themselves); Segfault (Non-Recoverable Error)

Comments

@thomasfrederikhoeck

Code Sample

The following code, which converts the object columns to category, runs as expected:

import pandas as pd

df = pd.DataFrame({'a': ['1',1,3], 'b' : [1,2,3]})

print(df.dtypes)

categoricals = list(df.select_dtypes(include='object').columns.values)
df[categoricals] = df[categoricals].astype('category')

print(df.dtypes)

which returns

a    object
b     int64
dtype: object

a    category
b       int64
dtype: object

If a column is mistakenly listed twice (here 'a' is added again):

import pandas as pd

df = pd.DataFrame({'a': ['1',1,3], 'b' : [1,2,3]})

print(df.dtypes)

categoricals = list(df.select_dtypes(include='object').columns.values)
categoricals = categoricals + ['a']

df[categoricals] = df[categoricals].astype('category')

print(df.dtypes)

Python crashes with

a    object
b     int64
dtype: object
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00007f806cac8700 (most recent call first):
  File "<frozen importlib._bootstrap>", line 172 in _get_module_lock
  File "<frozen importlib._bootstrap>", line 148 in __enter__
  File "<frozen importlib._bootstrap>", line 960 in _find_and_load
  File "<frozen importlib._bootstrap>", line 205 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 936 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 961 in _find_and_load
  File "<frozen importlib._bootstrap>", line 205 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 936 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 961 in _find_and_load
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 4960 in _ensure_index
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 3363 in get_indexer_non_unique
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 3386 in get_indexer_for
  File "/home/runner/.site-packages/pandas/core/internals.py", line 4132 in get
  File "/home/runner/.site-packages/pandas/core/frame.py", line 2698 in _getitem_column
  File "/home/runner/.site-packages/pandas/core/frame.py", line 2671 in __getitem__
  File "/home/runner/.site-packages/pandas/core/generic.py", line 4996 in <genexpr>
  File "/home/runner/.site-packages/pandas/core/reshape/concat.py", line 256 in __init__
  File "/home/runner/.site-packages/pandas/core/reshape/concat.py", line 225 in concat
  File "/home/runner/.site-packages/pandas/core/generic.py", line 5005 in astype
  File "/home/runner/.site-packages/pandas/util/_decorators.py", line 178 in wrapper
  [The five frames above, from <genexpr> through wrapper, repeat 16 more times.]

Problem description

One would expect pandas to raise an error saying that there are duplicate columns, or to drop the duplicates, instead of crashing.

I'm using Python 3.6.1 and pandas-0.23.4.
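
Until pandas reports this itself, a caller-side guard could produce the kind of error described here. The sketch below is only an illustration: `check_unique_columns` is a hypothetical helper, not a pandas function, and it inspects the plain list of column names before that list is used for indexing. The message text is the one proposed in this report.

```python
from collections import Counter


def check_unique_columns(columns):
    """Raise ValueError if any column name appears more than once.

    Hypothetical helper for illustration; not part of pandas.
    """
    duplicates = [name for name, count in Counter(columns).items() if count > 1]
    if duplicates:
        raise ValueError(
            "The list of columns you have supplied has duplicates: "
            f"{duplicates}"
        )
    return list(columns)


# The list built in the failing example: 'a' is selected, then added again.
categoricals = ['a'] + ['a']

try:
    check_unique_columns(categoricals)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Calling the helper just before `df[categoricals] = ...` would turn the crash into an ordinary, catchable exception.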

Expected Output

"The list of columns you have supplied has duplicates"

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-1011-gcp
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 9.0.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@thomasfrederikhoeck
Author

This also seems to happen when converting from int64 (so it is not specific to object):

import pandas as pd

df = pd.DataFrame({'a': ['1',1,3], 'b' : [1,2,3]})

print(df.dtypes)

categoricals = list(df.select_dtypes(include='int64').columns.values)
categoricals = categoricals + ['b']

df[categoricals] = df[categoricals].astype('category')

print(df.dtypes)

Output:

a    object
b     int64
dtype: object
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00007fa47ad78700 (most recent call first):
  File "<frozen importlib._bootstrap>", line 172 in _get_module_lock
  File "<frozen importlib._bootstrap>", line 148 in __enter__
  File "<frozen importlib._bootstrap>", line 960 in _find_and_load
  File "<frozen importlib._bootstrap>", line 205 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 936 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 961 in _find_and_load
  File "<frozen importlib._bootstrap>", line 205 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 936 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 961 in _find_and_load
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 4960 in _ensure_index
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 3363 in get_indexer_non_unique
  File "/home/runner/.site-packages/pandas/core/indexes/base.py", line 3386 in get_indexer_for
  File "/home/runner/.site-packages/pandas/core/internals.py", line 4132 in get
  File "/home/runner/.site-packages/pandas/core/frame.py", line 2698 in _getitem_column
  File "/home/runner/.site-packages/pandas/core/frame.py", line 2671 in __getitem__
  File "/home/runner/.site-packages/pandas/core/generic.py", line 4996 in <genexpr>
  File "/home/runner/.site-packages/pandas/core/reshape/concat.py", line 256 in __init__
  File "/home/runner/.site-packages/pandas/core/reshape/concat.py", line 225 in concat
  File "/home/runner/.site-packages/pandas/core/generic.py", line 5005 in astype
  File "/home/runner/.site-packages/pandas/util/_decorators.py", line 178 in wrapper
  [The five frames above, from <genexpr> through wrapper, repeat 16 more times.]

@thomasfrederikhoeck
Author

But not when the target dtype passed to astype is 'int64':

import pandas as pd

df = pd.DataFrame({'a': ['1',1,3], 'b' : [1,2,3]})

print(df.dtypes)

categoricals = list(df.select_dtypes(include='object').columns.values)
categoricals = categoricals + ['a']

df[categoricals] = df[categoricals].astype('int64')

print(df.dtypes)

Output:

a    object
b     int64
dtype: object
a    int64
b    int64
dtype: object

So it is probably related to category.

@jreback
Contributor

jreback commented Jan 10, 2019

You are trying to set with a duplicate:

In [6]: categoricals
Out[6]: ['b', 'b']
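
One way to avoid the duplicate on the caller's side is to drop repeated names before indexing. This is a minimal sketch using only the standard library; the variable names are illustrative, and it relies on `dict` preserving insertion order (guaranteed from Python 3.7).

```python
# Column list from the int64 example: 'b' was selected and then added again.
categoricals = ['b'] + ['b']

# dict keys are unique and keep insertion order, so this removes
# repeats while keeping the first occurrence of each name.
unique_categoricals = list(dict.fromkeys(categoricals))

print(unique_categoricals)
```

The de-duplicated list can then be used exactly as in the first, working snippet of this report.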

@jschendel
Member

It looks like there are actually two issues here:

1. DataFrame.astype(ExtensionDtype) fails with duplicate columns

In [2]: df = pd.DataFrame([[1, 2], [1, 1], [3, 2]], columns=['a', 'a'])

In [3]: df
Out[3]:
   a  a
0  1  2
1  1  1
2  3  2

In [4]: df.astype('category')
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded

In [5]: df.astype('Int64')
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded

This works for other dtypes when duplicate columns are present, and the fix looks easy, so we could probably support it.
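
The repeated `<genexpr>`, `concat`, `astype` frames in the traceback suggest a mechanism: the column-wise conversion selects each column by label, and with duplicate labels that selection yields a frame again, so `astype` calls itself without making progress. The sketch below is a toy model in plain Python, not pandas code. `ToyFrame`, `astype_by_label` and `astype_by_position` are hypothetical names, and addressing columns by position is only one plausible shape such a fix could take.

```python
class ToyFrame:
    """Tiny stand-in for a DataFrame: a list of (label, values) pairs."""

    def __init__(self, columns):
        self.columns = list(columns)

    def select(self, label):
        """Label lookup: one match gives the values, several give a frame."""
        matches = [(name, vals) for name, vals in self.columns if name == label]
        if len(matches) == 1:
            return matches[0][1]
        return ToyFrame(matches)

    def astype_by_label(self, convert):
        """Convert column by column, looking each column up by label."""
        result = []
        for name, _ in self.columns:
            selected = self.select(name)
            if isinstance(selected, ToyFrame):
                # Duplicate label: the selection holds the same columns
                # again, so this call recurses with no progress.
                selected = selected.astype_by_label(convert)
                result.extend(selected.columns)
            else:
                result.append((name, [convert(v) for v in selected]))
        return ToyFrame(result)

    def astype_by_position(self, convert):
        """Convert column by column, addressing each column by position."""
        return ToyFrame(
            [(name, [convert(v) for v in vals]) for name, vals in self.columns]
        )


# Same shape as the frame above: two columns that are both labelled 'a'.
frame = ToyFrame([('a', [1, 1, 3]), ('a', [2, 1, 2])])

try:
    frame.astype_by_label(str)
    label_outcome = 'converted'
except RecursionError:
    label_outcome = 'RecursionError'

converted = frame.astype_by_position(str)
print(label_outcome)
print(converted.columns)
```

In the toy model, the label-based version only terminates when every label is unique, while the positional version never needs to resolve a label at all.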

2. Setting to a DataFrame[ExtensionDtype] with duplicate columns results in object dtype

Even if item 1 was fixed, the setting process would result in an object dtype instead of a categorical/extension dtype:

In [6]: df = pd.DataFrame({'a': [1, 1, 2], 'b' :['foo', 'bar', 'baz']})

In [7]: df
Out[7]:
   a    b
0  1  foo
1  1  bar
2  2  baz

In [8]: df.dtypes
Out[8]:
a     int64
b    object
dtype: object

In [9]: df_aa = pd.concat([pd.Series([10, 20, 30], name='a', dtype='category'),
    ...:                   pd.Series([11, 22, 33], name='a', dtype='category')], axis=1)

In [10]: df_aa
Out[10]:
    a   a
0  10  11
1  20  22
2  30  33

In [11]: df_aa.dtypes
Out[11]:
a    category
a    category
dtype: object

In [12]: df['a'] = df_aa

In [13]: df
Out[13]:
    a    b
0  10  foo
1  20  bar
2  30  baz

In [14]: df.dtypes
Out[14]:
a    object
b    object
dtype: object

In [15]: df[['a', 'a']] = df_aa

In [16]: df
Out[16]:
    a    b
0  10  foo
1  20  bar
2  30  baz

In [17]: df.dtypes
Out[17]:
a    object
b    object
dtype: object

I'm not sure that this should be supported. The operation doesn't really make sense to me, and I'm a little bit surprised that it didn't raise.

@jreback : what are your thoughts on items 1 and 2?

@jreback
Contributor

jreback commented Jan 10, 2019

Yeah, 1) is OK; 2) is somewhat tricky and probably OK to not support right away.

jschendel added the Bug and ExtensionArray (Extending pandas with custom dtypes or arrays.) labels Jan 10, 2019
mroeschke added the Indexing (Related to indexing on series/frames, not to indexes themselves) label Jun 25, 2021
jbrockmendel added the Segfault (Non-Recoverable Error) label Jun 29, 2021

@roberthdevries
Contributor

I am not able to reproduce this issue with the faulty example posted above.
It now raises a ValueError instead of crashing:

a    object
b     int64
dtype: object
Traceback (most recent call last):
  File "bug-24704.py", line 10, in <module>
    df[categoricals] = df[categoricals].astype('category')
  File "/home/robert/projects/pandas/pandas/core/frame.py", line 3875, in __setitem__
    self._setitem_array(key, value)
  File "/home/robert/projects/pandas/pandas/core/frame.py", line 3919, in _setitem_array
    self[k1] = value[k2]
  File "/home/robert/projects/pandas/pandas/core/frame.py", line 3877, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/robert/projects/pandas/pandas/core/frame.py", line 4007, in _set_item_frame_value
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

@mroeschke
Member

Since this looks fixed on main, closing

6 participants