BUG: reindexing empty CategoricalIndex fails if target contains duplicates #38906

Closed
2 of 3 tasks
batterseapower opened this issue Jan 2, 2021 · 1 comment · Fixed by #39046
Labels
Bug Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves
Milestone
1.3

Comments

@batterseapower
Contributor

batterseapower commented Jan 2, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas. (Tested version 1.2.0)
  • (optional) I have confirmed this bug exists on the master branch of pandas.

This fails:

pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'A']))

But these succeed:

pd.DataFrame(columns=pd.Index([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'A']))
pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'B']))
pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex([]))
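
For anyone who wants to check their own pandas version, the four cases above can be run side by side with something like the sketch below. The helper names (`try_reindex`, `run_cases`) and the case labels are made up for illustration and are not part of pandas. The script only reports what happens and makes no assumption about which cases pass on a given version.

```python
import pandas as pd


def try_reindex(source_columns, target_columns):
    """Reindex a one-row frame's columns and report "ok" or the exception raised."""
    frame = pd.DataFrame(columns=source_columns, index=["K"])
    try:
        frame.reindex(columns=target_columns)
    except Exception as exc:  # broad on purpose: we only want to report the outcome
        return f"{type(exc).__name__}: {exc}"
    return "ok"


def run_cases():
    """Run the four reindex calls from this report; labels are illustrative only."""
    cases = {
        "empty CategoricalIndex -> duplicated categorical target": (
            pd.CategoricalIndex([]),
            pd.CategoricalIndex(["A", "A"]),
        ),
        "empty Index -> duplicated categorical target": (
            pd.Index([]),
            pd.CategoricalIndex(["A", "A"]),
        ),
        "empty CategoricalIndex -> unique categorical target": (
            pd.CategoricalIndex([]),
            pd.CategoricalIndex(["A", "B"]),
        ),
        "empty CategoricalIndex -> empty categorical target": (
            pd.CategoricalIndex([]),
            pd.CategoricalIndex([]),
        ),
    }
    return {label: try_reindex(src, tgt) for label, (src, tgt) in cases.items()}


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for label, outcome in run_cases().items():
        print(f"{label}: {outcome}")
```

On 1.2.0 only the first case should report an error, per the report above.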

The error is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\util\_decorators.py", line 312, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4173, in reindex
    return super().reindex(**kwargs)
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\generic.py", line 4806, in reindex
    return self._reindex_axes(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4013, in _reindex_axes
    frame = frame._reindex_columns(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4055, in _reindex_columns
    new_columns, indexer = self.columns.reindex(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\category.py", line 448, in reindex
    new_target, indexer, _ = result._reindex_non_unique(np.array(target))
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\base.py", line 3589, in _reindex_non_unique
    new_indexer = np.arange(len(self.take(indexer)))
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\base.py", line 751, in take
    taken = algos.take(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\algorithms.py", line 1657, in take
    result = arr.take(indices, axis=axis)
IndexError: cannot do a non-empty take from an empty axes.
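
The last frames of the traceback end in a `take` with a non-empty indexer against an empty array. As a rough illustration of that class of failure (not a claim about which object pandas was taking from internally), plain NumPy behaves the same way. The helper name `take_from_empty` is hypothetical.

```python
import numpy as np


def take_from_empty(indices):
    """Take `indices` from an empty array; return the result or the IndexError raised."""
    empty = np.array([], dtype=object)
    try:
        return empty.take(indices)
    except IndexError as exc:
        return exc


# Two positions requested from an array with no elements: NumPy raises IndexError.
non_empty_take = take_from_empty(np.array([0, 0]))

# No positions requested: the take succeeds and yields an empty array.
empty_take = take_from_empty(np.array([], dtype=np.intp))

if __name__ == "__main__":
    print(repr(non_empty_take))
    print(repr(empty_take))
```

This fits the observation that reindexing to an empty target succeeds while the duplicated target, which goes through `_reindex_non_unique`, does not.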

Problem description

It is unexpected that CategoricalIndex behaves differently than Index in this regard. A problem similar to this was already reported and solved in #16770, but it looks like there is a remaining bug in the edge case where the target index contains duplicates.

Expected Output

The failing code should return a dataframe with two columns and one row.
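
A minimal sketch of that expectation as a check, assuming only what is stated above (one row, two columns). The function names are hypothetical. On a version that still has the bug, the check returns `None` instead of raising.

```python
import pandas as pd


def reindex_empty_categorical_columns():
    """The failing expression from this report, unchanged."""
    frame = pd.DataFrame(columns=pd.CategoricalIndex([]), index=["K"])
    return frame.reindex(columns=pd.CategoricalIndex(["A", "A"]))


def check_expected_shape():
    """Return the result's shape, or None if the IndexError from this report is raised."""
    try:
        result = reindex_empty_categorical_columns()
    except IndexError:
        return None  # bug still present on this pandas version
    return result.shape


if __name__ == "__main__":
    shape = check_expected_shape()
    if shape is None:
        print("bug reproduced: IndexError raised")
    else:
        # Expected per the report: one row ("K") and two columns ("A", "A").
        print("shape:", shape, "matches expectation:", shape == (1, 2))
```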

Output of pd.show_versions()

pd.show_versions() fails because I don't have numba installed in this environment, and it's currently impossible to install it alongside pandas 1.2.0, so this is the output of "conda list" instead:

blas 1.0 mkl
bottleneck 1.3.2 py39h7cc1a96_1
ca-certificates 2020.12.8 haa95532_0
certifi 2020.12.5 py39haa95532_0
et_xmlfile 1.0.1 py_1001
icc_rt 2019.0.0 h0cc432a_1
intel-openmp 2020.3 h57928b3_311 conda-forge
jdcal 1.4.1 py_0
libblas 3.9.0 5_mkl conda-forge
libcblas 3.9.0 5_mkl conda-forge
liblapack 3.9.0 5_mkl conda-forge
mkl 2020.4 hb70f87d_311 conda-forge
mkl-service 2.3.0 py39h196d8e1_0
numpy 1.19.4 py39h6635163_1 conda-forge
openpyxl 3.0.5 py_0
openssl 1.1.1i h2bbff1b_0
pandas 1.2.0 py39h2e25243_0 conda-forge
pip 20.3.3 pyhd8ed1ab_0 conda-forge
python 3.9.1 h7840368_2_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.9 1_cp39 conda-forge
pytz 2020.5 pyhd8ed1ab_0 conda-forge
scipy 1.5.2 py39h14eb087_0
setuptools 49.6.0 py39h467e6f4_2 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.34.0 h8ffe710_0 conda-forge
tzdata 2020f he74cb21_0 conda-forge
vc 14.2 hb210afc_2 conda-forge
vs2015_runtime 14.28.29325 h5e1d092_0 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wincertstore 0.2 py39hde42818_1005 conda-forge
xlrd 2.0.1 pyhd3eb1b0_0

@batterseapower batterseapower added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 2, 2021
@MarcoGorelli MarcoGorelli added Categorical Categorical Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 3, 2021
@MarcoGorelli
Member

Thanks @batterseapower, can confirm this reproduces.

batterseapower added a commit to batterseapower/pandas that referenced this issue Jan 8, 2021
batterseapower added a commit to batterseapower/pandas that referenced this issue Jan 9, 2021
@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 9, 2021
@jreback jreback added this to the 1.3 milestone Jan 9, 2021