
Commit 1eb443e

BUG: fix bug where appending unordered CategoricalIndex variables overrides index (#24845)
1 parent a62897a commit 1eb443e


3 files changed: +64, -2 lines changed

doc/source/whatsnew/v1.5.0.rst  +41
@@ -390,6 +390,47 @@ upon serialization. (Related issue :issue:`12997`)
     # Roundtripping now works
     pd.read_json(a.to_json(date_format='iso'), typ="series").index == a.index
 
+.. _whatsnew_150.notable_bug_fixes.concat_unordered_categoricalindex_overrides_indices:
+
+Concatenating two ``CategoricalIndex`` variables with ``ordered=False`` yields correct result
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Calling :func:`concat` on two (or more) objects indexed by a ``CategoricalIndex`` with ``ordered=False``
+would produce incorrect index values when the indexes had the same categories listed in a different order
+(:issue:`24845`). As a result, if you concatenated two ``DataFrame`` objects indexed this way, the index labels
+of the second argument to :func:`concat` would be lost, overwritten with labels from the first index's categories.
+
+.. code-block:: ipython
+
+    In [1]: category1 = pd.CategoricalIndex(["a", "a"], categories=["a", "b"], ordered=False)
+       ...: category2 = pd.CategoricalIndex(["b", "b"], categories=["b", "a"], ordered=False)
+       ...: df1 = pd.DataFrame({"A": [1, 2]}, index=category1)
+       ...: df2 = pd.DataFrame({"A": [3, 4]}, index=category2)
+
+*Old Behavior*
+
+.. code-block:: ipython
+
+    In [2]: pd.concat((df1, df2))
+    Out[2]:
+       A
+    a  1
+    a  2
+    a  3
+    a  4
+
+*New Behavior*
+
+.. code-block:: ipython
+
+    In [3]: pd.concat((df1, df2))
+    Out[3]:
+       A
+    a  1
+    a  2
+    b  3
+    b  4
+
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.api_breaking:
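
The root cause can be reproduced with public pandas APIs. The sketch below is an illustration written for this note, not code from the commit: it concatenates the raw integer codes of both indexes (what the line removed from ``CategoricalIndex._concat`` in this commit did) and contrasts that with first recoding the second index onto the first index's category order. It assumes pandas and NumPy are installed; ``reorder_categories`` and ``Categorical.from_codes`` are public pandas methods, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

c1 = pd.CategoricalIndex(["a", "a"], categories=["a", "b"], ordered=False)
c2 = pd.CategoricalIndex(["b", "b"], categories=["b", "a"], ordered=False)

# Unordered dtypes with the same set of categories compare equal,
# even though the category order (and therefore the codes) differs.
assert c1.dtype == c2.dtype

# Buggy approach (mirrors the removed line): concatenate the raw codes
# and read all of them against the first index's categories.
raw_codes = np.concatenate([c1.codes, c2.codes])  # c2's "b" is code 0
buggy = pd.Categorical.from_codes(raw_codes, dtype=c1.dtype)

# Correct approach: recode the second index onto the first index's
# category order before concatenating the codes.
recoded = c2.reorder_categories(c1.categories)  # "b" becomes code 1
fixed_codes = np.concatenate([c1.codes, recoded.codes])
fixed = pd.Categorical.from_codes(fixed_codes, dtype=c1.dtype)

print(list(buggy))  # ['a', 'a', 'a', 'a']
print(list(fixed))  # ['a', 'a', 'b', 'b']
```

Because the two dtypes compare equal, a dtype check alone cannot catch the mismatch; only recoding against a single category order makes the codes comparable.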

pandas/core/indexes/category.py  +5 -2
@@ -571,14 +571,17 @@ def map(self, mapper):
 
     def _concat(self, to_concat: list[Index], name: Hashable) -> Index:
         # if calling index is category, don't check dtype of others
+
         try:
-            codes = np.concatenate([self._is_dtype_compat(c).codes for c in to_concat])
+            cat = Categorical._concat_same_type(
+                [self._is_dtype_compat(c) for c in to_concat]
+            )
         except TypeError:
             # not all to_concat elements are among our categories (or NA)
             from pandas.core.dtypes.concat import concat_compat
 
             res = concat_compat([x._values for x in to_concat])
+
             return Index(res, name=name)
         else:
-            cat = self._data._from_backing_data(codes)
             return type(self)._simple_new(cat, name=name)
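
The fix replaces the raw-code concatenation with the private ``Categorical._concat_same_type``, which recodes each input onto a shared category order before joining them. Outside pandas internals, a comparable result can be obtained with the public ``pandas.api.types.union_categoricals``. The sketch below uses that function as a stand-in; it is an assumption-labeled illustration, not the code path the commit touches.

```python
import pandas as pd
from pandas.api.types import union_categoricals

c1 = pd.CategoricalIndex(["a", "a"], categories=["a", "b"], ordered=False)
c2 = pd.CategoricalIndex(["b", "b"], categories=["b", "a"], ordered=False)

# union_categoricals recodes every input onto a common category order
# instead of reusing each input's raw integer codes.
combined = union_categoricals([c1, c2])

# The commit wraps its result with type(self)._simple_new(cat, name=name);
# the public constructor is used here instead.
result = pd.CategoricalIndex(combined)

print(list(result))             # ['a', 'a', 'b', 'b']
print(list(result.categories))  # ['a', 'b']
```

Recoding costs an extra pass over the codes, but it is the only way to keep labels intact when inputs share categories in different orders.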

pandas/tests/reshape/concat/test_categorical.py  +18
@@ -238,3 +238,21 @@ def test_categorical_missing_from_one_frame(self):
             index=[0, 1, 2, 0, 1, 2],
         )
         tm.assert_frame_equal(result, expected)
+
+    def test_concat_categorical_same_categories_different_order(self):
+        # https://github.com/pandas-dev/pandas/issues/24845
+
+        c1 = pd.CategoricalIndex(["a", "a"], categories=["a", "b"], ordered=False)
+        c2 = pd.CategoricalIndex(["b", "b"], categories=["b", "a"], ordered=False)
+        c3 = pd.CategoricalIndex(
+            ["a", "a", "b", "b"], categories=["a", "b"], ordered=False
+        )
+
+        df1 = DataFrame({"A": [1, 2]}, index=c1)
+        df2 = DataFrame({"A": [3, 4]}, index=c2)
+
+        result = pd.concat((df1, df2))
+
+        expected = DataFrame({"A": [1, 2, 3, 4]}, index=c3)
+
+        tm.assert_frame_equal(result, expected)
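
To check the fix against an installed pandas without the test suite's fixtures, the regression test can be rewritten as a standalone script. This sketch swaps the suite's internal ``tm`` helper for the public ``pandas.testing.assert_frame_equal``, and the function name is hypothetical. It assumes a pandas release that includes this fix (the whatsnew entry targets 1.5.0); on older releases it raises ``AssertionError``.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def check_concat_unordered_categorical_index() -> None:
    """Standalone version of the regression test for GH 24845 (hypothetical name)."""
    c1 = pd.CategoricalIndex(["a", "a"], categories=["a", "b"], ordered=False)
    c2 = pd.CategoricalIndex(["b", "b"], categories=["b", "a"], ordered=False)
    # Expected index: each row keeps its own label, and the categories
    # follow the first index's order.
    c3 = pd.CategoricalIndex(
        ["a", "a", "b", "b"], categories=["a", "b"], ordered=False
    )

    df1 = pd.DataFrame({"A": [1, 2]}, index=c1)
    df2 = pd.DataFrame({"A": [3, 4]}, index=c2)

    result = pd.concat((df1, df2))
    expected = pd.DataFrame({"A": [1, 2, 3, 4]}, index=c3)

    # Raises AssertionError on pandas versions that still have the bug.
    assert_frame_equal(result, expected)


check_concat_unordered_categorical_index()
print("concat preserved the categorical index labels")
```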
