Commit 75a799c

Backport PR #47206 on branch 1.4.x (REGR: concat not sorting columns for mixed column names) (#47251)
Backport PR #47206: REGR: concat not sorting columns for mixed column names

Co-authored-by: Patrick Hoefler <[email protected]>
1 parent fb27ba9 commit 75a799c

3 files changed: +34 -1 lines changed

doc/source/whatsnew/v1.4.3.rst (+1)

@@ -17,6 +17,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.replace` when the replacement value was explicitly ``None`` when passed in a dictionary to ``to_replace`` also casting other columns to object dtype even when there were no values to replace (:issue:`46634`)
 - Fixed regression in :meth:`DataFrame.nsmallest` led to wrong results when ``np.nan`` in the sorting column (:issue:`46589`)
 - Fixed regression in :func:`read_fwf` raising ``ValueError`` when ``widths`` was specified with ``usecols`` (:issue:`46580`)
+- Fixed regression in :func:`concat` not sorting columns for mixed column names (:issue:`47127`)
 - Fixed regression in :meth:`.Groupby.transform` and :meth:`.Groupby.agg` failing with ``engine="numba"`` when the index was a :class:`MultiIndex` (:issue:`46867`)
 - Fixed regression is :meth:`.Styler.to_latex` and :meth:`.Styler.to_html` where ``buf`` failed in combination with ``encoding`` (:issue:`47053`)
 - Fixed regression in :func:`read_csv` with ``index_col=False`` identifying first row as index names when ``header=None`` (:issue:`46955`)

pandas/core/indexes/api.py (+10 -1)

@@ -1,6 +1,9 @@
 from __future__ import annotations
 
 import textwrap
+from typing import cast
+
+import numpy as np
 
 from pandas._libs import (
     NaT,
@@ -10,6 +13,7 @@
 
 from pandas.core.dtypes.common import is_dtype_equal
 
+from pandas.core.algorithms import safe_sort
 from pandas.core.indexes.base import (
     Index,
     _new_Index,
@@ -154,7 +158,12 @@ def _get_combined_index(
 
     if sort:
         try:
-            index = index.sort_values()
+            array_sorted = safe_sort(index)
+            array_sorted = cast(np.ndarray, array_sorted)
+            if isinstance(index, MultiIndex):
+                index = MultiIndex.from_tuples(array_sorted, names=index.names)
+            else:
+                index = Index(array_sorted, name=index.name)
         except TypeError:
             pass
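
The hunk above replaces `index.sort_values()` with `safe_sort(index)`. A plausible reading is that sorting labels of mixed types (such as `0` next to `"A"`) raised `TypeError`, which the surrounding `except TypeError: pass` swallowed, so the columns came back unsorted. The sketch below is plain Python and is not pandas code: `sort_mixed_labels` is a hypothetical helper that assumes an ordering rule of "non-strings first, then strings", which matches the column order `[0, "A", "B"]` expected by the tests in this commit. It does not handle `None` or `NaN` labels.

```python
def sort_mixed_labels(labels):
    """Hypothetical helper (not pandas API): non-strings first, then strings.

    Each group is sorted on its own, so int and str are never compared.
    """
    strings = sorted(x for x in labels if isinstance(x, str))
    others = sorted(x for x in labels if not isinstance(x, str))
    return others + strings


labels = ["A", "B", 0]

# A plain sort cannot compare int with str and raises TypeError,
# which is the failure mode the old code silently ignored.
try:
    plain = sorted(labels)
except TypeError:
    plain = None

mixed = sort_mixed_labels(labels)
print(plain, mixed)  # None [0, 'A', 'B']
```

Splitting by type before sorting avoids the cross-type comparison entirely, which is why the fallback can still produce a deterministic order where a direct sort cannot.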

pandas/tests/reshape/concat/test_dataframe.py (+23)

@@ -205,3 +205,26 @@ def test_concat_copies(self, axis, order, ignore_index):
         for arr in res._iter_column_arrays():
             for arr2 in df._iter_column_arrays():
                 assert not np.shares_memory(arr, arr2)
+
+    def test_outer_sort_columns(self):
+        # GH#47127
+        df1 = DataFrame({"A": [0], "B": [1], 0: 1})
+        df2 = DataFrame({"A": [100]})
+        result = concat([df1, df2], ignore_index=True, join="outer", sort=True)
+        expected = DataFrame({0: [1.0, np.nan], "A": [0, 100], "B": [1.0, np.nan]})
+        tm.assert_frame_equal(result, expected)
+
+    def test_inner_sort_columns(self):
+        # GH#47127
+        df1 = DataFrame({"A": [0], "B": [1], 0: 1})
+        df2 = DataFrame({"A": [100], 0: 2})
+        result = concat([df1, df2], ignore_index=True, join="inner", sort=True)
+        expected = DataFrame({0: [1, 2], "A": [0, 100]})
+        tm.assert_frame_equal(result, expected)
+
+    def test_sort_columns_one_df(self):
+        # GH#47127
+        df1 = DataFrame({"A": [100], 0: 2})
+        result = concat([df1], ignore_index=True, join="inner", sort=True)
+        expected = DataFrame({0: [2], "A": [100]})
+        tm.assert_frame_equal(result, expected)
