BUG: fix union_indexes not supporting sort=False for Index subclasses #35098

Merged
merged 44 commits into from
Jul 9, 2020
Changes from 43 commits
Commits
148a589
BUG: fix index detection in _sanitize_and_check
AlexKirko Jul 2, 2020
8a51ded
DOC: add comment to fix
AlexKirko Jul 2, 2020
371fd6e
TST: add tests
AlexKirko Jul 2, 2020
e5f1e68
DOC: add whatsnew entry
AlexKirko Jul 2, 2020
9701bb5
REFACT: refact the boolean expression
AlexKirko Jul 2, 2020
3208fbe
REFACT: use generator comprehension
AlexKirko Jul 2, 2020
a15cef4
TST: add cases to the test
AlexKirko Jul 3, 2020
933956c
preserve Index subclass, exclude MultiIndex
AlexKirko Jul 3, 2020
4ecd83e
include RangeIndex in exceptions
AlexKirko Jul 3, 2020
896d62a
exclude DatetimeIndex
AlexKirko Jul 3, 2020
950b279
exclude CategoricalIndex
AlexKirko Jul 3, 2020
4b2076d
CLN: sort imports in test_common
AlexKirko Jul 3, 2020
6f2f3c9
DOC: remove unnecessary backquoutes in whatsnew
AlexKirko Jul 3, 2020
0a6c62e
DOC: add missing colon to whatsnew
AlexKirko Jul 3, 2020
9ca48f7
pass sort to Index.union instead of previous approach
AlexKirko Jul 5, 2020
f41b7e7
explicitly ignore sort for heavily modified Index subclasses
AlexKirko Jul 5, 2020
2fe2c9d
fully incorporate sort ignoring for particular Index subtypes
AlexKirko Jul 5, 2020
cefd5c9
CLN: fix typo
AlexKirko Jul 5, 2020
605ac21
CLN: switch ind_types to list comprehension
AlexKirko Jul 5, 2020
18abbae
test without DatetimeIndex and CategoricalIndex in exceptions
AlexKirko Jul 5, 2020
81116d4
test without multi, range, datetime in exceptions
AlexKirko Jul 5, 2020
a7ec4bf
return MultiIndex, RangeIndex, CategoricalIndex to exceptions
AlexKirko Jul 5, 2020
e85529d
CLN: black indexes/api.py
AlexKirko Jul 5, 2020
683f615
Merge branch 'master' into append-err-sort
AlexKirko Jul 5, 2020
c76859e
restart tests
AlexKirko Jul 6, 2020
11773a5
Merge branch 'master' into append-err-sort
AlexKirko Jul 6, 2020
836fc1a
Merge branch 'master' into append-err-sort: pin isort and sphinx
AlexKirko Jul 6, 2020
b9d5ab4
add DatetimeIndex back into exceptions
AlexKirko Jul 6, 2020
ee7048e
DOC: move to Reshaping in whatsnew and edit
AlexKirko Jul 7, 2020
1f67a3c
pass sort through to all Index subclasses
AlexKirko Jul 7, 2020
c722154
TST: alter test_construct_with_two_categoricalindex_series
AlexKirko Jul 7, 2020
ab291a7
TST: alter test_str_cat_align_mixed_inputs
AlexKirko Jul 7, 2020
93a4ccc
TST: alter test_unbalanced in test_melt
AlexKirko Jul 7, 2020
93452c1
TST: alter nonnumeric and float suffix tests in test_melt
AlexKirko Jul 7, 2020
fb3a906
CLN: run black pandas
AlexKirko Jul 7, 2020
0479618
TST: add OP test, use fixture in the index test
AlexKirko Jul 7, 2020
0f96dbe
CLN: run black on test files
AlexKirko Jul 7, 2020
ada7346
REFACT: remove unnecessary change of sort from True to None
AlexKirko Jul 7, 2020
c3cbecc
Revert "REFACT: remove unnecessary change of sort from True to None"
AlexKirko Jul 7, 2020
7cc4d9d
restart tests
AlexKirko Jul 7, 2020
4c1cc42
DOC: add comment with ref to sort=None issue
AlexKirko Jul 8, 2020
cc0d167
DOC: clarify comment language
AlexKirko Jul 8, 2020
54140af
DOC: clarify comment more
AlexKirko Jul 8, 2020
1c371bd
restart tests
AlexKirko Jul 8, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1112,6 +1112,7 @@ Reshaping
 - Fixed bug in :func:`melt` where melting MultiIndex columns with ``col_level`` > 0 would raise a ``KeyError`` on ``id_vars`` (:issue:`34129`)
 - Bug in :meth:`Series.where` with an empty Series and empty ``cond`` having non-bool dtype (:issue:`34592`)
 - Fixed regression where :meth:`DataFrame.apply` would raise ``ValueError`` for elements whth ``S`` dtype (:issue:`34529`)
+- Bug in :meth:`DataFrame.append` leading to sorting columns even when ``sort=False`` is specified (:issue:`35092`)

 Sparse
 ^^^^^^
8 changes: 7 additions & 1 deletion pandas/core/indexes/api.py
@@ -214,7 +214,13 @@ def conv(i):
             return result.union_many(indexes[1:])
         else:
             for other in indexes[1:]:
-                result = result.union(other)
+                # GH 35092. Index.union expects sort=None instead of sort=True
+                # to signify that sort=True isn't fully implemented and
+                # legacy implementation sometimes might not sort (see GH 24959)
+                # In this case we currently sort in _get_combined_index
+                if sort:
+                    sort = None
+                result = result.union(other, sort=sort)
             return result
     elif kind == "array":
         index = indexes[0]
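For readers following the `api.py` change, here is a minimal pure-Python sketch of the loop's behaviour. It is a simplified model and not pandas code: `union_labels` and `union_all` are hypothetical helpers operating on plain lists, and the fallback for incomparable labels is an assumption about the "might not sort" case mentioned in the comment.

```python
from typing import Optional


def union_labels(left, right, sort: Optional[bool] = None) -> list:
    """Toy union: keep first-seen order, then sort unless sort is False."""
    combined = list(dict.fromkeys([*left, *right]))
    if sort is False:
        return combined
    try:
        return sorted(combined)
    except TypeError:
        # Mixed, incomparable labels: fall back to first-seen order.
        return combined


def union_all(indexes, sort: Optional[bool] = False) -> list:
    """Fold union_labels over a list of label lists, as the patched loop does."""
    # Mirror the normalisation in the diff: a truthy sort is passed on as None.
    if sort:
        sort = None
    result = list(indexes[0])
    for other in indexes[1:]:
        result = union_labels(result, other, sort=sort)
    return result


print(union_all([[0, 1], [4, 3]], sort=False))  # [0, 1, 4, 3]
print(union_all([[0, 1], [4, 3]], sort=True))   # [0, 1, 3, 4]
```

The point of the sketch is only that `sort=False` now reaches every pairwise union, so the combined labels keep their first-seen order.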
6 changes: 4 additions & 2 deletions pandas/tests/frame/test_constructors.py
@@ -2542,11 +2542,13 @@ def test_construct_with_two_categoricalindex_series(self):
             index=pd.CategoricalIndex(["f", "female", "m", "male", "unknown"]),
         )
         result = DataFrame([s1, s2])
+        # GH 35092. Extra s2 columns are now appended to s1 columns
+        # in original order
         expected = DataFrame(
             np.array(
-                [[np.nan, 39.0, np.nan, 6.0, 4.0], [2.0, 152.0, 2.0, 242.0, 150.0]]
+                [[39.0, 6.0, 4.0, np.nan, np.nan], [152.0, 242.0, 150.0, 2.0, 2.0]]
             ),
-            columns=["f", "female", "m", "male", "unknown"],
+            columns=["female", "male", "unknown", "f", "m"],
         )
         tm.assert_frame_equal(result, expected)

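The new expected frame in this test can be reproduced with a small pure-Python sketch. The label/value pairs below are read off the expected arrays in the diff. `combine_rows` is a hypothetical helper that models "append unseen columns in original order" with plain dicts. It is not how `DataFrame` builds its columns internally.

```python
import math

# Label -> value pairs consistent with the expected arrays in the test.
s1 = {"female": 39.0, "male": 6.0, "unknown": 4.0}
s2 = {"f": 2.0, "female": 152.0, "m": 2.0, "male": 242.0, "unknown": 150.0}


def combine_rows(rows):
    """Build first-seen column order, then align each row to it with NaN fill."""
    columns = list(dict.fromkeys(label for row in rows for label in row))
    table = [[row.get(col, math.nan) for col in columns] for row in rows]
    return columns, table


columns, table = combine_rows([s1, s2])
print(columns)   # ['female', 'male', 'unknown', 'f', 'm']
print(table[1])  # [152.0, 242.0, 150.0, 2.0, 2.0]
```

The first row comes out as `39.0, 6.0, 4.0` followed by two NaN entries, matching the updated `expected` array.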
18 changes: 17 additions & 1 deletion pandas/tests/indexes/test_common.py
@@ -13,8 +13,9 @@
 from pandas.core.dtypes.common import is_period_dtype, needs_i8_conversion

 import pandas as pd
-from pandas import CategoricalIndex, MultiIndex, RangeIndex
+from pandas import CategoricalIndex, Index, MultiIndex, RangeIndex
 import pandas._testing as tm
+from pandas.core.indexes.api import union_indexes


 class TestCommon:
@@ -395,3 +396,18 @@ def test_astype_preserves_name(self, index, dtype, copy):
             assert result.names == index.names
         else:
             assert result.name == index.name
+
+
+@pytest.mark.parametrize("arr", [[0, 1, 4, 3]])
+@pytest.mark.parametrize("dtype", ["int8", "int16", "int32", "int64"])
+def test_union_index_no_sort(arr, sort, dtype):
+    # GH 35092. Check that we don't sort with sort=False
+    ind1 = Index(arr[:2], dtype=dtype)
+    ind2 = Index(arr[2:], dtype=dtype)
+
+    # sort is None indicates that we sort the combined index
+    if sort is None:
+        arr.sort()
+    expected = Index(arr, dtype=dtype)
+    result = union_indexes([ind1, ind2], sort=sort)
+    tm.assert_index_equal(result, expected)
14 changes: 14 additions & 0 deletions pandas/tests/reshape/test_concat.py
@@ -2857,3 +2857,17 @@ def test_concat_frame_axis0_extension_dtypes():
     result = pd.concat([df2, df1], ignore_index=True)
     expected = pd.DataFrame({"a": [4, 5, 6, 1, 2, 3]}, dtype="Int64")
     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("sort", [True, False])
+def test_append_sort(sort):
+    # GH 35092. Check that DataFrame.append respects the sort argument.
+    df1 = pd.DataFrame(data={0: [1, 2], 1: [3, 4]})
+    df2 = pd.DataFrame(data={3: [1, 2], 2: [3, 4]})
+    cols = list(df1.columns) + list(df2.columns)
+    if sort:
+        cols.sort()
+
+    result = df1.append(df2, sort=sort).columns
+    expected = type(result)(cols)
+    tm.assert_index_equal(result, expected)
26 changes: 13 additions & 13 deletions pandas/tests/reshape/test_melt.py
@@ -691,11 +691,11 @@ def test_unbalanced(self):
         )
         df["id"] = df.index
         exp_data = {
-            "X": ["X1", "X1", "X2", "X2"],
-            "A": [1.0, 3.0, 2.0, 4.0],
-            "B": [5.0, np.nan, 6.0, np.nan],
-            "id": [0, 0, 1, 1],
-            "year": [2010, 2011, 2010, 2011],
+            "X": ["X1", "X2", "X1", "X2"],
+            "A": [1.0, 2.0, 3.0, 4.0],
+            "B": [5.0, 6.0, np.nan, np.nan],
+            "id": [0, 1, 0, 1],
+            "year": [2010, 2010, 2011, 2011],
         }
         expected = pd.DataFrame(exp_data)
         expected = expected.set_index(["id", "year"])[["X", "A", "B"]]
@@ -938,10 +938,10 @@ def test_nonnumeric_suffix(self):
         )
         expected = pd.DataFrame(
             {
-                "A": ["X1", "X1", "X2", "X2"],
-                "colname": ["placebo", "test", "placebo", "test"],
-                "result": [5.0, np.nan, 6.0, np.nan],
-                "treatment": [1.0, 3.0, 2.0, 4.0],
+                "A": ["X1", "X2", "X1", "X2"],
+                "colname": ["placebo", "placebo", "test", "test"],
+                "result": [5.0, 6.0, np.nan, np.nan],
+                "treatment": [1.0, 2.0, 3.0, 4.0],
             }
         )
         expected = expected.set_index(["A", "colname"])
@@ -985,10 +985,10 @@ def test_float_suffix(self):
         )
         expected = pd.DataFrame(
             {
-                "A": ["X1", "X1", "X1", "X1", "X2", "X2", "X2", "X2"],
-                "colname": [1, 1.1, 1.2, 2.1, 1, 1.1, 1.2, 2.1],
-                "result": [0.0, np.nan, 5.0, np.nan, 9.0, np.nan, 6.0, np.nan],
-                "treatment": [np.nan, 1.0, np.nan, 3.0, np.nan, 2.0, np.nan, 4.0],
+                "A": ["X1", "X2", "X1", "X2", "X1", "X2", "X1", "X2"],
+                "colname": [1.2, 1.2, 1.0, 1.0, 1.1, 1.1, 2.1, 2.1],
+                "result": [5.0, 6.0, 0.0, 9.0, np.nan, np.nan, np.nan, np.nan],
+                "treatment": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 3.0, 4.0],
             }
         )
         expected = expected.set_index(["A", "colname"])
9 changes: 8 additions & 1 deletion pandas/tests/test_strings.py
@@ -636,8 +636,15 @@ def test_str_cat_align_mixed_inputs(self, join):
         # mixed list of indexed/unindexed
         u = np.array(["A", "B", "C", "D"])
         expected_outer = Series(["aaA", "bbB", "c-C", "ddD", "-e-"])
+
         # joint index of rhs [t, u]; u will be forced have index of s
-        rhs_idx = t.index & s.index if join == "inner" else t.index | s.index
+        # GH 35092. If right join, maintain order of t.index
+        if join == "inner":
+            rhs_idx = t.index & s.index
+        elif join == "right":
+            rhs_idx = t.index.union(s.index, sort=False)
+        else:
+            rhs_idx = t.index | s.index

         expected = expected_outer.loc[s.index.join(rhs_idx, how=join)]
         result = s.str.cat([t, u], join=join, na_rep="-")
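The three-way branch in this test can be pictured with a small pure-Python model. Everything here is illustrative: `rhs_index` is a hypothetical helper on plain lists, the sample labels are assumptions rather than values taken from the diff, and the ordering used for the inner branch is a simplification of what the `&` operator on an `Index` returns.

```python
def rhs_index(t_index, s_index, join):
    """Toy model of the three branches for building the right-hand index."""
    if join == "inner":
        # Labels present in both, in the order they appear in t_index.
        s_labels = set(s_index)
        return [label for label in t_index if label in s_labels]
    if join == "right":
        # Order-preserving union: t_index first, then unseen labels of s_index.
        return list(dict.fromkeys([*t_index, *s_index]))
    # Any other join: sorted union of both label sets.
    return sorted(set(t_index) | set(s_index))


# Assumed sample labels, chosen only to make the ordering visible.
t_index = [3, 0, 4, 1]
s_index = [0, 1, 2, 3]

print(rhs_index(t_index, s_index, "inner"))  # [3, 0, 1]
print(rhs_index(t_index, s_index, "right"))  # [3, 0, 4, 1, 2]
print(rhs_index(t_index, s_index, "outer"))  # [0, 1, 2, 3, 4]
```

Only the right-join branch depends on the order of `t_index`, which is why the test needed a dedicated `sort=False` union for that case.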