BUG: fix union_indexes not supporting sort=False for Index subclasses #35098

Merged
merged 44 commits into from
Jul 9, 2020
Changes from 16 commits
148a589
BUG: fix index detection in _sanitize_and_check
AlexKirko Jul 2, 2020
8a51ded
DOC: add comment to fix
AlexKirko Jul 2, 2020
371fd6e
TST: add tests
AlexKirko Jul 2, 2020
e5f1e68
DOC: add whatsnew entry
AlexKirko Jul 2, 2020
9701bb5
REFACT: refact the boolean expression
AlexKirko Jul 2, 2020
3208fbe
REFACT: use generator comprehension
AlexKirko Jul 2, 2020
a15cef4
TST: add cases to the test
AlexKirko Jul 3, 2020
933956c
preserve Index subclass, exclude MultiIndex
AlexKirko Jul 3, 2020
4ecd83e
include RangeIndex in exceptions
AlexKirko Jul 3, 2020
896d62a
exclude DatetimeIndex
AlexKirko Jul 3, 2020
950b279
exclude CategoricalIndex
AlexKirko Jul 3, 2020
4b2076d
CLN: sort imports in test_common
AlexKirko Jul 3, 2020
6f2f3c9
DOC: remove unnecessary backquoutes in whatsnew
AlexKirko Jul 3, 2020
0a6c62e
DOC: add missing colon to whatsnew
AlexKirko Jul 3, 2020
9ca48f7
pass sort to Index.union instead of previous approach
AlexKirko Jul 5, 2020
f41b7e7
explicitly ignore sort for heavily modified Index subclasses
AlexKirko Jul 5, 2020
2fe2c9d
fully incorporate sort ignoring for particular Index subtypes
AlexKirko Jul 5, 2020
cefd5c9
CLN: fix typo
AlexKirko Jul 5, 2020
605ac21
CLN: switch ind_types to list comprehension
AlexKirko Jul 5, 2020
18abbae
test without DatetimeIndex and CategoricalIndex in exceptions
AlexKirko Jul 5, 2020
81116d4
test without multi, range, datetime in exceptions
AlexKirko Jul 5, 2020
a7ec4bf
return MultiIndex, RangeIndex, CategoricalIndex to exceptions
AlexKirko Jul 5, 2020
e85529d
CLN: black indexes/api.py
AlexKirko Jul 5, 2020
683f615
Merge branch 'master' into append-err-sort
AlexKirko Jul 5, 2020
c76859e
restart tests
AlexKirko Jul 6, 2020
11773a5
Merge branch 'master' into append-err-sort
AlexKirko Jul 6, 2020
836fc1a
Merge branch 'master' into append-err-sort: pin isort and sphinx
AlexKirko Jul 6, 2020
b9d5ab4
add DatetimeIndex back into exceptions
AlexKirko Jul 6, 2020
ee7048e
DOC: move to Reshaping in whatsnew and edit
AlexKirko Jul 7, 2020
1f67a3c
pass sort through to all Index subclasses
AlexKirko Jul 7, 2020
c722154
TST: alter test_construct_with_two_categoricalindex_series
AlexKirko Jul 7, 2020
ab291a7
TST: alter test_str_cat_align_mixed_inputs
AlexKirko Jul 7, 2020
93a4ccc
TST: alter test_unbalanced in test_melt
AlexKirko Jul 7, 2020
93452c1
TST: alter nonnumeric and float suffix tests in test_melt
AlexKirko Jul 7, 2020
fb3a906
CLN: run black pandas
AlexKirko Jul 7, 2020
0479618
TST: add OP test, use fixture in the index test
AlexKirko Jul 7, 2020
0f96dbe
CLN: run black on test files
AlexKirko Jul 7, 2020
ada7346
REFACT: remove unnecessary change of sort from True to None
AlexKirko Jul 7, 2020
c3cbecc
Revert "REFACT: remove unnecessary change of sort from True to None"
AlexKirko Jul 7, 2020
7cc4d9d
restart tests
AlexKirko Jul 7, 2020
4c1cc42
DOC: add comment with ref to sort=None issue
AlexKirko Jul 8, 2020
cc0d167
DOC: clarify comment language
AlexKirko Jul 8, 2020
54140af
DOC: clarify comment more
AlexKirko Jul 8, 2020
1c371bd
restart tests
AlexKirko Jul 8, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -974,6 +974,7 @@ Indexing
 - Bug in :meth:`DataFrame.loc` with dictionary of values changes columns with dtype of ``int`` to ``float`` (:issue:`34573`)
 - Bug in :meth:`Series.loc` when used with a :class:`MultiIndex` would raise an IndexingError when accessing a None value (:issue:`34318`)
 - Bug in :meth:`DataFrame.reset_index` and :meth:`Series.reset_index` would not preserve data types on an empty :class:`DataFrame` or :class:`Series` with a :class:`MultiIndex` (:issue:`19602`)
+- Bug in :func:`pandas.core.indexes.api.union_indexes` would lead to :meth:`DataFrame.append` sorting columns even when ``sort=False`` is specified (:issue:`35092`)
Contributor

You don't need the first part, since this is a private function. Move the entry to the Reshaping section of Bug Fixes.
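
The user-visible effect described by the whatsnew entry can be sketched as follows. This is an illustration only, assuming a pandas version that includes this fix. It uses `pd.concat` rather than `DataFrame.append` (which was later removed from pandas), on the assumption that both build the combined columns through the same index-union path. The frames and their integer column labels are made up for the example.

```python
import pandas as pd

# Two frames whose integer column labels are deliberately out of order.
df1 = pd.DataFrame([[1, 2]], columns=[0, 1])
df2 = pd.DataFrame([[3, 4]], columns=[4, 3])

# sort=False should keep the columns in order of appearance.
unsorted = pd.concat([df1, df2], sort=False)

# sort=True should sort the combined column labels.
sorted_ = pd.concat([df1, df2], sort=True)

print(unsorted.columns.tolist())
print(sorted_.columns.tolist())
```

Before the fix, the first call would have returned sorted columns for integer labels despite `sort=False`, which is the behavior reported in the linked issue.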


 Missing
 ^^^^^^^
17 changes: 16 additions & 1 deletion pandas/core/indexes/api.py
@@ -206,6 +206,17 @@ def conv(i):

         return Index(lib.fast_unique_multiple_list([conv(i) for i in inds], sort=sort))

+    # GH 35092. Detect if we have an Index type, for which the sort
+    # setting doesn't make sense
+    ind_types = list({type(index) for index in indexes})
+    if any(
+        ind_type in [MultiIndex, RangeIndex, DatetimeIndex, CategoricalIndex]
Member Author

AlexKirko, Jul 6, 2020

@gfyoung It turns out that DatetimeIndex breaks test_basic_series_frame_alignment in test_eval.py, where we use eval to add a Series and a DataFrame, but only on the macOS pipeline. So I'm keeping it in the exception list too.

With this, each of these index types has a known case where leaving it unsorted breaks sorting expectations somewhere.

+        for ind_type in ind_types
+    ):
+        ignore_sort = True
+    else:
+        ignore_sort = False
Member Author

If we let any of these index types go unsorted, we'll break some tests that assume the result of joining on them comes out sorted. Moreover, in my opinion, there isn't much point in leaving any of them unsorted except for pretty-printing purposes.
If you think we should pass sort through for every Index subclass, please let me know.

Contributor

Can you show which tests break?

Contributor

I would rather just pass sort through and adjust the tests.

Member Author

Done.
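
For readers following the thread, the intermediate approach discussed above (later dropped in favor of passing `sort` through) amounts to a type check over the input indexes. A minimal standalone sketch, where the names `SORT_IGNORING_TYPES` and `should_ignore_sort` are hypothetical and not part of pandas:

```python
import pandas as pd

# Index subclasses listed in the diff above; the tuple name is hypothetical.
SORT_IGNORING_TYPES = (
    pd.MultiIndex,
    pd.RangeIndex,
    pd.DatetimeIndex,
    pd.CategoricalIndex,
)


def should_ignore_sort(indexes):
    """Return True if any input is exactly one of the listed subclasses."""
    # Mirrors the diff: compare exact types rather than using isinstance.
    return any(type(index) in SORT_IGNORING_TYPES for index in indexes)


plain = [pd.Index(["b", "a"]), pd.Index(["c"])]
mixed = [pd.Index([0, 1]), pd.RangeIndex(3)]

print(should_ignore_sort(plain))
print(should_ignore_sort(mixed))
```

The drawback raised in review is visible here: the caller's `sort=False` is silently overridden whenever one of the listed types appears, which is why the final version passes `sort` through and adjusts the affected tests instead.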


     if kind == "special":
         result = indexes[0]

@@ -214,7 +225,11 @@ def conv(i):
             return result.union_many(indexes[1:])
         else:
             for other in indexes[1:]:
-                result = result.union(other)
+                # GH 35092. Pass sort to Index.union
+                # Index.union expects sort=None instead of sort=True
+                if sort:
+                    sort = None
+                result = result.union(other, sort=sort)
             return result
     elif kind == "array":
         index = indexes[0]
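
The translation of `sort=True` into `sort=None` in the hunk above follows from how the public `Index.union` method reads its `sort` argument: `None` means "sort the result if possible" and `False` means "keep order of appearance". A minimal sketch of the same loop outside pandas internals, where the helper name `union_all` is hypothetical:

```python
import pandas as pd


def union_all(indexes, sort=True):
    """Union Index objects left to right, honoring a boolean sort flag."""
    # Map the boolean flag onto the values Index.union understands:
    # a truthy flag becomes None (sort if possible), a falsy one stays False.
    union_sort = None if sort else False
    result = indexes[0]
    for other in indexes[1:]:
        result = result.union(other, sort=union_sort)
    return result


left = pd.Index([0, 1])
right = pd.Index([4, 3])

print(union_all([left, right], sort=False).tolist())
print(union_all([left, right], sort=True).tolist())
```

The sketch computes the mapped value once, outside the loop, instead of reassigning `sort` on each iteration as the diff does. The behavior is the same; this is only a readability choice.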
15 changes: 14 additions & 1 deletion pandas/tests/indexes/test_common.py
@@ -13,8 +13,9 @@
 from pandas.core.dtypes.common import is_period_dtype, needs_i8_conversion

 import pandas as pd
-from pandas import CategoricalIndex, MultiIndex, RangeIndex
+from pandas import CategoricalIndex, Index, MultiIndex, RangeIndex
 import pandas._testing as tm
+from pandas.core.indexes.api import union_indexes


 class TestCommon:
@@ -395,3 +396,15 @@ def test_astype_preserves_name(self, index, dtype, copy):
             assert result.names == index.names
         else:
             assert result.name == index.name
+
+
+@pytest.mark.parametrize("exp_arr, sort", [([0, 1, 4, 3], False), ([0, 1, 3, 4], True)])
Contributor

Can you use the index fixture?

Member Author

Done. Using the sort fixture. There are only two in conftest, and only this one is useful.

+@pytest.mark.parametrize("dtype", ["int8", "int16", "int32", "int64"])
+def test_union_index_no_sort(exp_arr, sort, dtype):
+    # GH 35092. Check that we don't sort with sort=False
+    ind1 = Index([0, 1], dtype=dtype)
+    ind2 = Index([4, 3], dtype=dtype)
+
+    expected = Index(exp_arr, dtype=dtype)
+    result = union_indexes([ind1, ind2], sort=sort)
+    tm.assert_index_equal(result, expected)
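
The same cases can be checked interactively without pytest. The sketch below is a rough equivalent of the parametrized test, not a replacement for it. It imports the private helper `union_indexes` from the path used in the diff, which is an internal location and may change between pandas versions. It compares values only, since the resulting dtype for the smaller integer types differs across pandas releases.

```python
import pandas as pd

# Private helper exercised by the test above; not part of the public API.
from pandas.core.indexes.api import union_indexes

# (expected values, sort flag) pairs taken from the parametrization above.
cases = [([0, 1, 4, 3], False), ([0, 1, 3, 4], True)]

results = {}
for expected_values, sort in cases:
    for dtype in ["int8", "int16", "int32", "int64"]:
        ind1 = pd.Index([0, 1], dtype=dtype)
        ind2 = pd.Index([4, 3], dtype=dtype)
        result = union_indexes([ind1, ind2], sort=sort)
        # Store plain Python lists so the values are easy to compare.
        results[(sort, dtype)] = result.tolist()

for key, values in results.items():
    print(key, values)
```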