Skip to content

BUG: support EAs in nargsort without warning #25595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions pandas/core/sorting.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@

from pandas.core.dtypes.cast import infer_dtype_from_array
from pandas.core.dtypes.common import (
ensure_int64, ensure_platform_int, is_categorical_dtype, is_list_like)
ensure_int64, ensure_platform_int, is_categorical_dtype, is_list_like,
is_extension_array_dtype)
from pandas.core.dtypes.generic import ABCIndexClass
from pandas.core.dtypes.missing import isna

import pandas.core.algorithms as algorithms
Expand Down Expand Up @@ -239,7 +241,9 @@ def nargsort(items, kind='quicksort', ascending=True, na_position='last'):
"""

# specially handle Categorical
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is slightly stale now.

if is_categorical_dtype(items):
if is_extension_array_dtype(items):
if isinstance(items, ABCIndexClass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so when we have ExtensionArrays-backed indexes in the future, this is liable to break? Or we'll need a ._values as part of the interface...

I don't suppose items = items._values_for_argsort works here? In theory, DatetimeArray could just use i8 values for sorting, bypassing the warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

._values is defined to be "the best array representation", so if the Index is backed by an EA, it will always return the EA? (also if we have an ExtensionIndex). I can maybe also use the public .array now?

I don't suppose items = items._values_for_argsort

No, because we want to dispatch to the argsort method of the EA, I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mis-remembered where the warning was coming from (I thought it was implemented on DatetimeArray).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why exactly is this change needed? should this not just defer to the index .argsort, which for DTA handles this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Index.argsort has a different signature (and that's not specific to DatetimeTZ, but the general Index.argsort, so I would rather not deal with that here. Can open an issue for that)

items = items._values
if na_position not in {'first', 'last'}:
raise ValueError('invalid na_position: {!r}'.format(na_position))

Expand Down
20 changes: 19 additions & 1 deletion pandas/tests/test_sorting.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@

from pandas.compat import PY2

from pandas import DataFrame, MultiIndex, Series, compat, concat, merge
from pandas import (
Categorical, DataFrame, MultiIndex, Series, compat,
concat, merge, to_datetime)
from pandas.core import common as com
from pandas.core.sorting import (
decons_group_index, get_group_index, is_int64_overflow_possible,
Expand Down Expand Up @@ -183,6 +185,22 @@ def test_nargsort(self):
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

@pytest.mark.parametrize('data', [
Categorical(['a', 'c', 'a', 'b']),
to_datetime([0, 2, 0, 1]).tz_localize('Europe/Brussels')])
def test_nargsort_extension_array(self, data):
result = nargsort(data)
expected = np.array([0, 2, 3, 1], dtype=np.intp)
tm.assert_numpy_array_equal(result, expected)

def test_nargsort_datetimearray_warning(self, recwarn):
# https://github.com/pandas-dev/pandas/issues/25439
# can be removed once the FutureWarning for np.array(DTA) is removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you are not using tm.assert_produces_warning, this looks non-standard

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I want to assert that a specific warning is not raised. I found it a bit clunky as well, so if you know a better way, all ears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with pytest.raises(None): is the cleanest, but I think our linter disallows that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you mean pytest.warns(None): in that case you still have to do something like

with pytest.warns(None) as record:
    ....

assert len(record) == 0

giving basically the same as I did above (but above I did it with the recwarn fixture instead of the with statement. Happy to use the with statement if that is more typically used / easier to understand. Actually now thinking about it, I would find that clearer myself :-))

Copy link
Contributor

@TomAugspurger TomAugspurger Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant pytest.warns. I really thought that pytest.warns(None) asserted that there were no warnings, but apparently not.

In theory, you could rely on our global "fail on warning" setting from the one CI job, but I don't think I trust that enough :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm.assert_produces_warning(None) already does this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But not for a specific warning, what I tried to do here.
But OK, maybe it is not that important to do it (the reason I wanted to do that was to avoid unrelated failures, eg when numpy starts raising a warning on something unrelated used under the hood).

Will change to tm.assert_produces_warning(None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But not for a specific warning, what I tried to do here.
But OK, maybe it is not that important to do it (the reason I wanted to do that was to avoid unrelated failures, eg when numpy starts raising a warning on something unrelated used under the hood).

sure, but try to guard against unexpected warnings is a bit of a rabbit hole

data = to_datetime([0, 2, 0, 1]).tz_localize('Europe/Brussels')
nargsort(data)
msg = "Converting timezone-aware DatetimeArray to timezone-naive"
assert len([w for w in recwarn.list if msg in str(w.message)]) == 0


class TestMerge(object):

Expand Down