-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ENH] nargsort handles EA with its _values_for_argsort #26854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #26854 +/- ##
==========================================
+ Coverage 91.98% 91.99% +<.01%
==========================================
Files 180 180
Lines 50772 50774 +2
==========================================
+ Hits 46704 46708 +4
+ Misses 4068 4066 -2
Continue to review full report at Codecov.
|
Only test_api is failed. But this test hasn't called Line 17 in ea06f8d
Do you have any idea? |
I had another PR failing with the same thing @makbigc. I haven't been able to reproduce it yet. |
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -133,6 +133,7 @@ Other Enhancements | |||
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`) | |||
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`) | |||
- Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`) | |||
- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC you had this different before, can you say that EA now argsorts Nans at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referencing :meth:pandas.api.extensions.ExtensionArray.argsort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@makbigc can you update this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#26354 But the PR in which ExtensionArray.argsort
places Nans at the end by nargsort
hasn't been merged. Should we add the reference in the whatsnew entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got them mixed up. You can just remove this note then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
pandas/core/sorting.py
Outdated
@@ -239,12 +238,12 @@ def nargsort(items, kind='quicksort', ascending=True, na_position='last'): | |||
GH #6399, #5231 | |||
""" | |||
|
|||
mask = isna(items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to
mask = np.asarray(isna(items))
and I don't think you need the if for sparse below
@@ -181,13 +180,6 @@ def test_nargsort(self): | |||
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) | |||
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) | |||
|
|||
def test_nargsort_datetimearray_warning(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we not leave this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure that's cool
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -133,6 +133,7 @@ Other Enhancements | |||
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`) | |||
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`) | |||
- Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`) | |||
- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referencing :meth:pandas.api.extensions.ExtensionArray.argsort
@@ -181,13 +180,6 @@ def test_nargsort(self): | |||
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) | |||
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) | |||
|
|||
def test_nargsort_datetimearray_warning(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.
def test_nargsort(self, data_missing_for_sorting, na_position, expected): | ||
# GH 25439 | ||
result = nargsort(data_missing_for_sorting, na_position=na_position) | ||
tm.assert_numpy_array_equal(result, expected, check_dtype=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is check_dtype
False? Was it an Appveyor failure?
If you change the expected to np.array([2, 0, 1], dtype='int64')
I think you may be OK, and can remove check_dtype=False
.
pandas/core/sorting.py
Outdated
warnings.filterwarnings( | ||
"ignore", category=FutureWarning, | ||
message="Converting timezone-aware DatetimeArray to") | ||
if (not isinstance(items, ABCIndexClass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question: why do we have this check for ABCIndexClass here? Shouldn't they be treated the same?
I think it'd be more correct to do
from pandas.core.internals.construction import extract_array
items = extract_array(items)
on all inputs. This will get an ExtensionArray from an Index or Series. Then if the new items
is_extension_array_dtype, we do the _values_for_argsort
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract_array
is imported inside nargsort
to avoid circular import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging later today.
Thanks @makbigc! |
git diff upstream/master -u -- "*.py" | flake8 --diff
Presently,
nargsort
can handle EA too. Becausenp.asanyarray
turns EA intonp.ndarray
.pandas/pandas/core/sorting.py
Line 264 in 430f0fd
In this PR,
nargsort
useEA._values_for_argsort
instead to avoid thenp.ndarray
conversion.