[ENH] nargsort handles EA with its _values_for_argsort #26854

makbigc · 2019-06-14T15:31:23Z

- closes #25439 (FutureWarning when sorting tz-aware DatetimeIndex)
- 1 test added and 1 test deleted
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

Currently, nargsort can already handle an EA, because np.asanyarray converts the EA into an np.ndarray:

pandas/pandas/core/sorting.py

Line 264 in 430f0fd

items = np.asanyarray(items)

In this PR, nargsort uses EA._values_for_argsort instead, which avoids the conversion to np.ndarray.
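The idea can be illustrated with a simplified, pure-Python sketch. `ToyExtensionArray` and `nargsort_sketch` are hypothetical names, not pandas APIs. The toy array only mimics the two hooks this approach relies on (`isna` and `_values_for_argsort`), and the real `nargsort` operates on NumPy arrays and covers more cases. The point shown is that the sort runs on the array's own `_values_for_argsort()` output, with missing positions tracked by a separate mask, instead of on an `np.asanyarray` conversion of the whole array.

```python
from typing import List, Optional, Sequence


class ToyExtensionArray:
    """Hypothetical stand-in for an ExtensionArray; None marks a missing value."""

    def __init__(self, values: Sequence[Optional[int]]) -> None:
        self._data = list(values)

    def isna(self) -> List[bool]:
        return [v is None for v in self._data]

    def _values_for_argsort(self) -> List[int]:
        # Missing slots get a placeholder; the NA mask decides where they go.
        return [0 if v is None else v for v in self._data]


def nargsort_sketch(items: ToyExtensionArray, ascending: bool = True,
                    na_position: str = "last") -> List[int]:
    """Argsort the non-missing values, then put missing positions first or last."""
    if na_position not in ("first", "last"):
        raise ValueError(f"invalid na_position: {na_position!r}")
    mask = items.isna()
    values = items._values_for_argsort()  # no conversion of the whole array
    non_nan_idx = [i for i, missing in enumerate(mask) if not missing]
    nan_idx = [i for i, missing in enumerate(mask) if missing]
    # Python's sort is stable, so ties keep their original order.
    indexer = sorted(non_nan_idx, key=lambda i: values[i], reverse=not ascending)
    return indexer + nan_idx if na_position == "last" else nan_idx + indexer


arr = ToyExtensionArray([2, None, 1])
print(nargsort_sketch(arr))                       # [2, 0, 1]
print(nargsort_sketch(arr, na_position="first"))  # [1, 2, 0]
```

The `[2, 0, 1]` ordering for `na_position="last"` is the same expectation that comes up in the review of the extension test in this thread.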

simonjayhawkins

@makbigc Thanks for the PR.

#25439 is already closed. Is this a follow-on?

pandas/tests/extension/base/methods.py

doc/source/whatsnew/v0.25.0.rst

codecov · 2019-06-15T02:11:31Z

Codecov Report

Merging #26854 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26854      +/-   ##
==========================================
+ Coverage   91.98%   91.99%   +<.01%     
==========================================
  Files         180      180              
  Lines       50772    50774       +2     
==========================================
+ Hits        46704    46708       +4     
+ Misses       4068     4066       -2

Flag	Coverage Δ
#multiple	`90.63% <100%> (+0.05%)`	⬆️
#single	`41.83% <100%> (-0.08%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/sorting.py	`98.35% <100%> (ø)`	⬆️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️
pandas/core/generic.py	`94.2% <0%> (ø)`	⬆️
pandas/io/pytables.py	`90.3% <0%> (ø)`	⬆️
pandas/core/dtypes/cast.py	`90.72% <0%> (+0.16%)`	⬆️
pandas/core/arrays/sparse.py	`94.19% <0%> (+0.46%)`	⬆️
pandas/core/dtypes/missing.py	`93.93% <0%> (+0.6%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2243629...fc99129.

makbigc · 2019-06-15T02:55:01Z

Only test_api fails, and that test doesn't call nargsort.
It fails because three unexpected objects are found: 'missing_dependencies', 'dependency' and 'hard_dependencies'.
However, they are already deleted in

pandas/pandas/__init__.py

Line 17 in ea06f8d

del hard_dependencies, dependency, missing_dependencies

Do you have any idea what's causing this?

TomAugspurger · 2019-06-16T19:42:44Z

I had another PR failing with the same thing @makbigc. I haven't been able to reproduce it yet.

jreback · 2019-06-21T12:28:00Z

doc/source/whatsnew/v0.25.0.rst

@@ -133,6 +133,7 @@ Other Enhancements
 - :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
 - Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
 - Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`)
+- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`)


IIRC you had this differently before. Can you say that EAs now argsort NaNs at the end?

Referencing :meth:`pandas.api.extensions.ExtensionArray.argsort`

@makbigc can you update this?

But the PR in which ExtensionArray.argsort places NaNs at the end via nargsort (#26354) hasn't been merged yet. Should we still reference it in the whatsnew entry?

Ah, I got them mixed up. You can just remove this note then.

jreback · 2019-06-21T12:28:58Z

pandas/core/sorting.py

@@ -239,12 +238,12 @@ def nargsort(items, kind='quicksort', ascending=True, na_position='last'):
    GH #6399, #5231
    """

+    mask = isna(items)


Change this to

mask = np.asarray(isna(items))

and I don't think you need the if for sparse below.
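A minimal sketch of why `np.asarray` can remove the need for a sparse special case, assuming (as the comment implies) that `isna()` on sparse input returns a sparse boolean array rather than an `ndarray`. `ToySparseMask` and `dense_mask` are hypothetical names; the toy class only implements NumPy's `__array__` protocol, which `np.asarray` calls to obtain an ordinary dense boolean array. After that step, dense and sparse inputs go through the same code path.

```python
import numpy as np


class ToySparseMask:
    """Hypothetical sparse boolean mask: stores only the positions that are True."""

    def __init__(self, length, true_positions):
        self.length = length
        self.true_positions = sorted(true_positions)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this from np.asarray; build the dense representation.
        dense = np.zeros(self.length, dtype=bool)
        dense[self.true_positions] = True
        return dense if dtype is None else dense.astype(dtype)


def dense_mask(raw_mask):
    # One code path for dense and sparse masks alike.
    return np.asarray(raw_mask, dtype=bool)


sparse = ToySparseMask(5, [1, 3])
dense = np.array([False, True, False, True, False])
for raw in (sparse, dense):
    mask = dense_mask(raw)
    print(type(mask).__name__, np.flatnonzero(mask))
```

Both iterations report an `ndarray` with the same missing positions, so nothing downstream needs to know which kind of mask it started from.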

jreback · 2019-06-21T12:29:48Z

pandas/tests/test_sorting.py

@@ -181,13 +180,6 @@ def test_nargsort(self):
        exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
        tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

-    def test_nargsort_datetimearray_warning(self):


Should we not keep this?

FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.

Sure, that's cool.

TomAugspurger · 2019-06-21T13:38:01Z

doc/source/whatsnew/v0.25.0.rst

@@ -133,6 +133,7 @@ Other Enhancements
 - :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
 - Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
 - Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`)
+- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`)


Referencing :meth:`pandas.api.extensions.ExtensionArray.argsort`

TomAugspurger · 2019-06-21T13:42:21Z

pandas/tests/test_sorting.py

@@ -181,13 +180,6 @@ def test_nargsort(self):
        exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
        tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

-    def test_nargsort_datetimearray_warning(self):


FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.

TomAugspurger · 2019-06-21T13:43:20Z

pandas/tests/extension/base/methods.py

+    def test_nargsort(self, data_missing_for_sorting, na_position, expected):
+        # GH 25439
+        result = nargsort(data_missing_for_sorting, na_position=na_position)
+        tm.assert_numpy_array_equal(result, expected, check_dtype=False)


Why is check_dtype False? Was it an Appveyor failure?

If you change expected to np.array([2, 0, 1], dtype='int64'), I think you'll be OK and can remove check_dtype=False.
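A small NumPy sketch of the dtype mismatch that `check_dtype=False` may have been hiding. This rests on an assumption the thread does not confirm: that the failure came from platform integer defaults. Argsort-style indexers use `np.intp`, while `np.array([2, 0, 1])` uses the platform's default integer, which has historically been `int32` on Windows (where Appveyor runs). Writing `dtype='int64'` explicitly makes the expected array match `np.intp` on 64-bit platforms. The data is hypothetical, with index 1 missing and index 2 holding the smallest value.

```python
import numpy as np

# Hypothetical data: index 1 is missing, index 2 holds the smallest value.
values = np.array([2.0, np.nan, 1.0])
mask = np.isnan(values)

non_nan_idx = np.flatnonzero(~mask)  # dtype np.intp
nan_idx = np.flatnonzero(mask)       # dtype np.intp
indexer = non_nan_idx[values[non_nan_idx].argsort(kind="mergesort")]
result = np.concatenate([indexer, nan_idx])  # NA positions last

default_expected = np.array([2, 0, 1])                  # platform default integer
explicit_expected = np.array([2, 0, 1], dtype="int64")  # fixed width

print(result, result.dtype)
print(default_expected.dtype, explicit_expected.dtype)
# On a 64-bit build np.intp is int64, so these dtypes compare equal there.
print(np.dtype(np.intp) == explicit_expected.dtype)
```

The printed dtypes depend on the platform and NumPy version, which is exactly why the explicit `dtype` makes the comparison deterministic on 64-bit builds.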

TomAugspurger · 2019-06-22T19:34:47Z

pandas/core/sorting.py

-        warnings.filterwarnings(
-            "ignore", category=FutureWarning,
-            message="Converting timezone-aware DatetimeArray to")
+    if (not isinstance(items, ABCIndexClass)


One more question: why do we have this check for ABCIndexClass here? Shouldn't they be treated the same?

I think it'd be more correct to do

from pandas.core.internals.construction import extract_array
items = extract_array(items)

on all inputs. This extracts the ExtensionArray from an Index or Series. Then, if the new items has an extension array dtype (is_extension_array_dtype), we use _values_for_argsort.

extract_array is imported inside nargsort to avoid a circular import.
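The suggested flow (unwrap the container first, then dispatch on whether the result is an extension array) can be sketched in plain Python. All names here are hypothetical: `ToyContainer` stands in for an Index or Series exposing an `.array` attribute, and `extract_array_sketch` only approximates what `extract_array` is described as doing in this thread. Because unwrapping happens up front for every input, an Index no longer needs its own `isinstance` branch.

```python
from typing import Any, List, Optional, Sequence, Tuple


class ToyExtensionArray:
    """Hypothetical extension array; None marks a missing value."""

    def __init__(self, values: Sequence[Optional[int]]) -> None:
        self._data = list(values)

    def isna(self) -> List[bool]:
        return [v is None for v in self._data]

    def _values_for_argsort(self) -> List[int]:
        return [0 if v is None else v for v in self._data]


class ToyContainer:
    """Hypothetical Index/Series-like wrapper around an array."""

    def __init__(self, array: Any) -> None:
        self.array = array


def extract_array_sketch(obj: Any) -> Any:
    # Unwrap containers; pass bare arrays and lists through unchanged.
    return obj.array if isinstance(obj, ToyContainer) else obj


def sortable_values(items: Any) -> Tuple[List[Any], List[bool]]:
    """Return (values to argsort, NA mask) for any supported input."""
    items = extract_array_sketch(items)  # same treatment for every input
    if isinstance(items, ToyExtensionArray):
        return items._values_for_argsort(), items.isna()
    values = list(items)
    return values, [v is None for v in values]


ea = ToyExtensionArray([2, None, 1])
print(sortable_values(ea))                # ([2, 0, 1], [False, True, False])
print(sortable_values(ToyContainer(ea)))  # same result as the bare array
print(sortable_values([3.0, 1.0]))        # ([3.0, 1.0], [False, False])
```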

TomAugspurger

Merging later today.

TomAugspurger · 2019-06-26T13:28:17Z

Thanks @makbigc!

makbigc mentioned this pull request Jun 14, 2019

API: ExtensionArray.argsort places the missing value at the end #26354

Closed

simonjayhawkins reviewed Jun 14, 2019


pandas/tests/extension/base/methods.py (resolved)

pandas/tests/extension/base/methods.py (outdated, resolved)

doc/source/whatsnew/v0.25.0.rst (outdated, resolved)

simonjayhawkins added the ExtensionArray label (Extending pandas with custom dtypes or arrays) Jun 14, 2019

makbigc added 7 commits June 21, 2019 19:03

Add test to ensure that nargsort can handle EA

8f9637c

Modify nargsort

4460191

Remove test_nargsort_datetimearray_warning

2f37a63

Add whatsnew entry

707d9e3

Fix lint error

5ce8a29

Change after 1st review

42d7092

Correct comment

f84aad8

makbigc force-pushed the fix-25439 branch from 0b51172 to f84aad8 June 21, 2019 11:06

jreback added this to the 0.25.0 milestone Jun 21, 2019

jreback requested changes Jun 21, 2019


TomAugspurger reviewed Jun 21, 2019


makbigc added 3 commits June 21, 2019 22:34

change after 2nd review

7543a84

Fix lint

a65fea3

Remove whatsnew entry

365afef

TomAugspurger reviewed Jun 22, 2019


Use extract_array ahead

fc99129

TomAugspurger approved these changes Jun 24, 2019


TomAugspurger merged commit a7f1d69 into pandas-dev:master Jun 26, 2019

jorisvandenbossche mentioned this pull request Jul 3, 2019

API: ExtensionArray.argsort places the missing value at the end #27137

Merged
