Skip to content

[ENH] nargsort handles EA with its _values_for_argsort #26854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 26, 2019

Conversation

makbigc
Copy link
Contributor

@makbigc makbigc commented Jun 14, 2019

Presently, nargsort can handle EA too. Because np.asanyarray turns EA into np.ndarray.

items = np.asanyarray(items)

In this PR, nargsort use EA._values_for_argsort instead to avoid the np.ndarray conversion.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makbigc Thanks for the PR.

#25439 is already closed. is this a follow-on?

@simonjayhawkins simonjayhawkins added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jun 14, 2019
@codecov
Copy link

codecov bot commented Jun 15, 2019

Codecov Report

Merging #26854 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26854      +/-   ##
==========================================
+ Coverage   91.98%   91.99%   +<.01%     
==========================================
  Files         180      180              
  Lines       50772    50774       +2     
==========================================
+ Hits        46704    46708       +4     
+ Misses       4068     4066       -2
Flag Coverage Δ
#multiple 90.63% <100%> (+0.05%) ⬆️
#single 41.83% <100%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/sorting.py 98.35% <100%> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/core/generic.py 94.2% <0%> (ø) ⬆️
pandas/io/pytables.py 90.3% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 90.72% <0%> (+0.16%) ⬆️
pandas/core/arrays/sparse.py 94.19% <0%> (+0.46%) ⬆️
pandas/core/dtypes/missing.py 93.93% <0%> (+0.6%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2243629...fc99129. Read the comment docs.

@makbigc
Copy link
Contributor Author

makbigc commented Jun 15, 2019

Only test_api is failed. But this test hasn't called nargsort.
The test error is that three more objects are found i.e., 'missing_dependencies', 'dependency', 'hard_dependencies'.
But they are already deleted in

del hard_dependencies, dependency, missing_dependencies

Do you have any idea?

@TomAugspurger
Copy link
Contributor

I had another PR failing with the same thing @makbigc. I haven't been able to reproduce it yet.

@jreback jreback added this to the 0.25.0 milestone Jun 21, 2019
@@ -133,6 +133,7 @@ Other Enhancements
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
- Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`)
- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you had this different before, can you say that EA now argsorts Nans at the end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referencing :meth:pandas.api.extensions.ExtensionArray.argsort

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makbigc can you update this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#26354 But the PR in which ExtensionArray.argsort places Nans at the end by nargsort hasn't been merged. Should we add the reference in the whatsnew entry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got them mixed up. You can just remove this note then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@@ -239,12 +238,12 @@ def nargsort(items, kind='quicksort', ascending=True, na_position='last'):
GH #6399, #5231
"""

mask = isna(items)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to

mask = np.asarray(isna(items)) and I don't think you need the if for sparse below

@@ -181,13 +180,6 @@ def test_nargsort(self):
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

def test_nargsort_datetimearray_warning(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we not leave this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure that's cool

@@ -133,6 +133,7 @@ Other Enhancements
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
- Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`)
- :meth:`nargsort` handles ``ExtensionArray`` without calling ``np.asanyarray`` (:issue:`25439`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referencing :meth:pandas.api.extensions.ExtensionArray.argsort

@@ -181,13 +180,6 @@ def test_nargsort(self):
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

def test_nargsort_datetimearray_warning(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't see the warning on master. It seems to have changed in the meantime. Probably fine to just remove.

def test_nargsort(self, data_missing_for_sorting, na_position, expected):
# GH 25439
result = nargsort(data_missing_for_sorting, na_position=na_position)
tm.assert_numpy_array_equal(result, expected, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is check_dtype False? Was it an Appveyor failure?

If you change the expected to np.array([2, 0, 1], dtype='int64') I think you may be OK, and can remove check_dtype=False.

warnings.filterwarnings(
"ignore", category=FutureWarning,
message="Converting timezone-aware DatetimeArray to")
if (not isinstance(items, ABCIndexClass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question: why do we have this check for ABCIndexClass here? Shouldn't they be treated the same?

I think it'd be more correct to do

from pandas.core.internals.construction import extract_array

items = extract_array(items)

on all inputs. This will get an ExtensionArray from an Index or Series. Then if the new items is_extension_array_dtype, we do the _values_for_argsort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_array is imported inside nargsort to avoid circular import.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging later today.

@TomAugspurger TomAugspurger merged commit a7f1d69 into pandas-dev:master Jun 26, 2019
@TomAugspurger
Copy link
Contributor

Thanks @makbigc!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FutureWarning when sorting tz-aware datetimeindex
4 participants