Skip to content

ENH: Add argmax and argmin to ExtensionArray #27801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Jul 8, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
8303674
Add argmax, max, argmin, min to EA
makbigc Aug 7, 2019
44838f7
Remove argmax, argmin, max, min from ArrowEA
makbigc Aug 9, 2019
8bc9573
Fix black error
makbigc Aug 9, 2019
bc69a4a
Add issue number to the test
makbigc Aug 10, 2019
4787c63
update
makbigc Aug 10, 2019
8bd82f5
merge for update
makbigc Dec 2, 2019
e84268b
Move the whatsnew entry to v1
makbigc Dec 2, 2019
f9c9fea
Add test for categorical
makbigc Dec 3, 2019
8b7585f
Add func doc and pre-check for zero length
makbigc Dec 4, 2019
41e8ce4
Add min and max to StringArray
makbigc Dec 4, 2019
20ca0a2
Add test for empty array
makbigc Dec 4, 2019
1aa7422
Resolve black format
makbigc Dec 5, 2019
f2b6958
merge again
makbigc Jan 18, 2020
7d81cc5
Fix test
makbigc Jan 18, 2020
d18c8bb
Fix lint error
makbigc Jan 18, 2020
3530a6a
Change the error message
makbigc Jan 19, 2020
8d8506a
Move the whatsnew entry from v1 to v1.1
makbigc Jan 19, 2020
e7e7c86
merge for update
makbigc Jan 22, 2020
10f9b27
merge for update
makbigc Jan 29, 2020
ccded0b
merge again
makbigc Feb 2, 2020
9aea8fe
merge for update
makbigc Feb 6, 2020
5636d2a
merge for update
makbigc Feb 7, 2020
4db39e0
merge again
makbigc Feb 14, 2020
2036b25
Refactor max and min
makbigc Feb 14, 2020
d81a8f8
merge again
makbigc Feb 25, 2020
58d46d0
merge for update
makbigc Feb 26, 2020
6b88790
Merge remote-tracking branch 'upstream/master' into enh-24382
jorisvandenbossche May 9, 2020
8cd7169
fixup merge
jorisvandenbossche May 9, 2020
810ac56
Remove min/max for now
jorisvandenbossche May 9, 2020
6b030aa
argmin/argmax implementation based on _values_for_argsort
jorisvandenbossche May 9, 2020
e5a6d8c
test clean-up + update docstring
jorisvandenbossche May 9, 2020
d7a49c1
Merge remote-tracking branch 'upstream/master' into enh-24382
jorisvandenbossche May 9, 2020
65e1e4c
simplify test_sparse override
jorisvandenbossche May 9, 2020
1b64e2f
Merge remote-tracking branch 'upstream/master' into enh-24382
jorisvandenbossche May 22, 2020
2cdf16b
Merge remote-tracking branch 'upstream/master' into enh-24382
jorisvandenbossche Jun 20, 2020
7c79f5c
Merge remote-tracking branch 'upstream/master' into enh-24382
jorisvandenbossche Jul 8, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ What's new in 0.25.1 (July XX, 2019)
Enhancements
~~~~~~~~~~~~

- Add :meth:`ExtensionArray.argmax`, :meth:`ExtensionArray.max`, :meth:`ExtensionArray.argmin` and :meth:`ExtensionArray.min` (:issue:`24382`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to 1.0.0 now?


.. _whatsnew_0251.enhancements.other:

Expand Down
29 changes: 29 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -461,6 +461,35 @@ def argsort(self, ascending=True, kind="quicksort", *args, **kwargs):
result = nargsort(self, kind=kind, ascending=ascending, na_position="last")
return result

def argmin(self):
"""
Return the minimun argument indexer.

Returns
-------
scalar
Minimun argument indexer.

See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
"""
return self.argsort()[0]

def min(self):
min_idx = self.argmin()
return self[min_idx]

def argmax(self):
no_nan = self.isna().sum()
return self.argsort()[-1 - no_nan]

def max(self):
max_idx = self.argmax()
return self[max_idx]

def fillna(self, value=None, method=None, limit=None):
"""
Fill NA/NaN values using the specified method.
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/extension/arrow/arrays.py
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,18 @@ def any(self, axis=0, out=None):
def all(self, axis=0, out=None):
return self._data.to_pandas().all()

def argmin(self):
raise NotImplementedError

def min(self):
raise NotImplementedError

def argmax(self):
raise NotImplementedError

def max(self):
raise NotImplementedError


class ArrowBoolArray(ArrowExtensionArray):
def __init__(self, values):
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,16 @@ def test_from_sequence_from_cls(self, data):


class TestReduce(base.BaseNoReduceTests):
@pytest.mark.parametrize("skipna", [True, False])
def test_reduce_series_numeric(self, data, all_numeric_reductions, skipna):
op_name = all_numeric_reductions
if op_name in ("max", "min"):
pass
else:
ser = pd.Series(data)
with pytest.raises(TypeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review.
I think this is non-sense to implement argmax, argmin, max and min for ArrowBoolArray which have two values only. Calling those methods will raise NotImplementedError. Getting but not calling the max and min attributes by gettattr doesn't raise any error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @gfyoung was asking that you assert something about the error message with the match= keyword.

Copy link
Contributor Author

@makbigc makbigc Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those functions just raise the TypeError without any error message.

getattr(ser, op_name)(skipna=skipna)

def test_reduce_series_boolean(self):
pass

Expand Down
24 changes: 24 additions & 0 deletions pandas/tests/extension/base/methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,30 @@ def test_argsort_missing(self, data_missing_for_sorting):
expected = pd.Series(np.array([1, -1, 0], dtype=np.int64))
self.assert_series_equal(result, expected)

def test_argmax(self, data_missing_for_sorting):
# GH 24382
result = data_missing_for_sorting.argmax()
expected = 0
assert result == expected

def test_max(self, data_missing_for_sorting):
# GH 24382
result = data_missing_for_sorting.max()
expected = data_missing_for_sorting[0]
assert result == expected

def test_argmin(self, data_missing_for_sorting):
# GH 24382
result = data_missing_for_sorting.argmin()
expected = 2
assert result == expected

def test_min(self, data_missing_for_sorting):
# GH 24382
result = data_missing_for_sorting.min()
expected = data_missing_for_sorting[2]
assert result == expected

@pytest.mark.parametrize(
"na_position, expected",
[
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/extension/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,14 @@ def test_searchsorted(self, data_for_sorting):
if not data_for_sorting.ordered:
raise pytest.skip(reason="searchsorted requires ordered data.")

def test_max(self):
# GH 24382
pass

def test_min(self):
# GH 24382
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come these are empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods added in this PR ignore nan ,i.e., skipna=True. The existing categorical.min return nan if categorical contain any nan. This behavior is expected in test_min_max (tests/arrays/categorical/test_analytics.py).

If min and max of the generic EA ignoring nan is what we want, future PR is required to add skipna parameter to categorical.min and categorical.max.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want a release where these become out of sync, so perhaps a PR implementing skipna=True/False for Categorical first makes sense.



class TestCasting(base.BaseCastingTests):
pass
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/extension/test_numpy.py
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,20 @@ def test_repeat(self, data, repeats, as_series, use_numpy):
# Fails creating expected
super().test_repeat(data, repeats, as_series, use_numpy)

def test_max(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the base tests not work here?

# GH 24382
data = PandasArray(np.array([1, np.nan, 0]))
result = data.max()
expected = data[0]
assert result == expected

def test_min(self):
# GH 24382
data = PandasArray(np.array([1, np.nan, 0]))
result = data.min()
expected = data[2]
assert result == expected


@skip_nested
class TestArithmetics(BaseNumPyTests, base.BaseArithmeticOpsTests):
Expand Down