Commit 12ba7a7

Daniel Grady authored and committed
BUG: Fix behavior of argmax and argmin with inf (pandas-dev#16449)
Closes pandas-dev#13595

The implementations of `nanargmin` and `nanargmax` in `nanops` were forcing the `_get_values` utility function to always mask out infinite values. For example, in `nanargmax`,

    >>> nanops._get_values(np.array([1, np.nan, np.inf]), True, isfinite=True, fill_value_typ='-inf')
    (array([  1., -inf, -inf]), array([False,  True,  True], dtype=bool), dtype('float64'), numpy.float64)

The first element of the result tuple (the masked version of the values array) is used for actually finding the max or min argument. As a result, infinite values could never be correctly recognized as the maximum or minimum values in an array.

This also affects the behavior of `Series.idxmax` with string data (or the `object` dtype generally). Previously, `nanargmax` would always attempt to coerce its input to float, even when there were no missing values. Now, it will not, and so will work correctly in the particular case of a `Series` of strings with no missing values. However, because it's difficult to ensure that `nanargmin` and `nanargmax` will behave consistently for arbitrary `Series` of `object` with and without missing values, these functions are now explicitly disallowed for `object`.
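
The masking problem described in the commit message can be reproduced without pandas. The following is a minimal pure-Python sketch, not the `nanops` code: `masked_argmax` and its `mask_infinite` flag are hypothetical names that stand in for calling `_get_values` with and without `isfinite=True`.

```python
import math


def masked_argmax(values, mask_infinite):
    """Index of the largest value after masked entries are replaced by -inf.

    Hypothetical stand-in for the masking step; not pandas code.
    """
    def is_masked(x):
        if math.isnan(x):
            return True
        return mask_infinite and math.isinf(x)

    # Masked entries are filled with -inf, mirroring fill_value_typ='-inf'.
    filled = [-math.inf if is_masked(x) else x for x in values]
    # max() over indices returns the first position holding the largest value.
    return max(range(len(filled)), key=filled.__getitem__)


values = [1.0, math.nan, math.inf]

# Old behavior: +inf is masked too, so the finite 1.0 wins.
old_result = masked_argmax(values, mask_infinite=True)

# New behavior: only NaN is masked, so +inf is found at position 2.
new_result = masked_argmax(values, mask_infinite=False)
```

With infinities masked, the filled list is `[1.0, -inf, -inf]` and the maximum is reported at position 0; with only NaN masked, the filled list keeps `+inf` and the maximum is at position 2.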
1 parent 06850a1 commit 12ba7a7

4 files changed: +86 -5 lines changed

doc/source/whatsnew/v0.21.0.txt

+25
@@ -233,6 +233,30 @@ Dtype Conversions
 - Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
 - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
 
+Dropping argmin/argmax support for arbitrary dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`Series.argmin` and :func:`Series.argmax` will now raise a ``TypeError`` when used with ``object`` dtypes. Previously, these functions would attempt to coerce their arguments to floats, which usually led to a ``ValueError``. Now, ``object`` dtypes are explicitly disallowed with ``argmax`` and related functions. For example, the old behavior produced a somewhat confusing message:
+
+.. code-block:: ipython
+
+   In [1]: s = pd.Series(['foo', 'bar'])
+
+   In [2]: s.argmax()
+   ...
+   ValueError: could not convert string to float: 'bar'
+
+This has now changed to
+
+.. code-block:: ipython
+
+   In [1]: s = pd.Series(['foo', 'bar'])
+
+   In [2]: s.argmax()
+   ...
+   TypeError: reduction operation 'argmax' not allowed for this dtype
+
+
 .. _whatsnew_0210.api.na_changes:
 
 NA naming Changes
@@ -369,6 +393,7 @@ Reshaping
 - Fixes regression from 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
 - Fixes dtype of result with integer dtype input, from :func:`pivot_table` when called with ``margins=True`` (:issue:`17013`)
 - Bug in :func:`crosstab` where passing two ``Series`` with the same name raised a ``KeyError`` (:issue:`13279`)
+- :func:`Series.argmin`, :func:`Series.argmax`, and their counterparts on ``DataFrame`` and groupby objects work correctly with floating point data that contains infinite values (:issue:`13595`).
 
 Numeric
 ^^^^^^^

pandas/core/nanops.py

+4 -4
@@ -486,23 +486,23 @@ def reduction(values, axis=None, skipna=True):
 nanmax = _nanminmax('max', fill_value_typ='-inf')
 
 
+@disallow('O')
 def nanargmax(values, axis=None, skipna=True):
     """
     Returns -1 in the NA case
     """
-    values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='-inf',
-                                         isfinite=True)
+    values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='-inf')
     result = values.argmax(axis)
     result = _maybe_arg_null_out(result, axis, mask, skipna)
     return result
 
 
+@disallow('O')
 def nanargmin(values, axis=None, skipna=True):
     """
     Returns -1 in the NA case
     """
-    values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='+inf',
-                                         isfinite=True)
+    values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='+inf')
     result = values.argmin(axis)
     result = _maybe_arg_null_out(result, axis, mask, skipna)
     return result
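
The `@disallow('O')` decorator added in this hunk rejects object-dtype input before the reduction runs. A minimal pure-Python sketch of that guard pattern is shown here under stated assumptions: `disallow_non_numeric` and the list-based `nanargmax` are hypothetical stand-ins rather than the code in `nanops`, deriving the operation name by stripping `nan` from the function name is an assumption of the sketch, and the error text mirrors the message quoted in the whatsnew entry.

```python
import functools
import math
import numbers


def disallow_non_numeric(func):
    """Hypothetical guard: raise TypeError if any element is not a real number."""
    @functools.wraps(func)
    def wrapper(values, *args, **kwargs):
        if any(not isinstance(v, numbers.Real) for v in values):
            # Assumption of this sketch: report the name without the 'nan' prefix.
            name = func.__name__.replace("nan", "")
            raise TypeError(
                f"reduction operation '{name}' not allowed for this dtype"
            )
        return func(values, *args, **kwargs)
    return wrapper


@disallow_non_numeric
def nanargmax(values):
    """Index of the largest non-NaN value, or -1 if every value is NaN."""
    candidates = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not candidates:
        return -1
    return max(candidates, key=lambda i: values[i])


numeric_result = nanargmax([0.0, -math.inf, math.inf, math.nan])

try:
    nanargmax(["foo", "bar"])
    string_error = None
except TypeError as exc:
    string_error = str(exc)
```

Checking the input type up front means strings fail with a clear `TypeError` instead of a float-conversion error raised part-way through the reduction.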

pandas/tests/groupby/test_groupby.py

+2 -1
@@ -2339,7 +2339,8 @@ def test_non_cython_api(self):
         assert_frame_equal(result, expected)
 
         # idxmax
-        expected = DataFrame([[0], [nan]], columns=['B'], index=[1, 3])
+        expected = DataFrame([[0.0], [nan]], columns=['B'],
+                             index=[1, 3])
         expected.index.name = 'A'
         result = g.idxmax()
         assert_frame_equal(result, expected)

pandas/tests/series/test_operators.py

+55
@@ -1857,3 +1857,58 @@ def test_op_duplicate_index(self):
         result = s1 + s2
         expected = pd.Series([11, 12, np.nan], index=[1, 1, 2])
         assert_series_equal(result, expected)
+
+    def test_argminmax(self):
+        # Series.argmin, Series.argmax are aliased to Series.idxmin,
+        # Series.idxmax
+
+        # Expected behavior for empty Series
+        s = pd.Series([])
+
+        with pytest.raises(ValueError):
+            s.argmin()
+        with pytest.raises(ValueError):
+            s.argmin(skipna=False)
+        with pytest.raises(ValueError):
+            s.argmax()
+        with pytest.raises(ValueError):
+            s.argmax(skipna=False)
+
+        # For numeric data with NA and Inf (GH #13595)
+        s = pd.Series([0, -np.inf, np.inf, np.nan])
+
+        assert s.argmin() == 1
+        assert np.isnan(s.argmin(skipna=False))
+
+        assert s.argmax() == 2
+        assert np.isnan(s.argmax(skipna=False))
+
+        # Using old-style behavior that treats floating point nan, -inf, and
+        # +inf as missing
+        s = pd.Series([0, -np.inf, np.inf, np.nan])
+
+        with pd.option_context('mode.use_inf_as_na', True):
+            assert s.argmin() == 0
+            assert np.isnan(s.argmin(skipna=False))
+            assert s.argmax() == 0
+            assert np.isnan(s.argmax(skipna=False))
+
+        # For strings (or any Series with dtype 'O')
+        s = pd.Series(['foo', 'bar', 'baz'])
+        with pytest.raises(TypeError):
+            s.argmin()
+        with pytest.raises(TypeError):
+            s.argmax()
+
+        s = pd.Series([(1,), (2,)])
+        with pytest.raises(TypeError):
+            s.argmin()
+        with pytest.raises(TypeError):
+            s.argmax()
+
+        # For mixed data types
+        s = pd.Series(['foo', 'foo', 'bar', 'bar', None, np.nan, 'baz'])
+        with pytest.raises(TypeError):
+            s.argmin()
+        with pytest.raises(TypeError):
+            s.argmax()
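
The numeric cases in this test can be read as a small specification of how `skipna` and the `mode.use_inf_as_na` option interact. The sketch below restates that specification in pure Python under stated assumptions: `arg_min` and its `inf_as_na` flag are hypothetical names, positions stand in for index labels, and returning NaN for an all-missing input is an assumption of the sketch rather than something the test checks.

```python
import math


def arg_min(values, skipna=True, inf_as_na=False):
    """Position of the smallest value, or NaN when it is undefined.

    Hypothetical pure-Python sketch of the tested semantics; not pandas code.
    """
    if not values:
        raise ValueError("attempt to get argmin of an empty sequence")

    def is_na(x):
        # With inf_as_na set, +inf and -inf count as missing alongside NaN.
        return math.isnan(x) or (inf_as_na and math.isinf(x))

    valid = [i for i, x in enumerate(values) if not is_na(x)]

    # A missing value poisons the result unless it is skipped.
    if len(valid) < len(values) and not skipna:
        return math.nan
    # Assumption: an all-missing input also yields NaN.
    if not valid:
        return math.nan
    return min(valid, key=lambda i: values[i])


data = [0.0, -math.inf, math.inf, math.nan]

default_pos = arg_min(data)                      # -inf is a real minimum
strict_pos = arg_min(data, skipna=False)         # NaN present, so undefined
legacy_pos = arg_min(data, inf_as_na=True)       # only 0.0 is left
```

The three calls correspond to the default case, the `skipna=False` case, and the `use_inf_as_na` case exercised in the test.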
