BUG: TypeError on Series(dtype='string').value_counts().idxmax() #36566

wkschwartz · 2020-09-23T05:16:43Z

Expected output

Obtain the most frequent string in a series:

>>> pd.Series('a').value_counts().idxmax()                
'a'

Problem

Changing the dtype in the example above from 'object' to 'string' breaks this idiom:

>>> pd.Series('a', dtype='string').value_counts().idxmax()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/series.py", line 2168, in idxmax
    i = nanops.nanargmax(self._values, skipna=skipna)
  File "pandas/core/nanops.py", line 71, in _f
    return f(*args, **kwargs)
  File "pandas/core/nanops.py", line 924, in nanargmax
    result = values.argmax(axis)
TypeError: argmax() takes 1 positional argument but 2 were given
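The last frame points to a calling-convention mismatch. `nanops.nanargmax` calls `values.argmax(axis)` with `axis` passed positionally, which suits `numpy.ndarray.argmax`, but the array underneath this Series has an `argmax` that accepts no arguments besides `self`. Below is a minimal pure-Python sketch of that mismatch. `ToyArray` and `nanargmax_like` are hypothetical stand-ins, not pandas classes or functions.

```python
class ToyArray:
    """Hypothetical stand-in for an array whose argmax takes only self."""

    def __init__(self, data):
        self._data = list(data)

    def argmax(self):
        # Position of the first maximum value (max() keeps the first on ties).
        return max(range(len(self._data)), key=self._data.__getitem__)


def nanargmax_like(values, axis=None):
    # Same call shape as the failing line in nanops.nanargmax:
    # axis is passed positionally, which argmax(self) cannot accept.
    return values.argmax(axis)


try:
    nanargmax_like(ToyArray([3, 1, 2]))
except TypeError as exc:
    # Prints a message of the same form as the traceback above.
    print(exc)
```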

The problem is evidently not caused by a 'string'-dtype index:

>>> pd.Series([1], index=pd.Series(['a'], dtype='string')).idxmax()
'a'
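If the index is not the culprit, the values that `idxmax` operates on, i.e. the counts, are the remaining suspect. A possible workaround on affected versions, sketched under the assumption that the counts returned by `value_counts()` on a 'string' Series use the nullable "Int64" dtype, is to cast them to plain NumPy int64 before calling `idxmax`. Counts never contain missing values, so the cast is lossless. `most_frequent` is a hypothetical helper name.

```python
import pandas as pd


def most_frequent(s: pd.Series):
    """Return a most frequent value of s (hypothetical helper)."""
    counts = s.value_counts()
    # Assumption: the counts may be nullable "Int64". Casting to NumPy int64
    # keeps idxmax on the NumPy code path; counts have no missing values.
    return counts.astype("int64").idxmax()


print(most_frequent(pd.Series(["a", "b", "a"], dtype="string")))
```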

Output of `pd.show_versions()`

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2a7d3326dee660824a8433ffd01065f8ac37f7d6
python           : 3.7.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.2
numpy            : 1.19.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.1.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.2
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None


arw2019 · 2020-09-23T05:28:25Z

Confirming this happens on 1.2 master.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 76eb314
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-48-generic
Version : #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+232.g76eb314a0
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

A similar snippet with dtype=Int64 also throws:

In [11]: pd.Series(1, dtype='Int64').value_counts().idxmax()                                                             
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-0a7f15f3eb23> in <module>
----> 1 pd.Series(1, dtype='Int64').value_counts().idxmax()

/workspaces/pandas-arw2019/pandas/core/series.py in idxmax(self, axis, skipna, *args, **kwargs)
   2187         """
   2188         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
-> 2189         i = nanops.nanargmax(self._values, skipna=skipna)
   2190         if i == -1:
   2191             return np.nan

/workspaces/pandas-arw2019/pandas/core/nanops.py in _f(*args, **kwargs)
     69             try:
     70                 with np.errstate(invalid="ignore"):
---> 71                     return f(*args, **kwargs)
     72             except ValueError as e:
     73                 # we want to transform an object array

/workspaces/pandas-arw2019/pandas/core/nanops.py in nanargmax(values, axis, skipna, mask)
    922     """
    923     values, mask, _, _, _ = _get_values(values, True, fill_value_typ="-inf", mask=mask)
--> 924     result = values.argmax(axis)
    925     result = _maybe_arg_null_out(result, axis, mask, skipna)
    926     return result

TypeError: argmax() takes 1 positional argument but 2 were given

But if we switch to dtype='int64', it works:

In [12]: pd.Series(1, dtype='int64').value_counts().idxmax()                                                             
Out[12]: 1
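Because the int64 case goes through NumPy, one way to get the same answer for a nullable 'Int64' Series is to hand NumPy a plain array first. The sketch below assumes that mapping `pd.NA` to NaN and letting `np.nanargmax` skip it matches the `skipna=True` behaviour you want. `idxmax_via_numpy` is a hypothetical helper name. Note that the float64 conversion loses precision for integers beyond 2**53, and `np.nanargmax` raises `ValueError` if every value is missing.

```python
import numpy as np
import pandas as pd


def idxmax_via_numpy(s: pd.Series):
    """Hypothetical helper: idxmax computed on a NumPy copy of the values."""
    # Nullable integers become float64, with pd.NA mapped to NaN.
    values = s.to_numpy(dtype="float64", na_value=np.nan)
    # np.nanargmax ignores NaN positions when picking the maximum.
    return s.index[np.nanargmax(values)]


s = pd.Series([1, None, 3], dtype="Int64", index=["x", "y", "z"])
print(idxmax_via_numpy(s))
```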

jorisvandenbossche · 2020-09-23T06:33:13Z

So we recently added ExtensionArray.argmin/argmax (#27801), but this is not yet hooked into Series (see #35178). That issue covers Series.argmin/argmax, but I think the same is needed for idxmin/idxmax.
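"Hooking it into Series" roughly means dispatching on the array type instead of always going through `nanops`. The following is a sketch of that idea, not the patch that eventually closed this issue. It assumes the Series' `.array` exposes an `argmax()` callable with no arguments whenever the dtype is an `ExtensionDtype`. `series_idxmax` is a hypothetical helper, and NaN handling on the NumPy path is left out for brevity.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def series_idxmax(s: pd.Series):
    """Hypothetical sketch: route idxmax by the type of the backing array."""
    if isinstance(s.dtype, ExtensionDtype):
        # Extension-backed data (e.g. "Int64", "string"): call the array's
        # own argmax with no positional axis argument.
        pos = s.array.argmax()
    else:
        # NumPy-backed data: ndarray.argmax (NaN handling omitted here).
        pos = s.to_numpy().argmax()
    return s.index[pos]


print(series_idxmax(pd.Series([3, 7, 5], dtype="Int64", index=list("abc"))))
```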

Note, you don't need the value_counts() to be able to reproduce it:

In [21]: pd.Series(1, dtype='Int64').idxmax()                                                                                                                                                                      
...
TypeError: argmax() takes 1 positional argument but 2 were given
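Since `value_counts()` is not needed, a regression check can use the bare Series directly. The sketch below compares `idxmax`/`idxmin` against labels computed in plain Python from (value, label) pairs. It assumes a pandas release that includes the fix (the thread places it in the 1.3 milestone). On 1.1.2 the first call would be expected to raise the TypeError above. `check_idx_extremes` is a hypothetical helper, and it assumes the Series has no missing values.

```python
import pandas as pd


def check_idx_extremes(s: pd.Series) -> None:
    """Hypothetical regression check for idxmax/idxmin (no missing values)."""
    pairs = list(zip(s.tolist(), s.index))
    # max()/min() keep the first pair on ties, matching idxmax/idxmin.
    expected_max = max(pairs, key=lambda p: p[0])[1]
    expected_min = min(pairs, key=lambda p: p[0])[1]
    assert s.idxmax() == expected_max
    assert s.idxmin() == expected_min


for series in [
    pd.Series(1, dtype="Int64"),
    pd.Series("a", dtype="string"),
    pd.Series([2, 9, 4], dtype="Int64", index=list("pqr")),
]:
    check_idx_extremes(series)
```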

wkschwartz added the Bug and Needs Triage labels Sep 23, 2020

jorisvandenbossche added the ExtensionArray label and removed the Needs Triage label Sep 23, 2020

tonyyyyip mentioned this issue Dec 5, 2020 in:

BUG: idxmax/min (and argmax/min) for Series with underlying ExtensionArray #37924 (merged)

jorisvandenbossche added this to the 1.3 milestone Dec 29, 2020

jreback closed this as completed in #37924 Jan 1, 2021
