TypeError: rank() got an unexpected keyword argument 'numeric_only' #11759

nbonnotte · 2015-12-04T13:55:40Z

In [19]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [20]: df.set_index('a').groupby('b').rank(method='first')
Out[20]: 
    c
a    
A1  1
A1  2
A1  1

In [21]: df.set_index('a').groupby('c').rank(method='first')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-6b8d4cae9d91> in <module>()
----> 1 df.set_index('a').groupby('c').rank(method='first')

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in rank(self, axis, numeric_only, method, na_option, ascending, pct)

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
    618                     # mark this column as an error
    619                     try:
--> 620                         return self._aggregate_item_by_item(name, *args, **kwargs)
    621                     except (AttributeError):
    622                         raise ValueError

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
   3076             # GH6337
   3077             if not len(result_columns) and errors is not None:
-> 3078                 raise errors
   3079 
   3080         return DataFrame(result, columns=result_columns)

TypeError: rank() got an unexpected keyword argument 'numeric_only'

I'm trying to obtain what I would get with a row_number() in SQL...

Notice that if I replace the value in the 'c' column with the string '1', then even df.set_index('a').groupby('b').rank(method='first') fails.

Am I doing something wrong?
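
For readers after the same SQL-style row_number() result, here is a minimal sketch that avoids rank() altogether. It assumes GroupBy.cumcount(), which numbers the rows of each group from 0 in their existing order and does not depend on the dtypes of the other columns. The variable names are illustrative, and unlike SQL there is no ORDER BY here, so sort the frame first if a particular ordering is needed.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1'], 'b': ['B1', 'B1', 'B2'], 'c': 1})

# cumcount() numbers rows within each group from 0, in order of appearance,
# so adding 1 gives a row_number()-style counter.
row_number_by_b = df.groupby('b').cumcount() + 1
row_number_by_c = df.groupby('c').cumcount() + 1

print(row_number_by_b.tolist())  # [1, 2, 1]
print(row_number_by_c.tolist())  # [1, 2, 3]
```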

jreback · 2015-12-04T14:40:06Z

You are trying to rank a string column, which is not supported.

But it should give a better error message, I would think.

In [20]: df.set_index('a').groupby('c').first()
Out[20]: 
    b
c    
1  B1

nbonnotte · 2015-12-27T18:30:36Z

That's weird, because .rank() works with method='average' (the default value) but not with method='first'.

In [2]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [3]: df.set_index('a').groupby('c').rank()
Out[3]: 
      b
a      
A1  1.5
A1  1.5
A1  3.0

In [4]: df.set_index('a').groupby('c').rank(method='first')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-6b8d4cae9d91> in <module>()
----> 1 df.set_index('a').groupby('c').rank(method='first')

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in rank(self, axis, numeric_only, method, na_option, ascending, pct)

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
    582                     # mark this column as an error
    583                     try:
--> 584                         return self._aggregate_item_by_item(name, *args, **kwargs)
    585                     except (AttributeError):
    586                         raise ValueError

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
   3017             # GH6337
   3018             if not len(result_columns) and errors is not None:
-> 3019                 raise errors
   3020 
   3021         return DataFrame(result, columns=result_columns)

TypeError: rank() got an unexpected keyword argument 'numeric_only'

I'm looking into it.

nbonnotte · 2015-12-28T08:31:04Z

I think I understand what is going on.

DataFrameGroupBy.rank is created as part of a whitelist of operators, and its signature is taken from DataFrame.rank, which uses a wrapper obtained with DataFrame._make_wrapper. There, different things are tried to produce the result.

With method='average', the first try succeeds.

With method='first', the first two tries raise an exception with the message "first not supported for non-numeric data", which is good, but then on the last try the method NDFrameGroupBy._aggregate_item_by_item is called. Things go wrong here, as it uses SeriesGroupBy.rank, whose signature is taken from Series.rank. The parameter numeric_only does not exist there, hence the error.
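
The mismatch described above can be reproduced without pandas. The following is only a toy sketch: frame_rank, series_rank and item_by_item are hypothetical stand-ins, not pandas functions, and they do no real ranking. It shows how forwarding frame-level keyword arguments to a function with a narrower signature produces this kind of TypeError.

```python
def frame_rank(values, axis=0, numeric_only=None, method="average"):
    """Stand-in for the frame-level signature, which accepts numeric_only."""
    return list(values)  # placeholder result, no real ranking


def series_rank(values, method="average"):
    """Stand-in for the series-level signature, which lacks numeric_only."""
    return list(values)  # placeholder result, no real ranking


def item_by_item(columns, **kwargs):
    """Forward the frame-level keyword arguments to the per-column function."""
    return {name: series_rank(col, **kwargs) for name, col in columns.items()}


message = None
try:
    # The keyword arguments were collected using the frame-level signature.
    item_by_item({"b": ["B1", "B1", "B2"]}, numeric_only=None, method="first")
except TypeError as exc:
    message = str(exc)
    print(message)
```

The error mentions numeric_only even though the caller never passed it explicitly, which is why the message looks so obscure from the user's side.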

There is a design flaw here:

either the DataFrame and Series (and Panel, I guess) versions of rank (and the like) should always have the same signature,
or DataFrameGroupBy.rank should not use SeriesGroupBy.rank.

I'll think about a solution that is as minimalist as possible, solves the initial issue, and if possible addresses this flaw. If I can't, I'll just add a hack somewhere to solve the initial issue.

jreback · 2015-12-28T12:35:23Z

the right way to fix this is to move Series.rank and DataFrame.rank into generic.py and make the signature uniform.

You then accept numeric_only=None in Series.rank (and raise NotImplementedError if it's not None).

You further need to add axis as a parameter (FYI, _get_axis_name handles the case where the axis is greater than the ndim).

You can raise if ndim > 2 as well.
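
A rough sketch of what such a uniform signature could look like for the 1-dimensional case. This is not the pandas implementation: toy_series_rank is a hypothetical function operating on a plain list, the axis check is a simplified assumption, and the ndim > 2 check is left out because the toy has a single dimension.

```python
def toy_series_rank(values, axis=0, numeric_only=None, method="average"):
    """Hypothetical 1-D rank sharing its keyword arguments with a 2-D version."""
    if axis != 0:
        raise ValueError(f"No axis named {axis} for a 1-dimensional object")
    if numeric_only is not None:
        raise NotImplementedError(
            "numeric_only is not supported for 1-dimensional data")
    if method not in ("average", "first"):
        raise ValueError(f"unsupported method in this sketch: {method!r}")

    # 'first' ranks: stable sort, ties broken by order of appearance
    order = sorted(range(len(values)), key=lambda i: values[i])
    first = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        first[index] = float(position)
    if method == "first":
        return first

    # 'average' ranks: mean of the 'first' ranks shared by equal values
    tied = {}
    for value, rank in zip(values, first):
        tied.setdefault(value, []).append(rank)
    return [sum(tied[value]) / len(tied[value]) for value in values]


print(toy_series_rank([10, 10, 20]))                  # [1.5, 1.5, 3.0]
print(toy_series_rank([10, 10, 20], method="first"))  # [1.0, 2.0, 3.0]
```

With the keyword accepted everywhere, a caller that forwards numeric_only=None no longer hits a TypeError, and an unsupported value fails with an explicit NotImplementedError instead.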

nbonnotte · 2015-12-28T16:11:56Z

So now, SeriesGroupBy.rank has the right signature, and Series._make_wrapper is used, so again there is a call to ._aggregate_item_by_item()... except that this method comes from NDFrameGroupBy, and SeriesGroupBy does not inherit from NDFrameGroupBy, so now an AttributeError is raised. This is caught and transformed into a simple ValueError, with the following comment:

related to : GH #3688
try item-by-item
this can be called recursively, so need to raise ValueError if
we don't have this method to indicated to aggregate to
mark this column as an error

Indeed, the first call to _aggregate_item_by_item (the one that called SeriesGroupBy.rank... still following?) uses this ValueError to simply discard the column, and we end up with an empty dataframe with the example I gave in the beginning.

I'm going to prevent the call to SeriesGroupBy._aggregate_item_by_item (instead of asking for forgiveness), so that the exceptions can be sorted and a meaningful error message can be given to the user.
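
To illustrate why swallowing the exception hides the real problem, here is a toy sketch in plain Python. All three functions are hypothetical and only mimic the behaviour described above: one silently drops every column that fails, the other keeps the failures and reports them when nothing could be ranked.

```python
def rank_first(column):
    """Toy 'first' ranking: ties are broken by order of appearance."""
    if not all(isinstance(value, (int, float)) for value in column):
        raise TypeError("first not supported for non-numeric data")
    order = sorted(range(len(column)), key=lambda i: column[i])  # stable sort
    ranks = [0] * len(column)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_columns_silently(columns):
    """Drop every column that fails (hypothetical helper)."""
    result = {}
    for name, column in columns.items():
        try:
            result[name] = rank_first(column)
        except TypeError:
            continue  # the column is discarded and the reason is lost
    return result


def rank_columns_reporting(columns):
    """Remember the failures and raise if nothing could be ranked."""
    result, errors = {}, {}
    for name, column in columns.items():
        try:
            result[name] = rank_first(column)
        except TypeError as exc:
            errors[name] = exc
    if not result and errors:
        details = "; ".join(f"{name!r}: {exc}" for name, exc in errors.items())
        raise TypeError(f"no column could be ranked ({details})")
    return result


columns = {"b": ["B1", "B1", "B2"]}
print(rank_columns_silently(columns))  # {} (empty, with no explanation)

report = None
try:
    rank_columns_reporting(columns)
except TypeError as exc:
    report = str(exc)
    print(report)
```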

kuanche · 2015-12-30T21:21:55Z

Hi guys!
I'm dealing with the exact same issue. Any tips on what to try instead?

nbonnotte · 2015-12-30T21:37:45Z

What are you trying to do, exactly?

nbonnotte · 2016-03-13T08:29:29Z

Following pull request #11924, we now get an empty dataframe:

In [2]: df = DataFrame({'a': ['A1', 'A1', 'A1'],
   ...:     'b': ['B1', 'B1', 'B2'],
   ...:     'c': 1})

In [3]: dg = df.set_index('a').groupby('c')

In [5]: dg.rank(method='first')
Out[5]:
Empty DataFrame
Columns: []
Index: []

jreback · 2016-11-22T11:52:29Z

These seem to be working on current master:

In [3]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [4]: df
Out[4]: 
    a   b  c
0  A1  B1  1
1  A1  B1  1
2  A1  B2  1

In [5]: df.set_index('a').groupby('c').rank(method='first')
Out[5]: 
Empty DataFrame
Columns: []
Index: []

In [6]: df.set_index('a').groupby('b').rank(method='first')
Out[6]: 
      c
a      
A1  1.0
A1  2.0
A1  1.0

In [7]: df.set_index('a').groupby('c').rank()
Out[7]: 
      b
a      
A1  1.5
A1  1.5
A1  3.0

In [8]: df.set_index('a').groupby('b').rank()
Out[8]: 
      c
a      
A1  1.5
A1  1.5
A1  1.0
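
As the sessions in this thread show, the empty frame appears when the only column left to rank holds strings. One way to sidestep that, sketched here under the assumption that only the numeric column needs ranking, is to select that column explicitly before calling rank. The variable name ranked is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1'], 'b': ['B1', 'B1', 'B2'], 'c': 1})

# Select the numeric column explicitly so no object-dtype column is ranked.
ranked = df.groupby('b')['c'].rank(method='first')

print(ranked.tolist())  # [1.0, 2.0, 1.0]
```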

jreback · 2018-02-10T18:02:10Z

Closing in favor of #19560, which will report the proper error message.

jreback added Groupby API Design Error Reporting Incorrect or improved errors from pandas Difficulty Intermediate labels Dec 4, 2015

jreback added this to the 0.18.0 milestone Dec 4, 2015

This was referenced Dec 28, 2015

BUG in DataFrameGroupBy.rank raising an obscure TypeError #11918

Closed

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

Closed

jreback modified the milestones: Next Major Release, 0.18.0 Feb 8, 2016

jreback mentioned this issue Nov 25, 2016

groupby agg with rank and parameter return does not reduce #14741

Open

jreback mentioned this issue Jan 9, 2017

Unclear ValueError on core.groupby #15082

Closed

jreback mentioned this issue Jul 19, 2017

groupby.rank(): unexpected behavior with object type column #17016

Closed

jreback modified the milestones: 0.21.0, Next Major Release Jul 19, 2017

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

jreback mentioned this issue Jan 22, 2018

Shortcut functions in transform are not grouped #19354

Closed

WillAyd mentioned this issue Feb 9, 2018

PERF: Cythonize Groupby Rank #19481

Merged

jreback mentioned this issue Feb 10, 2018

Raise ValueError When Attempting to Rank Object Dtypes #19560

Closed

jreback closed this as completed Feb 10, 2018
