TypeError: rank() got an unexpected keyword argument 'numeric_only' #11759

Closed
nbonnotte opened this issue Dec 4, 2015 · 10 comments
Labels
API Design, Error Reporting (Incorrect or improved errors from pandas), Groupby

Comments

@nbonnotte
Contributor

In [19]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [20]: df.set_index('a').groupby('b').rank(method='first')
Out[20]: 
    c
a    
A1  1
A1  2
A1  1

In [21]: df.set_index('a').groupby('c').rank(method='first')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-6b8d4cae9d91> in <module>()
----> 1 df.set_index('a').groupby('c').rank(method='first')

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in rank(self, axis, numeric_only, method, na_option, ascending, pct)

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
    618                     # mark this column as an error
    619                     try:
--> 620                         return self._aggregate_item_by_item(name, *args, **kwargs)
    621                     except (AttributeError):
    622                         raise ValueError

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
   3076             # GH6337
   3077             if not len(result_columns) and errors is not None:
-> 3078                 raise errors
   3079 
   3080         return DataFrame(result, columns=result_columns)

TypeError: rank() got an unexpected keyword argument 'numeric_only'

I'm trying to obtain what I would get with a row_number() in SQL...

Notice that if I replace the value in the 'c' column with the string '1', then even df.set_index('a').groupby('b').rank(method='first') fails.

Am I doing something wrong?
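
As an aside on the row_number() goal, here is a minimal sketch, assuming a pandas version that provides GroupBy.cumcount(). It numbers the rows of each group in order of appearance and does not look at the dtype of the other columns, so it avoids rank() altogether. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1'], 'b': ['B1', 'B1', 'B2'], 'c': 1})

# cumcount() numbers the rows inside each group starting at 0,
# so adding 1 gives SQL-style row_number() values.
row_number_by_c = df.groupby('c').cumcount() + 1
row_number_by_b = df.groupby('b').cumcount() + 1
```

Unlike rank(method='first'), this orders rows by position within the group rather than by value, which is usually what row_number() without an ORDER BY on the ranked column amounts to.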

@jreback
Contributor

jreback commented Dec 4, 2015

you are trying to rank on a string column, which is not supported.

But should give a better message I would think.

In [20]: df.set_index('a').groupby('c').first()
Out[20]: 
    b
c    
1  B1

@jreback jreback added this to the 0.18.0 milestone Dec 4, 2015
@nbonnotte
Contributor Author

That's weird, because .rank() works with method='average' (the default value) but not with method='first'.

In [2]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [3]: df.set_index('a').groupby('c').rank()
Out[3]: 
      b
a      
A1  1.5
A1  1.5
A1  3.0

In [4]: df.set_index('a').groupby('c').rank(method='first')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-6b8d4cae9d91> in <module>()
----> 1 df.set_index('a').groupby('c').rank(method='first')

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in rank(self, axis, numeric_only, method, na_option, ascending, pct)

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
    582                     # mark this column as an error
    583                     try:
--> 584                         return self._aggregate_item_by_item(name, *args, **kwargs)
    585                     except (AttributeError):
    586                         raise ValueError

/home/nicolas/Git/pandas/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
   3017             # GH6337
   3018             if not len(result_columns) and errors is not None:
-> 3019                 raise errors
   3020 
   3021         return DataFrame(result, columns=result_columns)

TypeError: rank() got an unexpected keyword argument 'numeric_only'

I'm looking into it.

@nbonnotte
Contributor Author

I think I understand what is going on.

DataFrameGroupBy.rank is created as part of a whitelist of operators, and its signature is taken from DataFrame.rank, which uses a wrapper obtained with DataFrame._make_wrapper. There, different things are tried to produce the result.

With method='average', the first try succeeds.

With method='first', the first two tries raise an exception with the message "first not supported for non-numeric data", which is good, but then at the last try the method NDFrameGroupBy._aggregate_item_by_item is called. Things go wrong here, as it uses SeriesGroupBy.rank, the signature of which is taken from Series.rank. The parameter numeric_only does not exist there, hence the error.

There is a design flaw here:

  • either the DataFrame and Series (and Panel, I guess) versions of rank (and the like) should always have the same signature
  • or DataFrameGroupBy.rank should not use SeriesGroupBy.rank

I'll think about a solution that is as minimalist as possible, solves the initial issue, and if possible addresses this flaw. If I can't, I'll just add a hack somewhere to solve the initial issue.
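
The mismatch described above can be reproduced without pandas. This is a toy sketch only: series_rank and frame_rank_item_by_item are hypothetical stand-ins, not pandas functions, and they only mimic the shape of the two signatures.

```python
def series_rank(values, method='average'):
    """Hypothetical stand-in for a Series-level rank: no numeric_only parameter."""
    return sorted(range(len(values)), key=values.__getitem__)


def frame_rank_item_by_item(columns, numeric_only=None, method='average'):
    """Hypothetical stand-in for a frame-level fallback that forwards every keyword."""
    kwargs = {'numeric_only': numeric_only, 'method': method}
    # Each column is handed to the Series-level function with the frame-level
    # keywords, including one that the Series-level signature does not accept.
    return {name: series_rank(col, **kwargs) for name, col in columns.items()}


try:
    frame_rank_item_by_item({'b': ['B1', 'B1', 'B2']}, method='first')
except TypeError as exc:
    message = str(exc)  # complains about the unexpected 'numeric_only' keyword
```

The failure comes purely from forwarding keywords between two signatures that disagree; the data never gets looked at.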

@jreback
Contributor

jreback commented Dec 28, 2015

the right way to fix this is to move Series.rank and DataFrame.rank into generic.py and make the signature uniform.

You then accept numeric_only=None in Series.rank (and raise NotImplementedError if it's not None).

You further need to add axis as a parameter (FYI, _get_axis_name handles the case where the axis is greater than the ndim).

you can raise if ndim>2 as well
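
A rough sketch of what the uniform signature could look like for the one-dimensional case. This is not the pandas implementation: rank_1d is a hypothetical plain-Python function, and only method='first' is implemented to keep it short.

```python
def rank_1d(values, axis=0, numeric_only=None, method='first'):
    """Hypothetical 1-D rank that accepts the same keywords as a 2-D one."""
    # Accept numeric_only so callers can forward it, but refuse to honour it.
    if numeric_only is not None:
        raise NotImplementedError("numeric_only is not implemented for 1-D data")
    if axis != 0:
        raise ValueError("a 1-D object only has axis 0")
    if method != 'first':
        raise NotImplementedError("this sketch only implements method='first'")

    # sorted() is stable, so ties keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


ranks = rank_1d([3, 1, 3])  # [2, 1, 3]
```

With this shape, a caller that forwards numeric_only=None no longer hits a TypeError, and a caller that forwards a real value gets an explicit NotImplementedError instead.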

@nbonnotte
Contributor Author

So now, SeriesGroupBy.rank has the right signature, and Series._make_wrapper is used, so again there is a call to ._aggregate_item_by_item()... except that this method comes from NDFrameGroupBy, and SeriesGroupBy does not inherit from NDFrameGroupBy, so now an AttributeError is raised. This is caught and transformed into a simple ValueError, with the following comment:

related to : GH #3688
try item-by-item
this can be called recursively, so need to raise ValueError if
we don't have this method to indicated to aggregate to
mark this column as an error

Indeed, the first call to _aggregate_item_by_item (the one that called SeriesGroupBy.rank... still following?) uses this ValueError to simply discard the column, and we end up with an empty dataframe with the example I gave in the beginning.

I'm going to prevent the call to SeriesGroupBy._aggregate_item_by_item (instead of asking for forgiveness), so that the exceptions can be sorted and a meaningful error message can be given to the user.
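
Roughly, the idea is to check for the fallback before calling it, so that the original exception survives. A toy sketch, where ToySeriesGroupBy and call_with_fallback are hypothetical names and not pandas code:

```python
class ToySeriesGroupBy:
    """Hypothetical stand-in: it has no _aggregate_item_by_item method."""

    def rank(self, method='average'):
        if method == 'first':
            raise ValueError("first not supported for non-numeric data")
        return [1.5, 1.5, 3.0]


def call_with_fallback(grouped, name, **kwargs):
    try:
        return getattr(grouped, name)(**kwargs)
    except ValueError:
        # Look before leaping: fall back only when the fallback exists.
        # Otherwise re-raise, so the meaningful message is not replaced by
        # a bare ValueError produced from a caught AttributeError.
        if not hasattr(grouped, '_aggregate_item_by_item'):
            raise
        return grouped._aggregate_item_by_item(name, **kwargs)


try:
    call_with_fallback(ToySeriesGroupBy(), 'rank', method='first')
except ValueError as exc:
    surfaced = str(exc)
```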

@kuanche

kuanche commented Dec 30, 2015

Hi guys!
Dealing with the exact same issue- any tips on what to try instead?

@nbonnotte
Contributor Author

What are you trying to do, exactly?

@jreback jreback modified the milestones: Next Major Release, 0.18.0 Feb 8, 2016
@nbonnotte
Contributor Author

Following pull request #11924, we now get an empty dataframe:

In [2]: df = DataFrame({'a': ['A1', 'A1', 'A1'],
   ...:     'b': ['B1', 'B1', 'B2'],
   ...:     'c': 1})

In [3]: dg = df.set_index('a').groupby('c')

In [5]: dg.rank(method='first')
Out[5]:
Empty DataFrame
Columns: []
Index: []

@jreback
Contributor

jreback commented Nov 22, 2016

these seem to be working in current master

In [3]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})

In [4]: df
Out[4]: 
    a   b  c
0  A1  B1  1
1  A1  B1  1
2  A1  B2  1

In [5]: df.set_index('a').groupby('c').rank(method='first')
Out[5]: 
Empty DataFrame
Columns: []
Index: []

In [6]: df.set_index('a').groupby('b').rank(method='first')
Out[6]: 
      c
a      
A1  1.0
A1  2.0
A1  1.0

In [7]: df.set_index('a').groupby('c').rank()
Out[7]: 
      b
a      
A1  1.5
A1  1.5
A1  3.0

In [8]: df.set_index('a').groupby('b').rank()
Out[8]: 
      c
a      
A1  1.5
A1  1.5
A1  1.0
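
One way to sidestep the non-numeric columns, assuming the column to rank is known in advance, is to select it on the grouped object before calling rank(). This is only an illustration and was not proposed in the thread.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1'], 'b': ['B1', 'B1', 'B2'], 'c': 1})

# Selecting 'c' yields a grouped view over the numeric column only,
# so method='first' never sees the string data.
ranked = df.set_index('a').groupby('b')['c'].rank(method='first')
```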

@jreback
Contributor

jreback commented Feb 10, 2018

closing in favor of #19560 which will report the proper error message.

@jreback jreback closed this as completed Feb 10, 2018

3 participants