
BUG: bug in group by rank string #21554


Closed
xinai57 opened this issue Jun 20, 2018 · 6 comments
Labels: Duplicate Report, Groupby


xinai57 commented Jun 20, 2018

Bug description

In[1]: import pandas as pd
In[2]: df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})
In[3]: df
Out[3]:
  key value_str  value_int
0   a        u1          0
1   a        u2          1
2   b        u3          2
3   b        u4          3
4   b        u5          4

Grouping by "key" and ranking "value_str" raises an error:

In[4]: df.groupby("key")["value_str"].rank()  # error
Out[4]: 
Traceback (most recent call last):
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5357c8abb14f>", line 1, in <module>
    df.groupby("key")["value_str"].rank()
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1906, in rank
    na_option=na_option, pct=pct, axis=axis)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1025, in _cython_transform
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2630, in transform
    return self._cython_operation('transform', values, how, axis, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2590, in _cython_operation
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2664, in _transform
    transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2479, in wrapper
    return f(afunc, *args, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2431, in <lambda>
    kwargs.get('na_option', 'keep')
TypeError: 'NoneType' object is not callable

However, grouping by "key" and ranking "value_int" does not raise an error:

In[10]: df.groupby("key")["value_int"].rank()

Out[10]: 
0    1.0
1    2.0
2    1.0
3    2.0
4    3.0
Name: value_int, dtype: float64

Ranking "value_str" directly, without grouping, does not raise an error either:

In[11]: df["value_str"].rank()

Out[11]: 
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
Name: value_str, dtype: float64
@xinai57 xinai57 changed the title bug in group by rank string BUG: bug in group by rank string Jun 20, 2018
Member

WillAyd commented Jun 20, 2018

Lexicographical ranking is not supported, hence the error with groupby. IIRC there is already an issue to make that consistent across GroupBy and Series objects.
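Until that is resolved, one possible workaround is to map the strings to integer codes that preserve their sorted order and rank those codes within each group. The following is only a sketch under stated assumptions: it reuses the example frame from the report, assumes the column has no missing values (`pd.factorize` encodes missing values as -1, which would then be ranked as the smallest value), and the names `codes` and `str_rank` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"]})

# sort=True assigns integer codes in the sorted order of the unique strings,
# so comparing codes is equivalent to comparing the strings themselves.
codes, _ = pd.factorize(df["value_str"], sort=True)

# Rank the numeric codes within each "key" group; equal strings share a code,
# so ties are handled the same way as for any numeric column.
str_rank = pd.Series(codes, index=df.index).groupby(df["key"]).rank()
print(str_rank)
```

Because the ranking runs on integers, this avoids the object-dtype code path that raises in the traceback above.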

@WillAyd WillAyd added Groupby Duplicate Report Duplicate issue or pull request labels Jun 20, 2018
Member

WillAyd commented Jun 20, 2018

Here's the original issue: #19560. It looks like there's a PR referenced there that hasn't been updated in a couple of months, so if you are interested you can reach out to the author and try to push it over the finish line.

Closing this issue specifically because it is a duplicate.

@WillAyd WillAyd closed this as completed Jun 20, 2018
Author

xinai57 commented Jun 20, 2018

@WillAyd Thanks very much.
In fact, this error appeared after I updated pandas from v0.20 to v0.23. There was no error in v0.20.


hz336 commented Sep 6, 2018

I'm having the exact same issue as @xinai57 after upgrading pandas to v0.23.

@ZenW00kie

Out of curiosity, what's the reasoning behind removing support for lexicographic ranking?


YFWDZWS commented Sep 12, 2019

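Hello, I've run into the same problem. Does df.groupby("key")["value_int"].rank() mean sorting the "key" column by the numeric values in the "value_int" column? If so, strings presumably can't be ranked. Is only numeric ranking supported?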
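For reference, the grouped rank does not sort or reorder anything: each row keeps its position and receives the rank of its value among the rows that share the same key. A small sketch with made-up integer values (not the data from the report) shows this:

```python
import pandas as pd

# Hypothetical values, chosen so that they are not already sorted within each group.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_int": [10, 5, 7, 9, 8]})

# Rank "value_int" separately inside each "key" group.
ranks = df.groupby("key")["value_int"].rank()

# Group "a": 10 is the 2nd smallest, 5 is the 1st.
# Group "b": 7 is the 1st, 9 is the 3rd, 8 is the 2nd.
print(ranks)
```

The result has the same index and row order as `df`; only the values are replaced by their within-group ranks.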
