rank incorrectly orders ordered categories #15420


Closed
dfd opened this issue Feb 16, 2017 · 1 comment
Labels
Bug Categorical Categorical Data Type
Comments

@dfd

dfd commented Feb 16, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
a = pd.DataFrame(['first', 'second', 'third', 'fourth', 'fifth', 'sixth'], columns=['A'])
a['A'] = a['A'].astype('category').cat.set_categories(
    ['first', 'second', 'third', 'fourth', 'fifth', 'sixth'], ordered=True)
a['A'].rank()
# outputs:
# 0    2.0
# 1    4.0
# 2    6.0
# 3    3.0
# 4    1.0
# 5    5.0

Problem description

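rank seems to ignore the declared order of ordered categories. The output above matches an alphabetical ranking of the labels (fifth < first < fourth < second < sixth < third) rather than the category order.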

Expected Output

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
```

@jorisvandenbossche jorisvandenbossche added Bug Categorical Categorical Data Type labels Feb 16, 2017
@jorisvandenbossche
Member

@dfd Thanks for the report! That is indeed clearly a bug.
For example, sort_values takes the category order into account, but rank was apparently missed.

In [6]: a.A.sort_values()
Out[6]: 
0     first
1    second
2     third
3    fourth
4     fifth
5     sixth
Name: A, dtype: category
Categories (6, object): [first < second < third < fourth < fifth < sixth]
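
The consistency one would expect between the two methods can be written as a small check. This is only a sketch, and it assumes a pandas version in which this bug is fixed (the issue was milestoned for 0.20.0); on 0.19.2 the comparison would be expected to print False. The shuffled row order and the variable names are illustrative choices, not part of the report.

```python
import pandas as pd

order = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
a = pd.DataFrame({'A': pd.Categorical(order, categories=order, ordered=True)})

# Shuffle the rows so the check does not rely on the input already being sorted.
shuffled = a['A'].iloc[[3, 0, 5, 1, 4, 2]]

# Ordering the rows by their rank should give the same row order as sort_values.
by_rank = shuffled.rank().sort_values().index.tolist()
by_sort = shuffled.sort_values().index.tolist()
print(by_rank == by_sort)
```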

I think this should be a fairly easy fix: in pd.core.algorithms.rank, we would need to check for a categorical and then pass the underlying integer codes. If you are interested in trying a pull request with a fix, that is always welcome!
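
The idea of ranking the integer codes can also be tried at the user level, for instance as a workaround on an affected version. The following is a rough sketch of that idea, not the patch that was eventually merged; the example data (including the extra missing value) and the variable names are assumptions made for illustration.

```python
import pandas as pd

order = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
# One missing value is appended to show how it is handled.
s = pd.Series(pd.Categorical(order + [None], categories=order, ordered=True))

# .cat.codes gives each value's integer position in the category order;
# missing values are encoded as -1, so mask them to NaN before ranking.
codes = s.cat.codes
ranks = codes.astype('float64').where(codes != -1).rank()
print(ranks)
```

With the default na_option='keep', the missing entry stays NaN and the remaining values are ranked in category order.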

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Feb 16, 2017
@jeetjitsu jeetjitsu mentioned this issue Feb 16, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Feb 24, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
check for categorical, and then pass the underlying integer codes.
closes pandas-dev#15420

Author: Prasanjit Prakash

Closes pandas-dev#15422 from ikilledthecat/rank_categorical and squashes the following commits:

a7e573b [Prasanjit Prakash] moved test for categorical, in rank, to top
3ba4e3a [Prasanjit Prakash] corrections after rebasing
c43a029 [Prasanjit Prakash] using if/else construct to pick sorting function for categoricals
f8ec019 [Prasanjit Prakash] ask Categorical for ranking function
40d88c1 [Prasanjit Prakash] return values for rank from categorical object
049c0fc [Prasanjit Prakash] GH#15420 added support for na_option when ranking categorical
5e5bbeb [Prasanjit Prakash] BUG: GH#15420 rank for categoricals
ef999c3 [Prasanjit Prakash] merged with upstream master
fbaba1b [Prasanjit Prakash] return values for rank from categorical object
fa0b4c2 [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
9a6b5cd [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
4220e56 [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
6b70921 [Prasanjit Prakash] GH#15420 move rank inside categoricals
bf4e36c [Prasanjit Prakash] GH#15420 added support for na_option when ranking categorical
ce90207 [Prasanjit Prakash] BUG: GH#15420 rank for categoricals
85b267a [Prasanjit Prakash] Added support for categorical datatype in rank - issue#15420
3 participants