
Maintaining the order of the categorical variable that is passed into pd.crosstab #8860


Closed
nsriram13 opened this issue Nov 19, 2014 · 3 comments
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Comments

@nsriram13

xref #8731; the solution might be the same.

Currently, when we do a crosstab, the distinct values in each column are reported in lexical order. But crosstabs are usually most useful when we have categorical data, which may have an inherent ordering.

import pandas as pd

d = {'MAKE': pd.Series(['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura']),
     'MODEL': pd.Series(['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'])}
data = pd.DataFrame(d)
# Crosstab on the plain string columns
pd.crosstab(data['MAKE'], data['MODEL'])

# Convert MODEL to a categorical with an explicit category order, then repeat
data['MODEL'] = data['MODEL'].astype('category')
data['MODEL'] = data['MODEL'].cat.set_categories(['Sedan', 'Electric', 'Pickup'])
pd.crosstab(data['MAKE'], data['MODEL'])

Both crosstab calls above produce the same output, shown below. I believe the code is essentially performing a lexical sort on the contents of the Series being passed.

Output:
MODEL  Electric  Pickup  Sedan
MAKE                          
Acura         0       0      2
Honda         0       1      2
Tesla         1       0      0

Would it be possible for crosstab to maintain the ordering of the categorical variable if column.cat.ordered on the passed column is True? Thanks!
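
While crosstab returns lexically sorted labels, one possible workaround is to reorder the result explicitly. This is a minimal sketch, not the fix requested above: it tabulates the plain string columns and then calls reindex with the desired order. The name model_order is hypothetical and does not appear in the original report.

```python
import pandas as pd

data = pd.DataFrame({
    "MAKE": ["Honda", "Acura", "Tesla", "Honda", "Honda", "Acura"],
    "MODEL": ["Sedan", "Sedan", "Electric", "Pickup", "Sedan", "Sedan"],
})

# Hypothetical name: the column order we want to see in the result.
model_order = ["Sedan", "Electric", "Pickup"]

# Tabulate the plain string columns, then reorder the columns explicitly.
# fill_value=0 covers any label in model_order that never occurs in the data.
table = pd.crosstab(data["MAKE"], data["MODEL"]).reindex(
    columns=model_order, fill_value=0
)
print(table)
```

The same idea applies to the row labels by passing index= to reindex. If the column is already categorical, its category list (data['MODEL'].cat.categories) could serve as the order instead of a hand-written list.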

@jreback
Contributor

jreback commented Nov 19, 2014

This sounds sensible. Would you be interested in doing a pull request for this? It should be straightforward.

@jreback added the Categorical, Won't Fix, and Reshaping labels and removed the Won't Fix label on Nov 19, 2014
@jreback added this to the 0.15.2 milestone on Nov 19, 2014
@nsriram13
Author

I have never worked with the pandas code base and have no idea where to look. If you can point me to any resource that can help me get going, I can definitely dabble around. But if it is an easy update for someone who is more familiar with the code, I would definitely recommend you assign it to them.

@jreback
Contributor

jreback commented Nov 20, 2014

https://github.com/pydata/pandas/wiki/Contributing

always a good time to become familiar!

lmk how it goes

@jreback modified the milestones: 0.16.0, 0.15.2 on Dec 3, 2014
@jreback modified the milestones: 0.16.0, Next Major Release on Mar 6, 2015
@jreback closed this as completed in 1725d24 on Dec 6, 2016
@jreback modified the milestones: 0.19.2, Next Major Release on Dec 6, 2016
@jorisvandenbossche modified the milestones: 0.18.0, 0.19.2, 0.19.0 on Dec 6, 2016