@lesshaste It is expected that `df.apply(lambda x: x.astype('category').cat.codes)` works column by column, so each column gets its own set of categories.
But see #12860 for some discussion of how to do this on multiple columns at once (using the same categories for all columns).
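One way to get shared categories in recent pandas versions is to build a single `CategoricalDtype` from the union of values in all columns and convert every column with it. This is a minimal sketch, not necessarily what #12860 proposes. The literal DataFrame is the sample data from this issue typed in by hand, and `shared` is a hypothetical name chosen for illustration.

```python
import pandas as pd

# Sample data from the issue, typed in as two columns labelled 0 and 1.
df = pd.DataFrame({
    0: list("gacjdibdid"),
    1: list("khieihbdah"),
})

# Collect every distinct value across all columns and sort it, so that
# every column is converted with the same category list.
shared = pd.CategoricalDtype(categories=sorted(pd.unique(df.to_numpy().ravel())))

# Each column now uses the same categories, so equal strings get equal codes.
codes = df.apply(lambda col: col.astype(shared).cat.codes)
print(codes)
```

Because the categories are sorted, the codes still follow alphabetical order, as with per-column `astype('category')`. The difference is that they are now counted over the values of both columns together.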
Code Sample, a copy-pastable example if possible
This SO question asks how to recode the strings in a data frame as numerical categories: http://stackoverflow.com/questions/39475187/how-to-speed-up-recoding-into-integers
The pandas solution `x = df.apply(lambda x: x.astype('category').cat.codes)` is by far the fastest. However, it doesn't give a consistent encoding if the data frame has more than one column.
For example, this data (read without a header row):
```
g,k
a,h
c,i
j,e
d,i
i,h
b,b
d,d
i,a
d,h
```
gets recoded to:
```
   0  1
0  4  6
1  0  4
2  2  5
3  6  3
4  3  5
5  5  4
6  1  1
7  3  2
8  5  0
9  3  4
```
Notice that 'd' is mapped to 3 in the first column but 2 in the second.
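Here is a copy-pastable reproduction of the example above. It assumes the rows are read with `header=None`, which produces the integer column labels 0 and 1 shown in the output.

```python
import io

import pandas as pd

# The ten sample rows from the issue, with no header line.
data = """g,k
a,h
c,i
j,e
d,i
i,h
b,b
d,d
i,a
d,h
"""

df = pd.read_csv(io.StringIO(data), header=None)

# Recode each column independently, as in the Stack Overflow answer.
x = df.apply(lambda col: col.astype("category").cat.codes)
print(x)

# Each column builds its own sorted category list, so 'd' (row 4 of
# column 0, row 7 of column 1) gets a different code in each column.
print(x.loc[4, 0], x.loc[7, 1])
```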
It would be great if pandas could do this recoding consistently across columns, so that the same string always maps to the same code.
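A possible workaround is to factorize all cells together rather than column by column. This sketch swaps in `pd.factorize` on the flattened values, which is a different technique from the per-column `cat.codes` approach. `sort=True` is used so the codes follow sorted order, like the categorical version. The names `codes`, `uniques` and `recoded` are illustrative only.

```python
import pandas as pd

# Sample data from the issue, typed in as two columns labelled 0 and 1.
df = pd.DataFrame({0: list("gacjdibdid"), 1: list("khieihbdah")})

# Flatten all cells into one 1-D array (row-major order) and factorize once,
# so every distinct string gets exactly one integer code.
codes, uniques = pd.factorize(df.to_numpy().ravel(), sort=True)

# Restore the original 2-D shape; reshape uses the same row-major order as ravel.
recoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(recoded)
print(list(uniques))
```

`uniques[k]` gives the original string for code `k`, so the mapping can be inverted later.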