DataFrame.apply() silently converting columns to non-categorical type #11208

Closed
pganssle opened this issue Sep 30, 2015 · 1 comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
@pganssle
Contributor

Per this SO question, using apply() to convert multiple DataFrame columns to categorical does not work unless every column is converted: when only a subset is converted, the converted columns silently come back as object dtype.

An MWE demonstrating the issue can be found at this gist for easy copy-paste.

Using this for example data:

import pandas as pd

pdf = pd.DataFrame(dict(name=       ('Earl', 'Eve', 'Alan', 'Randall', 'Danielle'),
                        age=        (    29,    17,     73,        31,         62),
                        gender=     (   'M',   'F',    'M',       'M',        'F'),
                        nationality=(  'US',  'UK',  'CAN',     'CAN',       'US'),
                        height=     ( 182.9, 167.6,  175.3,     170.2,      172.8)),
                   columns=('name', 'age', 'gender', 'nationality', 'height'))
pdf = pdf.set_index('name')
>>> print(pdf)

          age gender nationality  height
name
Earl       29      M          US   182.9
Eve        17      F          UK   167.6
Alan       73      M         CAN   175.3
Randall    31      M         CAN   170.2
Danielle   62      F          US   172.8

I tried to use pdf.apply() to convert 'gender' and 'nationality' to categorical columns:

cat_list = {'gender', 'nationality'}
set_cat_list = lambda x: x.astype('category') if x.name in cat_list else x
dfa = pdf.apply(set_cat_list)

>>> print('Applied to subset: dtype={}'.format(dfa['gender'].dtype))
Applied to subset: dtype=object
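
For anyone who hits this before upgrading, one way to sidestep apply() entirely is to hand DataFrame.astype a column-to-dtype mapping. This is only a sketch, and it assumes a pandas release whose astype accepts a dict (that support was added after the version discussed in this issue). The frame is a trimmed copy of the example data, and the name dfd is made up for illustration:

```python
import pandas as pd

# Trimmed copy of the example frame from this report.
pdf = pd.DataFrame({
    'name': ['Earl', 'Eve', 'Alan', 'Randall', 'Danielle'],
    'age': [29, 17, 73, 31, 62],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'nationality': ['US', 'UK', 'CAN', 'CAN', 'US'],
}).set_index('name')

cat_cols = ['gender', 'nationality']

# Assumption: this pandas version supports dict-based astype.
# Only the listed columns are converted; the rest keep their dtype.
dfd = pdf.astype({col: 'category' for col in cat_cols})

print(dfd.dtypes)
```

This returns a new frame and leaves pdf untouched, which mirrors what the apply() call was meant to do.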

To make sure that the problem isn't just that I'm never reaching the x.astype('category') branch of the lambda expression, I added in an alert:

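import sys

in_cl = lambda x: x.name in cat_list
# Return the converted column, and as a side effect print whether the
# categorical branch was taken for it.
set_cat_list_alert = lambda x: (set_cat_list(x),
                                sys.stdout.write('{}: {}\n'.format(x.name, in_cl(x))))[0]
dfa = pdf.apply(set_cat_list_alert)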
>>> print('Applied to subset: dtype={}'.format(dfa['gender'].dtype))
age: False
age: False
gender: True
nationality: True
height: False
Applied to subset: dtype=object
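
The same trace can be written with a named function that records which branch ran, which may be easier to read than the tuple-in-a-lambda trick. This is a rough sketch rather than the code from the report: set_cat_list_traced and branch_taken are hypothetical names, and the frame is a trimmed copy of the example data. A dict keyed by column name is used because some pandas versions call the function more than once on the first column, as the duplicated "age" line above shows:

```python
import pandas as pd

# Trimmed copy of the example frame from this report.
pdf = pd.DataFrame({
    'name': ['Earl', 'Eve', 'Alan', 'Randall', 'Danielle'],
    'age': [29, 17, 73, 31, 62],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'nationality': ['US', 'UK', 'CAN', 'CAN', 'US'],
}).set_index('name')

cat_list = {'gender', 'nationality'}
branch_taken = {}  # hypothetical: column name -> whether the astype branch ran


def set_cat_list_traced(col):
    """Convert listed columns to categorical and record the branch taken."""
    hit = col.name in cat_list
    branch_taken[col.name] = hit  # repeated calls simply overwrite the entry
    return col.astype('category') if hit else col


dfa = pdf.apply(set_cat_list_traced)
print(branch_taken)
```

The resulting dtype of dfa['gender'] depends on the pandas version, which is the point of this report, so the sketch only checks which branch was reached.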

To verify that it's not just a problem setting any columns as categorical, I tried setting all columns to categorical, which works just fine:

set_cat = lambda x: x.astype('category')
dfb = pdf.apply(set_cat)

>>> print('Applied to whole frame: dtype={}'.format(dfb['gender'].dtype))
Applied to whole frame: dtype=category

Finally, I tried just using a for loop to duplicate the final result, to make sure that mixed categorical / non-categorical columns can coexist like this:

dfc = pdf.copy()
for cat in cat_list:
    dfc[cat] = pdf[cat].astype('category')

>>> print('For loop: dtype={}'.format(dfc['gender'].dtype))
For loop: dtype=category
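
To check every column at once rather than just 'gender', the loop result can be split by dtype with select_dtypes. A small sketch on a trimmed copy of the example data; categorical_cols and other_cols are names made up for illustration:

```python
import pandas as pd

# Trimmed copy of the example frame from this report.
pdf = pd.DataFrame({
    'name': ['Earl', 'Eve', 'Alan', 'Randall', 'Danielle'],
    'age': [29, 17, 73, 31, 62],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'nationality': ['US', 'UK', 'CAN', 'CAN', 'US'],
    'height': [182.9, 167.6, 175.3, 170.2, 172.8],
}).set_index('name')

# Same column-by-column conversion as the for loop above.
dfc = pdf.copy()
for col in ['gender', 'nationality']:
    dfc[col] = pdf[col].astype('category')

# Partition the columns into categorical and everything else.
categorical_cols = list(dfc.select_dtypes(include='category').columns)
other_cols = list(dfc.select_dtypes(exclude='category').columns)

print(categorical_cols, other_cols)
```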

Is this the desired behavior?

@jreback
Contributor

jreback commented Sep 30, 2015

this is a dup of #9573 which is closed in the shortly-to-be-released 0.17.0

thanks for the report

@jreback jreback closed this as completed Sep 30, 2015
@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels Sep 30, 2015
@jreback jreback added this to the 0.17.0 milestone Sep 30, 2015