
DataFrame.apply() with function that return category series #9573


Closed
ruoyu0088 opened this issue Mar 3, 2015 · 3 comments · Fixed by #10354
Labels
Bug; Categorical (Categorical Data Type); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@ruoyu0088

import pandas as pd
df = pd.DataFrame({"c0":["A","A","B","B"], "c1":["C","C","D","D"]})
df.apply(lambda s:s.astype("category"))

The result is a Series whose elements are themselves Series, not a DataFrame:

c0    [A, A, B, B]
Categories (2, object): [A < B]
c1    [C, C, D, D]
Categories (2, object): [C < D]
dtype: object

Here is the output of `show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.2.dev
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.1
scipy: 0.15.0
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.3
pytz: 2014.10
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.5
lxml: 3.4.1
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.5.4
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Mar 3, 2015

Hmm, that is not behaving properly. As a workaround you can do this:

In [8]: DataFrame(dict([(c,col.astype('category')) for c, col in df.iteritems()]))
Out[8]: 
  c0 c1
0  A  C
1  A  C
2  B  D
3  B  D

In [9]: DataFrame(dict([(c,col.astype('category')) for c, col in df.iteritems()])).dtypes
Out[9]: 
c0    category
c1    category
dtype: object
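
For readers on a recent pandas release, the same workaround can be sketched without `apply()` and without `iteritems()`. This is a minimal sketch, not part of the original thread: it assumes a pandas version where `DataFrame.astype("category")` and `DataFrame.items()` are available, and the variable names `converted` and `rebuilt` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"c0": ["A", "A", "B", "B"], "c1": ["C", "C", "D", "D"]})

# Convert every column in one call; apply() is not involved.
converted = df.astype("category")

# Column-by-column construction, mirroring the workaround above.
# Assumption: items() is used here in place of the older iteritems().
rebuilt = pd.DataFrame({name: col.astype("category") for name, col in df.items()})

print(converted.dtypes)
print(rebuilt.dtypes)
```

Both approaches build the DataFrame from columns that are already categorical, so they do not depend on how `apply()` combines its per-column results.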

@jreback added the Bug, Reshaping, and Categorical labels on Mar 3, 2015
@jreback added this to the 0.16.1 milestone on Mar 3, 2015
@sebp

sebp commented Mar 31, 2015

This problem is still present in 0.16.0: `apply` does not retain category dtypes.

import pandas

df = pandas.DataFrame({'col0': [1, 2, 3, 4, 5],
                       'col1': ['yes', 'no', 'no', 'yes', 'yes'],
                       'col2': ['small', 'large', 'medium', 'large', 'small']})
df['col2'] = pandas.Categorical(df['col2'], categories=['small', 'medium', 'large'],
                                ordered=True)

other = df.apply(lambda x: x)

# prints category
print(df['col2'].dtype)
# prints object
print(other['col2'].dtype)
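
The dtype comparison above can be wrapped in a small check that lists every column whose dtype differs between two frames. This is a sketch under stated assumptions: the helper name `dtype_changes` is hypothetical, and it compares dtype names as strings, so it will not distinguish ordered from unordered categoricals. What it reports for `df.apply(lambda x: x)` depends on the installed pandas version, so the example demonstrates it on a frame whose column was explicitly cast to object.

```python
import pandas as pd


def dtype_changes(before, after):
    """Return {column: (old_dtype_name, new_dtype_name)} for changed columns.

    Hypothetical helper for illustration; compares dtype names as strings.
    """
    changes = {}
    for col in before.columns:
        old, new = str(before[col].dtype), str(after[col].dtype)
        if old != new:
            changes[col] = (old, new)
    return changes


df = pd.DataFrame({"col0": [1, 2, 3, 4, 5],
                   "col2": ["small", "large", "medium", "large", "small"]})
df["col2"] = pd.Categorical(df["col2"],
                            categories=["small", "medium", "large"],
                            ordered=True)

# Simulate the reported symptom by casting the categorical column to object.
lossy = df.astype({"col2": object})

print(dtype_changes(df, lossy))
print(dtype_changes(df, df.copy()))
```

To test an actual `apply()` call, pass its result as the second argument, for example `dtype_changes(df, df.apply(lambda x: x))`.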

pandas.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.1-201.fc21.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.utf8

pandas: 0.16.0
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Mar 31, 2015

@sebp Well, this is still an open issue; issues are closed once a fix is merged.
Pull requests are welcome.
