Column loses category when using .loc for a one row dataframe #16360

Closed
nyejon opened this issue May 15, 2017 · 5 comments · Fixed by #37988
Labels: Categorical (Categorical Data Type), good first issue, Indexing (Related to indexing on series/frames, not to indexes themselves), Needs Tests (Unit test(s) needed to prevent regressions)

@nyejon

nyejon commented May 15, 2017

If I convert a list of columns to type 'category':

import pandas as pd 

d1 = {'one' : ['a'],
     'two' : ['a']}

d2 = {'one' : ['a', 'b'],
     'two' : ['a', 'b']}


df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
df1.loc[:, 'one'] = df1['one'].astype('category')
df2.loc[:, 'one'] = df2['one'].astype('category')

print('df1')
print(df1)
print(df1.dtypes)
print('df2')
print(df2)
print(df2.dtypes)

df1
  one two
0   a   a
one    category
two      object
dtype: object
df2
  one two
0   a   a
1   b   b
one    category
two      object
dtype: object

df1['one'] = df1['one'].cat.set_categories(df2['one'].cat.categories)

print('Assigning without loc')
print(df1)
print(df1.dtypes)

Assigning without loc
  one two
0   a   a
one    category
two      object
dtype: object

df1.loc[:, 'one'] = df1['one'].cat.set_categories(df2['one'].cat.categories)
print('Assigning with loc')
print(df1)
print(df1.dtypes)

Assigning with loc
  one two
0   a   a
one    object
two    object
dtype: object

df2.loc[:, 'one'] = df2['one'].cat.set_categories(df2['one'].cat.categories)
print('Assigning df2 with loc')
print(df2)
print(df2.dtypes)

Assigning df2 with loc
  one two
0   a   a
1   b   b
one    category
two      object
dtype: object

Problem description

I am trying to convert my defined categorical columns to the category type by assigning through .loc. This works when the dataframe has more than one row, but with a single row the column's dtype stays object.

With only one row I get the following column outputs:

df.dtypes

  one two
0   a   a
one    object
two    object
dtype: object

Expected Output

I would expect the column to be type category even for one row.

df.dtypes

  one two
0   a   a
one    category
two      object
dtype: object
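
The report above shows that plain column assignment (without .loc) keeps the categorical dtype on the one-row frame. A minimal sketch of that workaround follows; the category list ["a", "b"] is an illustrative assumption standing in for df2['one'].cat.categories.

```python
import pandas as pd

# One-row frame, as in the report.
df1 = pd.DataFrame({"one": ["a"], "two": ["a"]})

# Assumed target categories (stand-in for df2['one'].cat.categories).
categories = ["a", "b"]

# Plain column assignment replaces the whole column, so the
# categorical dtype and its categories carry over.
df1["one"] = df1["one"].astype("category").cat.set_categories(categories)

print(df1.dtypes)
```

This sidesteps the .loc code path entirely, so it does not depend on which pandas version is installed.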

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Simpler example:

In [78]: df = pd.DataFrame({"A": [1]})

In [79]: df.loc[:, 'B'] = pd.Categorical(['a'])

In [80]: df.dtypes
Out[80]:
A     int64
B    object
dtype: object

TomAugspurger added the Categorical, Dtype Conversions, and Indexing labels on May 15, 2017
TomAugspurger added this to the Next Major Release milestone on May 15, 2017
@jreback
Contributor

jreback commented May 15, 2017

There is a coercion step where df.loc[:, column] = ... will try to do exactly what df[column] = ... does, so I guess this should work.
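
One way to check that expectation on an installed pandas is to add the same new categorical column through both assignment forms and compare the resulting dtypes. This is only a diagnostic sketch: the helper name new_column_dtypes is hypothetical, and it only covers the new-column case from the simpler example above.

```python
import pandas as pd


def new_column_dtypes():
    """Add a new categorical column via [] and via .loc; return both dtypes."""
    values = pd.Categorical(["a"])

    via_setitem = pd.DataFrame({"A": [1]})
    via_setitem["B"] = values  # plain column assignment

    via_loc = pd.DataFrame({"A": [1]})
    via_loc.loc[:, "B"] = values  # full-slice .loc assignment

    return via_setitem["B"].dtype, via_loc["B"].dtype


setitem_dtype, loc_dtype = new_column_dtypes()
print(setitem_dtype, loc_dtype)
```

On the pandas 0.20.1 reported here the two dtypes differ (category versus object); on versions that include the fix they are expected to match.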

@eamag

eamag commented Jan 18, 2019

Also have this issue

@mroeschke
Member

Looks to work on master. Could use a test.

In [47]: df = pd.DataFrame({"A": [1]})
    ...: df.loc[:, 'B'] = pd.Categorical(['a'])

In [48]: df.dtypes
Out[48]:
A       int64
B    category
dtype: object

In [49]: pd.__version__
Out[49]: '1.1.0.dev0+1974.g0159cba6e'
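
A regression test for this could look roughly like the sketch below. It is an illustration only, not necessarily the test added by the fixing PR; the function name check_loc_new_categorical_column is hypothetical, and it assumes a pandas version in which the behaviour shown above holds.

```python
import pandas as pd
import pandas.testing as tm


def check_loc_new_categorical_column(ordered):
    # Add a new categorical column to a one-row frame through .loc.
    result = pd.DataFrame({"A": [1]})
    result.loc[:, "B"] = pd.Categorical(["b"], ordered=ordered)

    # The column should keep its categorical dtype, not fall back to object.
    expected = pd.DataFrame(
        {"A": [1], "B": pd.Categorical(["b"], ordered=ordered)}
    )
    tm.assert_frame_equal(result, expected)


# Cover both ordered and unordered categoricals.
for ordered in (True, False):
    check_loc_new_categorical_column(ordered)
```

In a pytest suite the loop would normally become a pytest.mark.parametrize decorator over ordered.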

mroeschke added the good first issue and Needs Tests labels and removed the Categorical, Dtype Conversions, and Indexing labels on Jun 28, 2020
@fgebhart
Contributor

take

jreback modified the milestones: Contributions Welcome, 1.2 (Nov 23, 2020)
jreback added the Categorical and Indexing labels on Nov 23, 2020