Merge resets column types - category is reset #12497

Closed
laufere opened this issue Feb 29, 2016 · 2 comments
Labels: Categorical (Categorical Data Type), Duplicate Report (Duplicate issue or pull request), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)


laufere commented Feb 29, 2016

Code Sample

import numpy as np
import pandas as pd

# Generate two random DataFrames with a common "id" column to merge on,
# and convert some of their columns to dtype "category"
some_strings = ['aa', 'bb', 'cc', 'dd']
d1 = {'id': np.arange(100),
      'var1': np.random.randint(0, 10, size=100),
      'var2': np.random.uniform(size=100),
      'var3': np.random.randint(0, 2, size=100),
      'var4': np.random.choice(some_strings, 100)}
df1 = pd.DataFrame(d1)
df1['var1'] = df1['var1'].astype('category')

d2 = {'id': np.arange(100),
      'var1': np.random.randint(0, 10, size=100),
      'var2': np.random.uniform(size=100),
      'var3': np.random.randint(0, 2, size=100),
      'var4': np.random.choice(some_strings, 100)}
df2 = pd.DataFrame(d2)
df2['var1'] = df2['var1'].astype('category')
df2['var4'] = df2['var4'].astype('category')

# info() prints its report itself and returns None, so it is not wrapped in print()
print('df1')
df1.info()  # shows var1 as category

print('df2')
df2.info()  # shows var1 and var4 as category

merged = pd.merge(df1, df2, on='id')
merged.info()  # the categorical columns are no longer categorical

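Expected Output

The categorical columns should remain categorical after the merge. Instead, the frame produced by merging df1 and df2 no longer has any category columns; they are reset to their original types.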
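Until merge preserves the dtype itself, one possible workaround is to record the categorical dtypes of both inputs and re-apply them to the merged frame. This is a minimal sketch that assumes pd.merge's default suffixes ("_x", "_y") and trims the frames to the id, var1 and var4 columns. `restore_categoricals` and `make_frame` are hypothetical helpers, not pandas API. On pandas versions where the categories already survive the merge, the astype call simply leaves those columns categorical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def restore_categoricals(merged, left, right, suffixes=("_x", "_y")):
    """Re-apply the categorical dtypes of `left` and `right` to `merged`.

    Hypothetical helper: assumes `merged` came from pd.merge with the given
    suffixes, so overlapping non-key columns are renamed to col + suffix.
    """
    out = merged.copy()
    for frame, suffix in ((left, suffixes[0]), (right, suffixes[1])):
        for col, dtype in frame.dtypes.items():
            if not isinstance(dtype, CategoricalDtype):
                continue
            # Overlapping columns carry a suffix; key and unique columns keep their name
            for name in (col + suffix, col):
                if name in out.columns:
                    out[name] = out[name].astype(dtype)
                    break
    return out


rng = np.random.default_rng(0)
some_strings = ["aa", "bb", "cc", "dd"]


def make_frame():
    # Hypothetical helper: a trimmed version of d1/d2 from the report
    return pd.DataFrame({
        "id": np.arange(100),
        "var1": rng.integers(0, 10, size=100),
        "var4": rng.choice(some_strings, 100),
    })


df1 = make_frame()
df1["var1"] = df1["var1"].astype("category")
df2 = make_frame()
df2["var1"] = df2["var1"].astype("category")
df2["var4"] = df2["var4"].astype("category")

merged = restore_categoricals(pd.merge(df1, df2, on="id"), df1, df2)
print(merged.dtypes)
```

Checking dtypes directly, rather than reading info() output, makes it easy to see which columns came back as category: here var1_x, var1_y and var4_y, while var4_x stays non-categorical because it was never converted in df1.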

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.23.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.5
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None

TomAugspurger added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Feb 29, 2016
TomAugspurger modified the milestones: 0.18.0, 0.18.1 Feb 29, 2016

TomAugspurger (Contributor) commented

Thanks for the report.

FWIW, when we fix this, we'll want to make sure that the categories match when merging on a Categorical column. Right now such a merge would "succeed", because the categoricals get cast to object.
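A pre-merge guard of the kind suggested here could compare the dtypes of a categorical key in both frames and refuse to merge when they differ. This is a minimal sketch, not what pandas implements: `check_key_categories` is a hypothetical helper, and it relies on CategoricalDtype equality, which ignores category order for unordered categoricals but requires the same order for ordered ones.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def check_key_categories(left, right, key):
    """Raise ValueError if a categorical merge key has mismatched categories.

    Hypothetical helper illustrating the check; not part of the pandas API.
    """
    lt, rt = left[key].dtype, right[key].dtype
    if isinstance(lt, CategoricalDtype) and isinstance(rt, CategoricalDtype):
        if lt != rt:
            raise ValueError(
                f"categories of {key!r} differ: "
                f"{list(lt.categories)} vs {list(rt.categories)}"
            )


left = pd.DataFrame({"key": pd.Categorical(["a", "b"]), "x": [1, 2]})
right_same = pd.DataFrame({"key": pd.Categorical(["b", "a"]), "y": [3, 4]})
right_diff = pd.DataFrame({"key": pd.Categorical(["a", "c"]), "y": [5, 6]})

check_key_categories(left, right_same, "key")  # same categories: no error
try:
    check_key_categories(left, right_diff, "key")
except ValueError as exc:
    print(exc)
```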

jreback (Contributor) commented Mar 1, 2016

dupe of #10409

jreback closed this as completed Mar 1, 2016
jreback added the Duplicate Report (Duplicate issue or pull request) and Categorical (Categorical Data Type) labels Mar 1, 2016
jorisvandenbossche modified the milestones: No action, 0.18.1 Feb 6, 2017