BUG: merge with categoricals does not preserve categories dtype #10409
Comments
ok, this works for a simple |
Do you mean that attempting to merge on
Returning error:
Perhaps it is linked to issue #10374... Ideally this would return a new
The above was tried on pandas 0.16.2.
well aside from the bug in #10324
Additionally, the performance benefits of the underlying int representation of the categories are lost when doing the merge. For example:
import numpy as np
import pandas as pd
# Here we are joining on an int column and appending two string columns
A = pd.DataFrame({'X': np.random.choice(range(0, 10), size=(100000,)),
                  'Y': np.random.choice(['one', 'two', 'three'], size=(100000,))})
B = pd.DataFrame({'X': np.random.choice(range(0, 10), size=(100000,)),
                  'Z': np.random.choice(['jjj', 'kkk', 'sss'], size=(100000,))})
%time r = pd.merge(A, B, on='X')
Wall time: 57.3 s
# Let's do the same, but convert the appended columns 'Y' and 'Z' to category
A['Y'] = A['Y'].astype('category')
B['Z'] = B['Z'].astype('category')
%time r = pd.merge(A, B, on='X')
Wall time: 54.9 s # Slightly less
# Now let's use the int representation of the categorical columns and merge
A['Y'] = A['Y'].cat.codes
B['Z'] = B['Z'].cat.codes
%time r = pd.merge(A, B, on='X')
Wall time: 41.7 s # About 73% of the original time taken
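The integer-code trick above throws away the category labels, so a workaround along these lines would need to put them back after the merge. The following is only a sketch of that idea, not something proposed in this thread: it uses small frames instead of the 100000-row ones, the names `y_cats`, `z_cats`, `A_codes` and `B_codes` are made up for illustration, and it assumes an inner merge with no missing values (so no `-1` codes appear).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = pd.DataFrame({"X": rng.integers(0, 10, size=200),
                  "Y": pd.Categorical(rng.choice(["one", "two", "three"], size=200))})
B = pd.DataFrame({"X": rng.integers(0, 10, size=200),
                  "Z": pd.Categorical(rng.choice(["jjj", "kkk", "sss"], size=200))})

# Remember the category labels before replacing the columns with integer codes.
y_cats = A["Y"].cat.categories
z_cats = B["Z"].cat.categories

# Merge on plain integer codes instead of categorical columns.
A_codes = A.assign(Y=A["Y"].cat.codes)
B_codes = B.assign(Z=B["Z"].cat.codes)
r = pd.merge(A_codes, B_codes, on="X")

# Rebuild the categorical columns from the merged codes.
r["Y"] = pd.Categorical.from_codes(r["Y"].to_numpy(), categories=y_cats)
r["Z"] = pd.Categorical.from_codes(r["Z"].to_numpy(), categories=z_cats)

print(r.dtypes)
```

Whether this is actually faster than merging the categorical columns directly depends on the pandas version and the data, so it is worth timing on the real frames.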
@joshlk all true. If you would like to try an implementation, that would be great.
Hi, are there any plans to solve this issue of losing categorical types when merging data frames? Thanks. JCG
preserve the category dtype when possible closes pandas-dev#10409
…erve category dtype closes pandas-dev#10409
Author: Jeff Reback <[email protected]>
Closes pandas-dev#15321 from jreback/merge_cat and squashes the following commits:
3671dad [Jeff Reback] DOC: merge docs
a4b2ee6 [Jeff Reback] BUG/API: .merge() and .join() on category dtype columns will now preserve the category dtype when possible
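To illustrate what "preserve the category dtype when possible" looks like in practice, here is a minimal sketch. It assumes a pandas release that includes this change and that provides `CategoricalDtype` in `pandas.api.types`; the frames and the names `key_type`, `left`, `right` and `other` are invented for the example. The mismatched case at the end reflects the documented behaviour that key columns with different categories are not kept as categorical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Both key columns share exactly the same categorical dtype.
key_type = CategoricalDtype(categories=["a", "b", "c"])
left = pd.DataFrame({"key": pd.Series(["a", "b", "c"], dtype=key_type),
                     "Y": pd.Series(["one", "two", "one"], dtype="category")})
right = pd.DataFrame({"key": pd.Series(["c", "a", "b"], dtype=key_type),
                      "Z": [1, 2, 3]})

merged = pd.merge(left, right, on="key")
print(merged.dtypes)  # 'key' and 'Y' are expected to stay categorical

# Key columns whose categories differ: the merged key is expected to
# fall back to a non-categorical dtype.
other = pd.DataFrame({"key": pd.Series(["a", "b", "d"], dtype="category"),
                      "Z": [1, 2, 3]})
mismatched = pd.merge(left, other, on="key")
print(mismatched.dtypes)
```

If the key dtype must survive a merge, one option is to cast both key columns to a shared `CategoricalDtype` beforehand, as `key_type` does here.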
The workaround was due to a bug in pandas, pandas-dev/pandas#10409 that has been fixed. When that was fixed upstream, the local fix led to another bug, pandas-dev/pandas#10696!!
xref #14351
None of the following merge operations retain the category types. Is this expected? How can I keep them?
Merging on a category type:
Consider the following:
If I do the merge, we end up with:
Merging on a non-category type:
If I do the merge, we end up with: