When merging on boolean index in right dataframe, merge key becomes an object #23884

Closed
mattwigway opened this issue Nov 24, 2018 · 4 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@mattwigway

mattwigway commented Nov 24, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

print('data dtypes:')
print(data.dtypes)

print('\ndata2 dtypes')
print(data2.dtypes)

merged = data.merge(data2, left_on='b', right_index=True)

print('\nmerged dtypes')
print(merged.dtypes)

Output:

data dtypes:
a    int64
b     bool
dtype: object

data2 dtypes
c    int64
dtype: object

merged dtypes
a     int64
b    object
c     int64
dtype: object

Problem description

I have a dataframe which has a column of type bool, and another dataframe where the index is of type bool. When I merge the first dataframe with the second, the column is promoted to object rather than keeping its boolean type. This causes issues because the ~ operator then no longer inverts True and False; I have to manually convert the column back to a boolean. I'm guessing this is because there's not a BooleanIndex type in pandas, so the index dtype is object. But it's quite confusing when columns unexpectedly change types during a merge.

The boolean index is part of a MultiIndex in my actual project, but the issue occurs even with a single index as shown in the MWE above.
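For reference, the manual conversion described above might look like the following. This is only a sketch of a workaround, not a fix for the underlying behaviour; the variable name `inverted` is made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

merged = data.merge(data2, left_on='b', right_index=True)

# Cast the merge key back to bool explicitly. On versions where the
# dtype is already preserved this is effectively a no-op.
merged['b'] = merged['b'].astype(bool)

# With a real bool column, ~ performs a logical inversion again.
inverted = ~merged['b']
print(merged.dtypes)
print(inverted)
```

The cast has to be repeated after every affected merge, which is why it is a stopgap rather than a solution.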

Expected Output

column b should remain a boolean.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.11.0
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
@gfyoung gfyoung added the Dtype Conversions Unexpected or buggy dtype conversions label Nov 26, 2018
@gfyoung
Member

gfyoung commented Nov 26, 2018

I suspect this has to do with the fact that the dtype of data2.index is object, even though the inferred data type of data2.index is boolean. An investigation and PR would be most welcome!
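A quick way to check both properties mentioned here is to compare `Index.dtype` with `Index.inferred_type`. This is a sketch only; the `object` result is what the comment above reports for pandas 0.23.4, and other versions may report a different dtype.

```python
import pandas as pd

data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

# The storage dtype of the index (reported as object in this thread;
# this may differ on other pandas versions).
index_dtype = data2.index.dtype

# The type pandas infers from the values actually held in the index.
inferred = data2.index.inferred_type

print(index_dtype)
print(inferred)
```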

@gfyoung gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 26, 2018
@mattwigway
Author

A possible solution would be to always preserve the dtype of the column in the left dataframe during a merge (and, in the case of a right or outer merge, if there are values in the right dataframe that cannot be coerced to the dtype of the column on the left, an error could be raised).
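As a user-side illustration of that idea (not a proposal for how pandas itself should implement it), a wrapper could cast the key column back to the left dtype after merging. The helper name `merge_keep_left_dtype` is hypothetical, and this sketch does not implement the error case for right or outer merges.

```python
import pandas as pd


def merge_keep_left_dtype(left, right, left_on, **merge_kwargs):
    """Hypothetical helper: merge a left column against the right index,
    then cast the key column back to its dtype in the left frame."""
    merged = left.merge(right, left_on=left_on, right_index=True, **merge_kwargs)
    # Note: astype(bool) is permissive (for example, NaN becomes True), so
    # raising on values that cannot be coerced would need an explicit check
    # here. That check is deliberately left out of this sketch.
    merged[left_on] = merged[left_on].astype(left[left_on].dtype)
    return merged


data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

result = merge_keep_left_dtype(data, data2, 'b')
print(result.dtypes)
```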

@jreback
Contributor

jreback commented Nov 27, 2018

this is the same issue as addressed in this PR
#21681

also believe this test case already exists there

@mroeschke
Member

Appears that might have been closed by #21681. Closing.
