Type casting after merge #17044

Closed
albertvillanova opened this issue Jul 21, 2017 · 2 comments · Fixed by #29030
Labels
good first issue, Needs Tests (unit test(s) needed to prevent regressions)

Comments

@albertvillanova
Contributor

albertvillanova commented Jul 21, 2017

Code Sample, a copy-pastable example if possible

In [2]: df1 = pd.DataFrame({'id':[1.,2.], 'c1':[10,20]})

In [3]: df2 = pd.DataFrame({'id':[2], 'c2':[200]})

In [4]: df = df1.merge(df2, on='id', how='left')

In [5]: df['id'].dtype
Out[5]: dtype('O')

Problem description

When I merge two DataFrames on a column (id) that is float in one DataFrame and int in the other, the resulting column has object dtype. Why? I would expect the merged column to keep the float dtype.

Expected Output

In [5]: df['id'].dtype
Out[5]: dtype('float64')

Indeed, this was the behavior for pandas 0.19.2.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jul 21, 2017

This is as expected. We don't cast merge keys when their dtypes diverge. I suppose you could try to cast this upfront and make them both floats. So if you wanted to do a PR, that would be OK.
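As a rough illustration of the "cast upfront" idea, here is a minimal user-side sketch that reuses the frames from the report. The name `df2_cast` is introduced only for this example, and the snippet casts in user code before calling `merge`; it is not a description of what a pandas PR would change internally.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1.0, 2.0], "c1": [10, 20]})
df2 = pd.DataFrame({"id": [2], "c2": [200]})

# Cast the integer key to float so both sides share a dtype before merging.
# df2_cast is a hypothetical helper name used only in this sketch.
df2_cast = df2.astype({"id": "float64"})

df = df1.merge(df2_cast, on="id", how="left")
print(df["id"].dtype)
```

With both keys already float64, the merged `id` column should no longer depend on how `merge` handles mixed key dtypes.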

jreback added the Difficulty Intermediate, Dtype Conversions (unexpected or buggy dtype conversions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Enhancement labels Jul 21, 2017
jreback modified the milestones: 0.21.0, Next Major Release Jul 21, 2017
@mroeschke
Member

Looks to be fixed in master. Could use a test.

In [39]: df1 = pd.DataFrame({'id':[1.,2.], 'c1':[10,20]})
    ...: df2 = pd.DataFrame({'id':[2], 'c2':[200]})
    ...: df = df1.merge(df2, on='id', how='left')
    ...: df['id'].dtype
Out[39]: dtype('float64')

In [40]: pd.__version__
Out[40]: '0.26.0.dev0+565.g8c5941cd5'
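A regression test for this could look something like the sketch below. The function name `test_merge_float_int_key_keeps_float` and the expected frame are assumptions made for illustration; this is not the test that was added for this issue. It assumes a pandas version where the behavior shown above holds.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_merge_float_int_key_keeps_float():
    # Hypothetical test name; frames are taken from the original report.
    df1 = pd.DataFrame({"id": [1.0, 2.0], "c1": [10, 20]})
    df2 = pd.DataFrame({"id": [2], "c2": [200]})

    result = df1.merge(df2, on="id", how="left")

    # The key column should stay float64 rather than fall back to object.
    # c2 becomes float because the unmatched row is filled with NaN.
    expected = pd.DataFrame(
        {"id": [1.0, 2.0], "c1": [10, 20], "c2": [np.nan, 200.0]}
    )
    tm.assert_frame_equal(result, expected)
```

Comparing whole frames with `assert_frame_equal` checks the dtypes of every column, so a regression to object dtype on `id` would fail the test.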

mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels and removed the Difficulty Intermediate, Dtype Conversions, Enhancement, and Reshaping labels Oct 15, 2019
albertvillanova added a commit to albertvillanova/pandas that referenced this issue Oct 16, 2019