Left join on an integer column casts the column to float if it is in one of the indexes #9958

Closed
@bembom

When I left-join two data frames on an int64 column, the column is cast to float64 if it is the index of one of the two data frames:

>>> import pandas as pd
>>> id = [1, 143349558619761320]
>>> a = pd.DataFrame({'a': [1, 2]}, index=pd.Int64Index(id, name='id'))
>>> b = pd.DataFrame({'id': id[1], 'b': [2]})
>>> merge_index = pd.merge(a, b, left_index=True, right_on='id', how='left')
>>> print(merge_index.dtypes)
a       int64
b     float64
id    float64
dtype: object

This is a problem here because float64 cannot represent this value exactly, so casting back to an int changes the value of the column:

>>> int(float(143349558619761320))
143349558619761312
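
To spell out why the round trip loses information: a float64 carries a 53-bit significand, so not every integer above 2**53 has an exact float representation. The sketch below uses only the value from this report; the variable names are illustrative.

```python
# float64 has a 53-bit significand, so integers above 2**53
# cannot all be represented exactly.
value = 143349558619761320

exact_integer_limit = 2 ** 53        # 9007199254740992
round_tripped = int(float(value))    # int -> float64 -> int

print(value > exact_integer_limit)   # True
print(round_tripped)                 # 143349558619761312
print(value - round_tripped)         # 8
```

At this magnitude adjacent float64 values are 16 apart, so the id is rounded to the nearest representable neighbour and the original value cannot be recovered by casting back.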

If the int64 column is not in the index, the column is not cast to a float:

>>> merge_column = pd.merge(a.reset_index(), b, on='id', how='left')
>>> print(merge_column.dtypes)
id      int64
a       int64
b     float64
dtype: object

I understand that integer columns get cast to float if NaNs are introduced (like column b here), but in this case the final column contains no missing values, so the cast to float is unnecessary.
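
As a possible workaround, based on the column-merge behaviour shown above, the index can be moved into a regular column before merging and restored afterwards. This is only a sketch: the name `ids` replaces `id` to avoid shadowing the builtin, `pd.Index` stands in for `pd.Int64Index` (an assumption made so the snippet also runs on pandas versions that no longer provide `Int64Index`), and the final `set_index` step is not part of the original report.

```python
import pandas as pd

ids = [1, 143349558619761320]
a = pd.DataFrame({'a': [1, 2]}, index=pd.Index(ids, name='id'))
b = pd.DataFrame({'id': [ids[1]], 'b': [2]})

# Merge on a regular column instead of the index, so the key keeps its
# integer dtype, then restore 'id' as the index.
merged = pd.merge(a.reset_index(), b, on='id', how='left').set_index('id')

print(merged.index.dtype)     # int64
print(merged.index.tolist())  # [1, 143349558619761320]
```

Column b still becomes float64 because the left join introduces a NaN for id 1; only the join key is protected by this approach.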

Output from pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.4
Cython: None
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: None
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

Labels: Bug, Dtype Conversions, Duplicate Report, Reshaping
