Left join on an integer column casts the column to float if it is in one of the indexes #9958
Comments
I just ran into exactly the same problem: after a left join on two int64 fields, one of which is the index of the left DataFrame, the joined column is cast to float64. Here is the output from show_versions. INSTALLED VERSIONS commit: None pandas: 0.16.2 |
merge has the same error:
import pandas as pd
testb = pd.DataFrame([{"a": 2, "c": 2}])
testa = pd.DataFrame([{"a": 1, "b": 2}])
result = pd.merge(testa, testb, on=["a"], how="outer") |
Just hit the same problem. However, if |
Just encountered the same issue. Here is how I fixed it:
# old code - casts 'eCountryID' to float64
merged = pd.merge(
    analyzed, ecountry_data[['CountryCode', 'eCountryID']],
    how='left', on=['CountryCode'])
# new code - the inner join introduces no NaN; rows with a missing
# 'CountryCode' are appended back afterwards
merged_country_id = pd.merge(
    analyzed, ecountry_data[['CountryCode', 'eCountryID']],
    how='inner', on=['CountryCode']).astype(object)
no_country_id = analyzed[pd.isnull(analyzed['CountryCode'])]
merged = pd.concat([merged_country_id, no_country_id]) |
This is a duplicate of #8596, closed by #13170, which will be in 0.18.2; see http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#merging-changes |
Should a new issue be opened for the |
@flipdazed this looks ok
|
How can I compare result['a'][1] and result['c'][1] after the outer merge? They should be equal.
|
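For the comparison question above, a minimal sketch that reuses the testa/testb frames from the earlier comment may help. The names a_val, c_val and values_match are introduced here for illustration only. The sketch assumes the default integer row labels, so that label 1 is the row contributed by testb.

```python
import pandas as pd

testa = pd.DataFrame([{"a": 1, "b": 2}])
testb = pd.DataFrame([{"a": 2, "c": 2}])
result = pd.merge(testa, testb, on=["a"], how="outer")

# Row 1 is the row that only exists in testb (a == 2, c == 2).
a_val = result.loc[1, "a"]
c_val = result.loc[1, "c"]

# 'c' is float64 because row 0 has no match and gets NaN there.
# An integer and a float holding the same number still compare equal by value.
values_match = bool(a_val == c_val)
print(a_val, c_val, values_match)
```

The comparison works regardless of whether 'a' came back as int64 or float64, because the values here are small enough to be represented exactly in either dtype.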
Edit: What I see happening is actually a join casting ints to floats if the result of the join contains NaN. I'm not 100% sure, but I think this is the expected behavior. Sorry for the confusion. I see this still happening in 0.23.2. Did it sneak in again? (Left join with int index as described above) |
It is happening for me in version 0.23.3, using outer join. Edit: I agree with teoguso's edit, the casting only occurs if the join subsequently contains NaNs. That explains the casting then. Edit2: https://pandas.pydata.org/pandas-docs/stable/missing_data.html explains the casting rules for anyone (such as myself) who was unaware of them. |
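The casting rule described in the two comments above can be sketched as follows. The frames left and right are made up for illustration. The second half assumes pandas 0.24 or later, where the nullable "Int64" extension dtype is available; it is one possible way to keep integers when a join introduces missing values, not a fix confirmed in this thread.

```python
import pandas as pd

# Hypothetical frames: key 1 exists only on the left, key 3 only on the right.
left = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
right = pd.DataFrame({"a": [2, 3], "c": [200, 300]})

# The outer join leaves holes in 'b' and 'c'; plain int64 cannot hold NaN,
# so those columns are upcast to float64.
merged = pd.merge(left, right, on="a", how="outer")
print(merged.dtypes)

# Assumption: pandas >= 0.24. Casting to the nullable "Int64" dtype before
# merging lets the holes be stored as missing values without a float upcast.
merged_nullable = pd.merge(
    left.astype({"b": "Int64"}),
    right.astype({"c": "Int64"}),
    on="a",
    how="outer",
)
print(merged_nullable.dtypes)
```

Casting before the merge, rather than after, avoids a round trip through float64 for the affected columns.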
@Muff2n please see my edit above. |
When I left-join two data frames on an int64 column, this column is cast to a float if it is the index of one of the two data frames:
In this case this is a problem because casting back to an int changes the value of this column:
If the int64 column is not in the index, the column is not cast to a float:
I understand that integer columns get cast to a float if NaNs are introduced (like column b here), but in this case the final column contains no missing values, so casting to a float can be avoided.
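The concern about casting back to an int can be illustrated with a small sketch. This is not the original reproduction from the report; the value 2**53 + 1 and the names ids and round_tripped are chosen here only because float64 cannot represent every integer above 2**53 exactly.

```python
import pandas as pd

# Hypothetical identifier just above the range float64 represents exactly.
big = 2**53 + 1
ids = pd.Series([big], dtype="int64")

# Simulate what happens when a join upcasts the column and the caller
# then casts it back to an integer.
round_tripped = ids.astype("float64").astype("int64")

print(int(ids[0]), int(round_tripped[0]))
print(int(ids[0]) == int(round_tripped[0]))
```

For large identifiers such as database keys, this means a silent upcast during a join can change values even though no NaN is present.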
Output from pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.0
nose: 1.3.4
Cython: None
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: None
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None