BUG: pd.concat coerces ints to floats if empty DataFrame is present with different columns #42682

Closed
1 task
XyLearningProgramming opened this issue Jul 23, 2021 · 2 comments
Labels
Bug; Closing Candidate (may be closeable, needs more eyeballs); Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@XyLearningProgramming

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: This issue is very similar to issue #8902, which is fixed in pandas==1.3.0. However, when the empty DataFrame being concatenated has different columns, as shown below, pd.concat still behaves unexpectedly.

import pandas as pd

# Case 1: concatenating an empty DataFrame that has no columns preserves the dtype.
df = pd.DataFrame([{"col1": 1}, {"col1": 2}])
df2 = pd.DataFrame({})
df_new = pd.concat([df, df2])
df_new.col1.dtype  # dtype('int64')

# Case 2: an empty DataFrame whose columns differ from the original still coerces int64 to float64.
df = pd.DataFrame([{"col1": 1}, {"col1": 2}])
df2 = pd.DataFrame({}, columns=["whatever"])
df_new = pd.concat([df, df2])
df_new.col1.dtype  # dtype('float64')

# Case 3: an empty DataFrame with the same column coerces the dtype to `object`.
df = pd.DataFrame([{"col1": 1}, {"col1": 2}])
df2 = pd.DataFrame({}, columns=["col1"])
df_new = pd.concat([df, df2])
df_new.col1.dtype  # dtype('object')

Problem description

pd.concat behaves inconsistently when one of the inputs is an empty DataFrame:

  • If the empty DataFrame has the same column as the target, that column is converted to dtype('O').

  • If the empty DataFrame has columns that differ from the target's, the target's int64 column values are coerced to float64.

  • If the empty DataFrame has no columns at all, the target DataFrame keeps its dtype.

Expected Output

I suggest that pd.concat not coerce dtypes when one of the inputs is an empty DataFrame.
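A possible workaround in user code, sketched under the assumption that columns contributed only by empty frames can be discarded, is to drop row-less frames before concatenating. The helper name `concat_nonempty` is hypothetical and is not part of pandas.

```python
import pandas as pd


def concat_nonempty(frames):
    """Concatenate frames row-wise, skipping any frame that has no rows.

    Hypothetical helper for illustration; not a pandas API.
    """
    nonempty = [frame for frame in frames if len(frame) > 0]
    if not nonempty:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    return pd.concat(nonempty)


df = pd.DataFrame([{"col1": 1}, {"col1": 2}])

# The three kinds of empty DataFrame from the report above.
empties = [
    pd.DataFrame({}),
    pd.DataFrame({}, columns=["whatever"]),
    pd.DataFrame({}, columns=["col1"]),
]

results = [concat_nonempty([df, empty]) for empty in empties]
for result in results:
    print(result["col1"].dtype)
```

Because the empty frames never reach pd.concat, they cannot influence the result dtype. The trade-off is that a column such as "whatever" is not added to the result.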

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : f00ed8f
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.81.bm.15-amd64
Version : #1 SMP Debian 4.14.81.bm.15 Sun Sep 8 05:02:31 UTC 2019
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : 0.29.23
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.2
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.20
tables : None
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

@XyLearningProgramming added the Bug and Needs Triage labels on Jul 23, 2021
@simonjayhawkins
Member

> Note: This issue is very similar to issue #8902, which is fixed in pandas==1.3.0. However, when the empty DataFrame being concatenated has different columns, as shown below, pd.concat still behaves unexpectedly.

I don't think there were any changes in 1.3.0 relating to this behavior?

The output of the code samples seems to be unchanged since at least 1.0.5.

@simonjayhawkins added the Reshaping label and removed the Needs Triage label on Jul 26, 2021
@mroeschke added the Dtype Conversions label on Aug 21, 2021
@jbrockmendel
Member

This behavior looks right to me.
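One way to read this, offered as a sketch and not as a statement of the maintainers' reasoning: an empty DataFrame built from column labels alone has no data to infer a dtype from, so its column is stored as object, and the concatenation inputs then disagree on dtype. Under that assumption, giving the empty column an explicit dtype that matches the target lets the inputs agree.

```python
import pandas as pd

df = pd.DataFrame([{"col1": 1}, {"col1": 2}])

# An empty frame created from column labels alone stores the column as object.
untyped_empty = pd.DataFrame({}, columns=["col1"])
untyped_dtype = untyped_empty["col1"].dtype

# An empty frame whose column is explicitly int64 matches the target's dtype,
# so concatenation has nothing to promote.
typed_empty = pd.DataFrame({"col1": pd.Series(dtype="int64")})
typed_result = pd.concat([df, typed_empty])

print(untyped_dtype)
print(typed_result["col1"].dtype)
```

When the schema is known in advance, constructing empty placeholders with explicit dtypes in this way sidesteps the question of how pd.concat treats untyped empty inputs.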

@jbrockmendel added the Closing Candidate label on May 19, 2023
4 participants