combine_first changes datatype #28613

Closed
mohitanand001 opened this issue Sep 25, 2019 · 6 comments · Fixed by #45095
Labels
Dtype Conversions Unexpected or buggy dtype conversions good first issue Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@mohitanand001
Contributor

mohitanand001 commented Sep 25, 2019

df_1 = pd.DataFrame(data = {'A' : [1, 2, 3], 'B' : [4, 5, 6]})
df_2 = pd.DataFrame(data = {'A' : [1, 20, 30], 'B' : [40, 50, 60], 'C' : [12, 34, 65]})
df_1.combine_first(df_2).dtypes
A      int64
B      int64
C    float64
dtype: object

Problem description

combine_first changes the dtype of column C from int64 to float64, even though C contains no missing values after the combination. Is this
expected behavior?
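
For readers on an affected version, one possible workaround is to cast the columns that exist only in the second frame back to their original dtype once the combination is done. This is a minimal sketch, not an official recommendation: the names `only_in_df_2` and `restored` are illustrative, and the cast is applied only to columns that contain no missing values after combining, since an integer column cannot hold NaN.

```python
import pandas as pd

df_1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df_2 = pd.DataFrame({"A": [1, 20, 30], "B": [40, 50, 60], "C": [12, 34, 65]})

combined = df_1.combine_first(df_2)

# Columns present in df_2 but not in df_1 (here: just "C").
only_in_df_2 = df_2.columns.difference(df_1.columns)

# Cast those columns back to df_2's dtype, but only when the combined
# column has no missing values; otherwise leave the dtype untouched.
restored = combined.astype(
    {col: df_2[col].dtype for col in only_in_df_2 if combined[col].notna().all()}
)

print(restored.dtypes)
```

On versions where combine_first already preserves the dtype, the cast is a no-op.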

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.0
numpy : 1.16.2
pytz : 2018.7
dateutil : 2.7.5
pip : 18.1
setuptools : 40.6.3
Cython : 0.29.2
pytest : 4.0.2
hypothesis : None
sphinx : 1.8.2
blosc : None
feather : 0.4.0
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : 7.2.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.5
matplotlib : 3.0.2
numexpr : 2.6.8
odfpy : None
openpyxl : 2.5.12
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.2.15
tables : 3.4.4
xarray : None
xlrd : 1.0.0
xlwt : 1.3.0
xlsxwriter : 1.1.2

@mroeschke mroeschke added Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 3, 2019
@ghost

ghost commented Mar 5, 2020

@mroeschke I would like to work on this issue. Is it suitable for first-time contributors? If so, could I get more details on the problem and suggestions on how to fix it?

@mroeschke
Member

We have a "good first issue" label that marks issues suitable for first-time contributors:

https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

@mroeschke
Member

This looks fixed on master. It could use a test to prevent regressions.

In [6]: df_1 = pd.DataFrame(data = {'A' : [1, 2, 3], 'B' : [4, 5, 6]})
   ...: df_2 = pd.DataFrame(data = {'A' : [1, 20, 30], 'B' : [40, 50, 60], 'C' : [12, 34, 65]})
   ...: df_1.combine_first(df_2).dtypes
Out[6]:
A    int64
B    int64
C    int64
dtype: object

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jul 21, 2021
@goyal-aman

@mroeschke I would like to work on it, though this would be my first contribution. Can you suggest how to go about it?

@mroeschke
Member

@goyal-aman

goyal-aman commented Oct 17, 2021

@mroeschke Hi, does this make sense? I have little understanding of what counts as an acceptable test. If this is correct, I can
write similar tests for float and other dtypes as well.

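  import pandas as pd
  import pandas._testing as tm


  def test_combine_first_dtype_stay_same_int():
      # GH 28613: column C exists only in df_2 and should keep its int64 dtype
      df_1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
      df_2 = pd.DataFrame({"A": [1, 20, 30], "B": [40, 50, 60], "C": [12, 34, 65]})
      result = df_1.combine_first(df_2)

      # Values from df_1 take precedence; C is filled in from df_2
      exp = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [12, 34, 65]})
      # Compares only the dtypes, not the values
      tm.assert_equal(result.dtypes, exp.dtypes)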
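
A stricter variant would compare values as well as dtypes, and cover the float case mentioned above. The following is only a sketch of that idea, not the test that was eventually merged: the helper name `check_combine_first_keeps_dtype` is hypothetical, and it uses the public `pd.testing.assert_frame_equal`, which checks dtypes by default. It is expected to fail on versions that still show the upcast, such as 0.25.0.

```python
import pandas as pd


def check_combine_first_keeps_dtype(dtype):
    """Hypothetical helper: column C, present only in df_2, keeps its dtype."""
    df_1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    df_2 = pd.DataFrame(
        {
            "A": [1, 20, 30],
            "B": [40, 50, 60],
            "C": pd.Series([12, 34, 65], dtype=dtype),
        }
    )
    result = df_1.combine_first(df_2)

    # df_1 wins for A and B; C comes from df_2 unchanged.
    expected = pd.DataFrame(
        {
            "A": [1, 2, 3],
            "B": [4, 5, 6],
            "C": pd.Series([12, 34, 65], dtype=dtype),
        }
    )
    # check_dtype=True is the default, so values and dtypes must both match.
    pd.testing.assert_frame_equal(result, expected)


for dtype in ["int64", "float64"]:
    check_combine_first_keeps_dtype(dtype)
```

In a pytest suite the loop would more naturally become a `pytest.mark.parametrize` decorator over the dtype names.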

@jreback jreback added this to the 1.4 milestone Dec 28, 2021