BUG: concat of all-nan with empty frame produces object dtype #9188

Closed
jreback opened this issue Jan 2, 2015 · 3 comments · Fixed by #41389
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Comments

@jreback
Contributor

jreback commented Jan 2, 2015

xref #9149

When a frame with an all-NaN column is concatenated with an empty frame, the result should keep the original dtypes (no promotion). Instead, EmptyCol becomes object and the integer columns become float64:

In [1]: df_1 = pd.DataFrame({"Row":[0,1,1], "EmptyCol":np.nan, "NumberCol":[1,2,3]})

In [2]: df_2 = pd.DataFrame(columns = df_1.columns)

In [3]: df_concat = pd.concat([df_1, df_2], axis=0)

In [4]: df_1.dtypes
Out[4]: 
EmptyCol     float64
NumberCol      int64
Row            int64
dtype: object

In [6]: df_2.dtypes
Out[6]: 
EmptyCol     object
NumberCol    object
Row          object
dtype: object

In [7]: df_concat.dtypes
Out[7]: 
EmptyCol      object
NumberCol    float64
Row          float64
dtype: object
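The promotion shown in Out[7] can also be spotted programmatically by comparing dtypes column by column. The following is a minimal sketch, not part of the original report: `dtype_changes` is a hypothetical helper, and the `promoted` frame is built with `astype` to mimic the reported output, so the example does not depend on how a given pandas version handles the concat itself.

```python
import numpy as np
import pandas as pd


def dtype_changes(before, after):
    """Return {column: (old_dtype, new_dtype)} for columns whose dtype differs.

    Hypothetical helper for illustration; not a pandas API.
    """
    changes = {}
    for col in before.columns:
        old, new = before[col].dtype, after[col].dtype
        if old != new:
            changes[col] = (str(old), str(new))
    return changes


df_1 = pd.DataFrame({"Row": [0, 1, 1], "EmptyCol": np.nan, "NumberCol": [1, 2, 3]})

# Assumption: reproduce the dtypes reported in Out[7] by hand rather than via concat.
promoted = df_1.astype(
    {"Row": "float64", "EmptyCol": "object", "NumberCol": "float64"}
)

changes = dtype_changes(df_1, promoted)
print(changes)
```

An empty dict from `dtype_changes` would mean that every column kept its dtype, which is the behavior the report asks for.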
@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Jan 2, 2015
@jreback jreback added this to the 0.16.0 milestone Jan 2, 2015
@tvyomkesh
Contributor

Ah, I just noticed (via the xref) that someone else may already be looking into this. I will freeze this work here. Sorry for the distraction.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@TomAugspurger
Contributor

Just a note that this is handled correctly for Series:

In [18]: a = pd.Series(pd.date_range('2000', periods=4, tz='UTC'))

In [19]: b = pd.Series([])

In [20]: pd.concat([a, b]).dtype
Out[20]: datetime64[ns, UTC]

In [21]: pd.concat([a.to_frame(), b.to_frame()]).dtypes
Out[21]:
0    object
dtype: object

So Out[20] is correct, but the DataFrame version is not.
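For comparison, a sketch of the same example with the empty Series given an explicit, matching dtype. This is an assumption-laden workaround rather than a fix for the reported behavior: when both inputs already share a dtype there is nothing to promote, so the Series and DataFrame paths should agree.

```python
import pandas as pd

a = pd.Series(pd.date_range("2000", periods=4, tz="UTC"))

# Assumption: give the empty Series the same tz-aware dtype as `a`
# instead of relying on the default dtype of an empty Series.
b = pd.Series([], dtype=a.dtype)

series_dtype = pd.concat([a, b]).dtype
frame_dtype = pd.concat([a.to_frame(), b.to_frame()]).dtypes.iloc[0]

print(series_dtype, frame_dtype)
```

Both results are compared against `a.dtype` rather than a hard-coded string, since the default datetime resolution may differ between pandas versions.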

@mroeschke
Member

Tom's example now gives the correct result. This could use a test.

In [10]: In [18]: a = pd.Series(pd.date_range('2000', periods=4, tz='UTC'))
    ...:
    ...: In [19]: b = pd.Series([])
<ipython-input-10-74472f1f757a>:3: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  b = pd.Series([])

In [11]: pd.concat([a.to_frame(), b.to_frame()]).dtypes
Out[11]:
0    datetime64[ns, UTC]
dtype: object

In [12]: pd.__version__
Out[12]: '1.3.0.dev0+1287.g895f0b4022'
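A regression check along these lines might look like the sketch below. It is not the test added in the linked pull request, and `check_concat_with_typed_empty` is a hypothetical name. Because the expected result for an untyped empty frame may differ between pandas versions, the sketch uses an empty frame that already carries matching dtypes (`df.iloc[:0]`) and only asserts that concatenation leaves the original frame unchanged.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def check_concat_with_typed_empty(df):
    """Concat `df` with an empty slice of itself and check nothing changed.

    Hypothetical helper for illustration; returns the concatenated frame.
    """
    empty = df.iloc[:0]  # zero rows, same columns and dtypes as `df`
    result = pd.concat([df, empty], axis=0)
    # assert_frame_equal checks dtypes by default
    tm.assert_frame_equal(result, df)
    return result


# Frame from the original report: one all-NaN column plus integer columns
df_all_nan = pd.DataFrame(
    {"Row": [0, 1, 1], "EmptyCol": np.nan, "NumberCol": [1, 2, 3]}
)

# Frame from the follow-up comment: a single tz-aware datetime column
df_tz = pd.Series(pd.date_range("2000", periods=4, tz="UTC")).to_frame()

result_all_nan = check_concat_with_typed_empty(df_all_nan)
result_tz = check_concat_with_typed_empty(df_tz)
```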

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 12, 2021
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.3 May 9, 2021
5 participants