BUG: dtypes on empty frame are incorrect #4272

Closed
jreback opened this issue Jul 17, 2013 · 7 comments
Labels: API Design, Bug, Dtype Conversions (unexpected or buggy dtype conversions)
@jreback
Contributor

jreback commented Jul 17, 2013

In [1]: df = DataFrame(columns=list('xyz'))

In [2]: df
Out[2]: 
Empty DataFrame
Columns: [x, y, z]
Index: []

In [3]: df.dtypes
Out[3]: 
x   NaN
y   NaN
z   NaN
dtype: float64

In [5]: df['x'].dtype
Out[5]: dtype('O')

expected

In [11]: Series(dict([ (s,np.dtype('O')) for s in list('xyz')]))
Out[11]: 
x    object
y    object
z    object
dtype: object
@hayd
Contributor

hayd commented Jul 17, 2013

This is a bug in apply with an empty DataFrame/Series; I think I have a fix for it coming up.

I remember it was fixed in #2476 with this commit bb7eaff.

@jreback
Contributor Author

jreback commented Jul 17, 2013

I think this got messed up after that. We need an explicit test for it. Thanks.
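
A minimal sketch of what such an explicit test could check, assuming pandas and NumPy are importable. The helper names `expected_empty_dtypes` and `dtypes_agree_with_columns` are hypothetical and are not part of pandas or its test suite. The second helper compares the aggregate `df.dtypes` view with each column's own dtype, which is the mismatch shown in the report.

```python
import numpy as np
import pandas as pd


def expected_empty_dtypes(columns):
    """Build the per-column dtype Series the report lists as expected (object)."""
    return pd.Series({name: np.dtype("O") for name in columns})


def dtypes_agree_with_columns(columns):
    """Return True when df.dtypes matches each column's own dtype on an empty frame."""
    df = pd.DataFrame(columns=columns)
    via_dtypes = df.dtypes
    # The report shows df.dtypes giving NaN while df['x'].dtype gives object;
    # a regression test would require the two views to agree.
    return all(via_dtypes[name] == df[name].dtype for name in columns)


cols = list("xyz")
expected = expected_empty_dtypes(cols)
consistent = dtypes_agree_with_columns(cols)
```

In a real test suite the final comparison would more likely be an equality assertion between `df.dtypes` and `expected`; the boolean form here only keeps the sketch self-contained.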

@hayd
Contributor

hayd commented Jul 18, 2013

I have put together a commit that better special-cases empty DataFrames and does fix this.

However, it breaks test_apply_empty_infer_type, whose tests seem a little sketchy to me. For example:

no_cols = DataFrame(index=['a', 'b', 'c'])
no_cols.apply(lambda x: x.mean()) 

The test asserts that this ought to be a DataFrame, whereas I think it ought to be a Series.

There really is some ambiguity about when a DataFrame should be reduced to a Series in an apply, and likewise about column names versus the Series name.

@jreback
Contributor Author

jreback commented Jul 18, 2013

This is a Series (in master):

In [1]: no_cols = DataFrame(index=['a', 'b', 'c'])

In [2]: no_cols.apply(lambda x: x.mean()) 
Out[2]: Series([], dtype: float64)
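
The same check can be written as a small script rather than read off the repr. This is a sketch that assumes a recent pandas release behaves like the master branch quoted above; it only inspects the type and length of the result.

```python
import pandas as pd

# A frame with an index but no columns, as in the example above.
no_cols = pd.DataFrame(index=["a", "b", "c"])

# Apply a reducing function column-wise; there are no columns to reduce.
result = no_cols.apply(lambda x: x.mean())

# Record what came back instead of relying on the printed repr.
is_series = isinstance(result, pd.Series)
n_items = len(result)
print(type(result).__name__, n_items)
```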

@jreback
Contributor Author

jreback commented Aug 22, 2013

@hayd can you revisit this and see if you can fix the issues here? Thanks.

@hayd
Contributor

hayd commented Aug 22, 2013

@jreback yeah, I will have a look at the weekend and will at least write down all the cases where I think there is ambiguity. IIRC there was an issue where fixing one thing that seemed obviously wrong broke another that had seemed obviously correct.

@jreback
Contributor Author

jreback commented Sep 29, 2013

@hayd actually I am not so sure that this is wrong (nor does it really matter). The default dtype is np.float64, and the dtypes would be changed/inferred if data were added. Going to close as not a bug (if you disagree, please reopen and move to 0.14).

@jreback closed this as completed Sep 29, 2013