BUG/ER: HDFStore write with empty frame reports an error (rather than succeeding) #4273

Closed
jreback opened this issue Jul 17, 2013 · 3 comments · Fixed by #4660
Labels
Bug · Error Reporting (incorrect or improved errors from pandas) · IO HDF5 (read_hdf, HDFStore)

Comments

@jreback
Contributor

jreback commented Jul 17, 2013

Writing an empty frame with invalid dtypes to an HDFStore raises. Maybe it should just proceed (or is it the dtypes call that is actually wrong here? See #4272).

This came up in this question: http://stackoverflow.com/questions/17691912/problems-with-merging-on-disk-tables-with-millions-of-rows/17698740#17698740

In [26]: df = DataFrame(randn(10,2),columns=list('AB'))

In [28]: df['C'] = 'foo'

In [33]: df.to_hdf('test.h5','df',mode='w',table=True)

In [35]: pd.read_hdf('test.h5','df')
Out[35]: 
          A         B    C
0 -1.123712 -1.146515  foo
1  0.921705  1.800419  foo
2 -0.769236 -0.553307  foo
3 -0.747601 -1.783439  foo
4 -1.110340  1.601026  foo
5  0.743869 -2.135140  foo
6  1.033699  2.028479  foo
7 -0.755478 -1.060223  foo
8  0.079326 -2.671624  foo
9 -2.262756  0.406850  foo

In [36]: pd.read_hdf('test.h5','df').dtypes
Out[36]: 
A    float64
B    float64
C     object
dtype: object

In [37]: df[df.C=='bar']
Out[37]: 
Empty DataFrame
Columns: [A, B, C]
Index: []

In [38]: df[df.C=='bar'].dtypes
Out[38]: 
A   NaN
B   NaN
C   NaN
dtype: float64

In [39]: df[df.C=='bar'].to_hdf('test.h5','df',append=True)
TypeError: Cannot serialize the column [C] because
its data contents are [empty] object dtype
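
Until the store accepts empty frames, one possible workaround is to skip the write when the selection has no rows. The sketch below is only an illustration: `append_if_nonempty` is a hypothetical helper, not part of pandas, and the `write` callable stands in for whatever append call is actually used (for example `to_hdf` with `append=True`).

```python
import numpy as np
import pandas as pd


def append_if_nonempty(frame, write):
    """Call write(frame) only if frame has rows; return True if it was written.

    Hypothetical helper for illustration, not a pandas API.
    """
    if frame.empty:
        return False
    write(frame)
    return True


df = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))
df["C"] = "foo"

# A list stands in for the store here. With PyTables installed, `write`
# could instead be something like:
#   lambda f: f.to_hdf("test.h5", key="df", format="table", append=True)
written = []
skipped = append_if_nonempty(df[df.C == "bar"], written.append)  # empty selection
kept = append_if_nonempty(df[df.C == "foo"], written.append)     # all ten rows
```

The guard only avoids the error on the caller's side; it does not change how the store treats an empty frame.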
@abrakababra

The DataFrame need not be completely empty. Rows that are all NaN are simply omitted when using 'append=True' in to_hdf.

@jreback
Contributor Author

jreback commented Aug 21, 2013

@abrakababra that is a separate issue. I thought I had an issue for the all-NaN dropping (which is really done for efficiency), but agreed, there should be a way not to do it.

@jreback
Contributor Author

jreback commented Aug 21, 2013

see #4625
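
To see which rows the all-NaN dropping discussed above would affect, the mask can be computed in plain pandas before writing. This is a minimal sketch with made-up data. It assumes the store drops exactly the rows where every column is NaN, which may not match the internal logic in every pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 is NaN in every column, row 0 only in one.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

# True for rows where every column is NaN (assumed to be the rows that get dropped).
all_nan = df.isna().all(axis=1)

# Rows that would survive the append under that assumption.
kept = df[~all_nan]
```

Checking `all_nan.any()` before an append shows whether any rows would be lost silently.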
