Unable to write to HDF5 table if DataFrame has mixed object types (pd.Timestamp and str)

When attempting to store data in an HDF5 table, I found that an error is raised if the DataFrame has multiple object columns holding different kinds of data.

``` python
import pandas as pd

index = pd.Index([0, 1, 2])  # 3-row index shared by all columns
timestamps = [pd.Timestamp('2014-1-1 12:00', tz='UTC'),
              pd.Timestamp('2014-1-2 12:00', tz='UTC'),
              pd.Timestamp('2014-1-3 12:00', tz='UTC')]

data = {'ints': pd.Series([1, 2, 3], index=index),
        'Timestamps': pd.Series(timestamps, index=index),  # tz-aware (UTC)
        'strings': pd.Series(['r', 'g', 'b'], index=index)}
df = pd.DataFrame(data)

# Writing in table format raises a TypeError
df.to_hdf('test.h5', 'data', format='table')
```

This raises an exception: `TypeError: Cannot serialize the column [Timestamps] because its data contents are [datetime] object dtype`

However, if I remove the string column:

``` python
del df['strings']
df.to_hdf('test.h5', 'data', format='table')
```

Now it works fine, so the problem is not the pd.Timestamp type itself.

Digging a little deeper, it appears the problem is that `pandas.io.pytables.Table.create_axes` groups the columns by data type, so all object-dtype columns end up together in a single block of data. Then, when `set_atom` is called on that block, it does this:

``` python
rvalues = block.values.ravel()
inferred_type = lib.infer_dtype(rvalues)
```

Because the block holds more than one kind of object (Timestamps and strings), the inferred type is 'mixed'. That case isn't handled, so the exception is thrown.
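To see the inference step in isolation, here is a small sketch. It builds a 2-D object array by hand as a stand-in for the consolidated object block (an assumption for illustration: it does not touch pandas' internal block API) and calls `pandas.api.types.infer_dtype`, the public counterpart of `lib.infer_dtype`, once on the raveled values and once per column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

timestamps = [pd.Timestamp('2014-1-1 12:00', tz='UTC'),
              pd.Timestamp('2014-1-2 12:00', tz='UTC'),
              pd.Timestamp('2014-1-3 12:00', tz='UTC')]
strings = ['r', 'g', 'b']

# Hand-built stand-in for an object block: one row per DataFrame column.
# Element-wise assignment stores the Python objects unchanged.
block_values = np.empty((2, 3), dtype=object)
for j in range(3):
    block_values[0, j] = timestamps[j]
    block_values[1, j] = strings[j]

# Inferring over the whole raveled block, as set_atom does
block_kind = infer_dtype(block_values.ravel(), skipna=True)
# Inferring one column at a time
column_kinds = [infer_dtype(row, skipna=True) for row in block_values]

print(block_kind)    # mixed
print(column_kinds)  # ['datetime', 'string']
```

Taken together, the values look 'mixed', while each column on its own has a single, well-defined inferred type.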

As a fix, it seems that each object column should be handled separately, or at least grouped by its inferred type. I haven't contributed to pandas before or dug this deeply into this part of the code, so I'm not sure of the best fix or what other implications it may have, but I'd be happy to help however I can.
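The "group by inferred type" idea can be sketched at the DataFrame level. The helper `group_object_columns` is hypothetical (it is not part of pandas) and only shows which object columns would end up together. Wiring something like it into `create_axes` would need changes inside pandas itself. Both object columns are created with an explicit `dtype=object` so they stay object-dtype regardless of pandas version.

```python
from collections import defaultdict

import pandas as pd
from pandas.api.types import infer_dtype


def group_object_columns(df):
    """Hypothetical helper: map inferred type -> list of object-dtype column names."""
    groups = defaultdict(list)
    for name in df.columns:
        if df[name].dtype == object:  # skip numeric and other non-object columns
            groups[infer_dtype(df[name], skipna=True)].append(name)
    return dict(groups)


index = pd.Index([0, 1, 2])
timestamps = [pd.Timestamp('2014-1-1 12:00', tz='UTC'),
              pd.Timestamp('2014-1-2 12:00', tz='UTC'),
              pd.Timestamp('2014-1-3 12:00', tz='UTC')]
df = pd.DataFrame({
    'ints': pd.Series([1, 2, 3], index=index),
    # dtype=object keeps each of these as a column of Python objects
    'Timestamps': pd.Series(timestamps, index=index, dtype=object),
    'strings': pd.Series(['r', 'g', 'b'], index=index, dtype=object),
})

groups = group_object_columns(df)
print(groups)  # {'datetime': ['Timestamps'], 'string': ['strings']}
```

Each group is homogeneous, so a per-group inference would see a single type such as 'datetime' or 'string' instead of 'mixed'.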


Unable to write to HDF5 table if DataFrame has mixed object types (pd.Timestamp and str) #8284
