BUG: round-trip of tz in an index using fixed-format for HDF5 #8165

colbrac · 2014-09-03T10:47:12Z

With Python 2.7.6, pandas 0.13.1, numpy 1.8.1 and pytables 3.1.1, both 32- and 64-bit (Python(x,y) and WinPython), I can load my HDF5 file.
With Python 3.4.1 64-bit, pandas 0.14.1, numpy 1.8.2 and pytables 3.1.1 (Anaconda3 2.0.1), I get the following error:

Traceback (most recent call last):

File "", line 1, in
test = pd.read_hdf('datafile.h5', 'data')

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 330, in read_hdf
return f(store, True)

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 322, in
key, auto_close=auto_close, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 669, in select
auto_close=auto_close).get_values()

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 1335, in get_values
results = self.func(self.start, self.stop)

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 658, in func
columns=columns, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 2658, in read
ax = self.read_index('axis%d' % i)

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 2257, in read_index
_, index = self.read_index_node(getattr(self.group, key))

File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 2385, in read_index_node
_unconvert_index(data, kind, encoding=self.encoding), **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\index.py", line 125, in __new__
result = DatetimeIndex(data, copy=copy, name=name, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\tseries\index.py", line 301, in __new__
infer_dst=infer_dst)

File "tslib.pyx", line 2165, in pandas.tslib.tz_localize_to_utc (pandas\tslib.c:33574)

File "tslib.pyx", line 2082, in pandas.tslib._get_deltas (pandas\tslib.c:32187)

File "tslib.pyx", line 872, in pandas.tslib._get_utcoffset (pandas\tslib.c:16036)

AttributeError: 'numpy.bytes_' object has no attribute 'utcoffset'

The index in question is:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-04-03 00:00:00+02:00, ..., 2013-04-04 00:00:00+02:00]
Length: 8641, Freq: 10S, Timezone: Europe/Amsterdam
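
One possible reading of the traceback, not confirmed anywhere in this thread so far, is that the timezone attribute stored with the index comes back under Python 3 as a bytes value (`numpy.bytes_`) instead of a string or tzinfo object, so the call to `utcoffset` fails. The sketch below uses only the standard library to illustrate that idea; `normalize_tz_attr` is a hypothetical helper and not part of pandas, and the plain `bytes` literal is a stand-in for whatever is actually read from the file.

```python
def normalize_tz_attr(value, encoding="utf-8"):
    """Return a timezone name as str, decoding bytes if necessary.

    Hypothetical helper for illustration; pandas has no function by this name.
    """
    # numpy.bytes_ subclasses bytes on Python 3, so this check covers it too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for the attribute as it might be read back on Python 3 (assumption).
raw = b"Europe/Amsterdam"

# A bytes value is not a tzinfo object, so it has no utcoffset method.
has_utcoffset = hasattr(raw, "utcoffset")

# Decoding first yields a name that a timezone library could look up.
tz_name = normalize_tz_attr(raw)

print(has_utcoffset, tz_name)
```

If this reading is right, decoding the attribute before it reaches the `DatetimeIndex` constructor would be one way to avoid the `AttributeError`; whether the file was written as bytes or the reader simply fails to decode it cannot be determined from the traceback alone.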

colbrac · 2014-09-03T11:30:56Z

After regenerating the h5 from the source files, I get the differences in group / leaf properties as shown through ViTables above.

Note: the h5 file generated with Pandas 0.14.1 in Python 3 opens with Pandas 0.13.1 in Python 2 but not vice versa.

jreback · 2014-09-03T13:18:26Z

pls show pd.show_versions() in each session you are trying
show all code (writing & reading)
show df.info() of the data

colbrac · 2014-09-03T13:45:20Z

Great (not), while generating the output both datafiles fail to open in Py3:

############### Py2 ####################

import pandas as pd
pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: nl_NL

pandas: 0.13.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 2.0.3
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
sqlalchemy: 0.9.4
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None

data = pd.read_hdf('datafile-py2.h5', 'received')
data_py3 = pd.read_hdf('datafile-py3.h5', 'received')

No errors.
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10081 entries, 2014-08-11 00:00:00+02:00 to 2014-08-18 00:00:00+02:00
Freq: T
Data columns (total 1085 columns):
001e:5e09:0200:1b50 int32
001e:5e09:0200:1b51 int32
001e:5e09:0200:1b52 int32
(...)
dtypes: int32(1085)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10081 entries, 2014-08-25 00:00:00+02:00 to 2014-09-01 00:00:00+02:00
Freq: T
Data columns (total 1082 columns):
001e:5e09:0200:1b50 int32
001e:5e09:0200:1b51 int32
001e:5e09:0200:1b52 int32
(...)
dtypes: int32(1082)

############### Py3 ####################
import pandas as pd

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: nl_NL

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.2
scipy: 0.14.0
statsmodels: None
IPython: 2.2.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.1
pytz: 2014.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

data = pd.read_hdf('datafile-py2.h5', 'received')
Traceback (most recent call last):

File "", line 1, in
data = pd.read_hdf('datafile-py2.h5', 'received')