BUG: selecting from HDFStore with a tz-aware level of a multi-index #11926

Closed
iyer opened this issue Dec 30, 2015 · 3 comments · Fixed by #27144
Labels
Bug; IO HDF5 (read_hdf, HDFStore); Timezones (Timezone data dtype)
Comments

iyer commented Dec 30, 2015

I'm encountering a bug when I query an HDFStore for a MultiIndex DataFrame that has a timezone-aware DatetimeIndex as one of its levels.
This only happens

  1. for a MultiIndex in which one level holds timestamps with a timezone (as seen in [1]). If the timestamps have no timezone set, there is no issue (as seen in [2]).
  2. if the query returns no rows.
  3. in pandas 0.17.*. This was working fine in pandas 0.16.*.

In [1]: periods = 10
   ...: dts = pd.date_range('20151201', periods=periods, freq='D', tz='UTC') #WITH TIMEZONE
   ...: mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
   ...: df = pd.DataFrame({'MYCOL':0}, index=mi)
   ...: file_path = 'table.h5'
   ...: key = 'mykey'
   ...: with pd.HDFStore(file_path, 'w') as store:
   ...:     store.append(key, df, format='table', append=True)
   ...:     dfres = store.select(key, where="""DATE > '20151220'""")
   ...:     print(dfres)
   ...: 
   ...: 
Traceback (most recent call last):

  File "<ipython-input-1-e0b7db50fd4d>", line 9, in <module>
    dfres = store.select(key, where="""DATE > '20151220'""")

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 669, in select
    return it.get_result()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 1352, in get_result
    results = self.func(self.start, self.stop, where)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 662, in func
    columns=columns, **kwargs)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 4170, in read
    df = super(AppendableMultiFrameTable, self).read(**kwargs)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 4029, in read
    df = concat(frames, axis=1, verify_integrity=False).consolidate()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/tools/merge.py", line 813, in concat
    return op.get_result()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/tools/merge.py", line 995, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4456, in concatenate_block_managers
    for placement, join_units in concat_plan]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4456, in <listcomp>
    for placement, join_units in concat_plan]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4553, in concatenate_join_units
    for ju in join_units]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4553, in <listcomp>
    for ju in join_units]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4801, in get_reindexed_values
    missing_arr = np.empty(self.shape, dtype=empty_dtype)

TypeError: data type not understood


In [2]: periods = 10
   ...: dts = pd.date_range('20151201', periods=periods, freq='D') #WITHOUT TIMEZONE
   ...: mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
   ...: df = pd.DataFrame({'MYCOL':0}, index=mi)
   ...: file_path = 'table.h5'
   ...: key = 'mykey'
   ...: with pd.HDFStore(file_path, 'w') as store:
   ...:     store.append(key, df, format='table', append=True)
   ...:     dfres = store.select(key, where="""DATE > '20151220'""")
   ...:     print(dfres)
   ...: 
   ...: 
Empty DataFrame
Columns: [MYCOL]
Index: []

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.11.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
Jinja2: None
iyer changed the title from "Saving multindex dataframe with datetimes to HDFStore" to "Saving multindex dataframe with timestamps to HDFStore" on Dec 30, 2015

jreback (Contributor) commented Dec 30, 2015

So it's the readback, not the writing. I think that it's taking the wrong path on the dtype conversion.

import numpy as np
import pandas as pd

periods = 10
dts = pd.date_range('20151201', periods=periods, freq='D', tz='UTC') #WITH TIMEZONE
mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
df = pd.DataFrame({'MYCOL':0}, index=mi)

file_path = 'table.h5'
key = 'mykey'

with pd.HDFStore(file_path, 'w') as store:
   store.append(key, df, format='table', append=True)

print(pd.read_hdf(file_path, key))


dfres = pd.read_hdf(file_path, key, where="DATE > 20151220")
print(dfres)
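
The last frame of the traceback in the report calls `np.empty(self.shape, dtype=empty_dtype)`. The following is a minimal sketch of how such a call can fail, assuming `empty_dtype` ends up being the pandas tz-aware dtype rather than a NumPy dtype (this is an assumption; the thread does not confirm it). `try_empty` is a hypothetical helper, and `pd.DatetimeTZDtype` is taken from a recent pandas, not from 0.17.

```python
import numpy as np
import pandas as pd

# A tz-aware dtype is a pandas extension dtype, not a NumPy dtype.
tz_dtype = pd.DatetimeTZDtype(tz="UTC")


def try_empty(dtype):
    """Return the allocated array, or the TypeError that NumPy raises."""
    try:
        return np.empty((0,), dtype=dtype)
    except TypeError as exc:
        return exc


# Plain datetime64[ns] is a NumPy dtype, so allocation succeeds.
naive_result = try_empty(np.dtype("datetime64[ns]"))
# NumPy cannot interpret the pandas tz-aware dtype, so a TypeError comes back.
aware_result = try_empty(tz_dtype)

print(type(naive_result).__name__)
print(type(aware_result).__name__)
```

The exact error message depends on the NumPy version, so the sketch only checks the exception type.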

jreback added the Bug, IO HDF5 (read_hdf, HDFStore), and Timezones (Timezone data dtype) labels on Dec 30, 2015
jreback added this to the Next Major Release milestone on Dec 30, 2015
jreback changed the title from "Saving multindex dataframe with timestamps to HDFStore" to "BUG: selecting from HDFStore with a tz-aware level of a multi-index" on Dec 30, 2015

mmongeon-aa commented

Has there been any progress on a patch for this? Any idea which commit broke this between 0.16.* and 0.17.*?

I'm encountering the same issue when selecting datetime64[ns, tz] data using an iterator.

jreback (Contributor) commented Feb 18, 2016

There are vast changes to the way timezones work in 0.17 vs. 0.16; see the whatsnew.

This is a relatively simple fix, however. Pull requests are welcome.
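
On affected versions, one possible workaround follows from the observation in the report that tz-naive timestamps are unaffected: store the level as tz-naive UTC and re-attach the timezone after reading. The sketch below is an assumption-laden illustration, not the fix that was eventually merged. `strip_level_tz` and `restore_level_tz` are hypothetical helpers, and the round trip is shown in memory only, not verified against an actual store.

```python
import pandas as pd


def strip_level_tz(df, level):
    """Return a copy of df whose index level `level` holds tz-naive UTC values."""
    pos = df.index.names.index(level)
    naive = df.index.levels[pos].tz_convert("UTC").tz_localize(None)
    out = df.copy()
    out.index = df.index.set_levels(naive, level=pos)
    return out


def restore_level_tz(df, level, tz="UTC"):
    """Inverse of strip_level_tz: read the level as UTC, then convert to tz."""
    pos = df.index.names.index(level)
    aware = df.index.levels[pos].tz_localize("UTC").tz_convert(tz)
    out = df.copy()
    out.index = df.index.set_levels(aware, level=pos)
    return out


# Same frame as in the report.
periods = 10
dts = pd.date_range("20151201", periods=periods, freq="D", tz="UTC")
mi = pd.MultiIndex.from_arrays([dts, range(periods)], names=["DATE", "NO"])
df = pd.DataFrame({"MYCOL": 0}, index=mi)

stripped = strip_level_tz(df, "DATE")          # what would be appended to the store
restored = restore_level_tz(stripped, "DATE")  # applied to whatever select() returns
```

The cost of this approach is that the timezone has to be tracked outside the file, so it only makes sense where every writer and reader agrees on it.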
