
PyTables dates don't work when you switch to a different time zone #2852


Closed
darindillon opened this issue Feb 12, 2013 · 19 comments · Fixed by #4841
Labels: Bug, IO Data (IO issues that don't fit into a more specific label)
Comments

@darindillon

Create a dataframe with a date (not datetime) column as the index. Save that dataframe to the HDFStore. Now open that same file on a computer in a time zone that's behind yours (i.e., write the file in New York and then read it from Dallas). Result: all the dates are off by one.
Here is a code snippet to demonstrate the problem, along with a proposed fix to the pandas code (though someone more qualified than I am should review it).

To reproduce the problem:

# Write a file in the New York time zone
import pandas
import datetime

def writeToFile(fileName):
    dates = [datetime.date.today(), datetime.date.today(), datetime.date.today()]
    numbers = [1, 2, 3]
    p = pandas.DataFrame({"date": dates, "number": numbers}, columns=['date', 'number'])
    p.set_index('date', inplace=True)
    store = pandas.HDFStore(fileName)
    store['obj1'] = p
    store.close()

Then switch to the Dallas (Central) time zone and read that file:

def readFromFile(fileName):
    store = pandas.HDFStore(fileName)
    return store['obj1']

Notice that the dates on the dataframe now show YESTERDAY when they ought to show today.
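For context, here is a minimal sketch (not part of the original report) showing why the time.mktime encoding is timezone-dependent; it assumes a POSIX system where time.tzset is available:

```python
import datetime
import os
import time

def seconds_for(tz_name):
    # Interpret midnight of a fixed date as local wall-clock time in tz_name.
    os.environ['TZ'] = tz_name
    time.tzset()
    return time.mktime(datetime.date(2013, 2, 12).timetuple())

# Midnight in Dallas (CST6CDT) falls one hour after midnight in New York
# (EST5EDT), so the stored epoch value depends on the writer's time zone.
delta = seconds_for('CST6CDT') - seconds_for('EST5EDT')
print(delta)  # 3600.0
```

Truncating that shifted epoch value back to a local date in a different zone is what produces the off-by-one day.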

Proposed fix in pandas/io/pytables.py:

def _convert_index(index):
    ...
    elif inferred_type == 'date':
        # OLD LINE (change this):
        # converted = np.array([time.mktime(v.timetuple()) for v in values],
        #                      dtype=np.int32)
        # NEW LINE:
        converted = np.array([v.toordinal() for v in values], dtype=np.int32)

and then also change:

def _unconvert_index(data, kind):
    ...
    elif kind == 'date':
        # Try fromordinal first. If the date was saved with the old mktime
        # mechanism, this raises a ValueError because the value is out of
        # bounds; in that case convert using the old method (with the bug!).
        try:
            index = np.array([date.fromordinal(v) for v in data], dtype=object)
        except ValueError:
            index = np.array([date.fromtimestamp(v) for v in data], dtype=object)
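By contrast, a proleptic Gregorian ordinal is a pure day count with no time-of-day component, so the proposed encoding round-trips identically in any time zone. A quick sanity check (illustrative, not part of the patch):

```python
import datetime

d = datetime.date(2013, 2, 12)
encoded = d.toordinal()                       # small int, fits in int32
assert datetime.date.fromordinal(encoded) == d

# The fallback above works because legacy mktime values (epoch seconds,
# ~1.36e9 for 2013) exceed date.max.toordinal() (~3.65e6), so
# fromordinal raises ValueError for them.
try:
    datetime.date.fromordinal(1360627200)
except ValueError:
    print('legacy value rejected')
```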

@jreback
Contributor

jreback commented Feb 12, 2013

this is not well tested,
nor are dates really well supported (datetimes are really the way to store these, as they can be represented as int64 internally)

but will take a look

thanks


@jreback
Contributor

jreback commented Feb 12, 2013

this solution looks reasonable

can you do a PR with this (and some tests?)

not sure how to 'fake' the timezone....prob have to do it explicitly..

Also, if toordinal returns an int, then the column should be Int32.
can the values live in 32 bits?

and then you don't actually need the try: except: on the unconvert; instead you can check
the column type (you already know it's a 'date', so if it's a Time32 use the
old method, if Int32 the new method)
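That suggestion could look something like the following sketch. The function and parameter names here are hypothetical, not the actual pandas internals; the idea is just to dispatch on the stored column's atom kind instead of using try/except:

```python
import numpy as np
from datetime import date

def unconvert_date_index(data, atom_kind):
    # atom_kind is assumed to come from the stored PyTables column:
    # 'int32'  -> new ordinal encoding (timezone-independent)
    # 'time32' -> legacy time.mktime encoding (timezone-dependent)
    if atom_kind == 'int32':
        return np.array([date.fromordinal(int(v)) for v in data], dtype=object)
    return np.array([date.fromtimestamp(v) for v in data], dtype=object)

sample = np.array([date(2013, 2, 12).toordinal()], dtype=np.int32)
idx = unconvert_date_index(sample, 'int32')
print(idx[0])  # 2013-02-12
```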

@darindillon
Author

Not sure what a PR is (I assume that's a Git thing, and I'm new to Git). But yes, we have tested the fix described above extensively in our environment with our real-world dataframes, and it's working fine for us.

@jreback
Copy link
Contributor

jreback commented Feb 12, 2013

a pull request:
you can essentially contribute a patch to the main code-base and see that nothing else breaks
https://github.com/pydata/pandas/blob/master/CONTRIBUTING.md

@wesm
Member

wesm commented Mar 12, 2013

Action here?

@jreback
Contributor

jreback commented Mar 12, 2013

pushing till 0.12; have to store extra data for the column, so need to think about how general this needs to be (see #2391, @scottkidder wanted to work on this)

@jreback
Contributor

jreback commented May 8, 2013

this is closed by #3531

@jreback jreback closed this as completed May 8, 2013
@numpand

numpand commented Sep 13, 2013

This issue is not fixed in pandas 0.12.0. I just reproduced it by following the test case outlined in the description.

@jreback
Contributor

jreback commented Sep 13, 2013

this particular issue is untestable, so it was closed. The store correctly stores/retrieves timezone-aware datetimes in 0.12. What exactly are you reproducing?

@numpand

numpand commented Sep 13, 2013

I am using dates (not datetimes). I stored dates in a dataframe index, saved it to an HDF store in the Eastern time zone, and then retrieved it in the Central time zone and got (date - 1) in the index. What exactly do you mean by this issue being untestable?

@jreback
Contributor

jreback commented Sep 13, 2013

datetime.dates are not supported for this issue, only datetime.datetimes. And in general, datetime.dates are not really supported by pandas, for many reasons.

@jreback
Contributor

jreback commented Sep 13, 2013

what's untestable is resetting the default time zone on the computer (well, it IS testable, but the datetimes work fine), so it's not an issue. you are having a different issue, in that datetime.date is only really supported somewhat for historical reasons

@numpand

numpand commented Sep 13, 2013

Well, at the end of the day, datetime.date is "supported" because there is an "if" statement checking for it and doing something based on that (something that's causing a problem when using the HDF store in two different time zones). Is there any harm in making the change suggested by the original submitter?

@jreback
Contributor

jreback commented Sep 13, 2013

can you create a self-contained example for testing? (if so, then no problem)

@numpand

numpand commented Sep 14, 2013

https://www.wakari.io/sharing/bundle/johnv/pandas_issue_2852 - shared an IPython notebook on Wakari. Let me know if you have problems accessing it (the code won't work on Windows because time.tzset is missing there). I'll repeat the code below just in case:

import datetime
import os
import time
import pandas as pd

def setTZ(tz):
    os.environ['TZ']=tz
    time.tzset()

setTZ('EST5EDT')
today = datetime.date(2013,9,10)
df = pd.DataFrame([1,2,3], index = [today, today, today])
print df
filename = 'test.hdf5'
store = pd.HDFStore(filename)
store['obj1'] = df
store.close()

setTZ('CST6CDT')
store = pd.HDFStore(filename)
read_df = store['obj1']
store.close()
print read_df

df.index[0]==read_df.index[0]

@jreback
Contributor

jreback commented Sep 14, 2013

@numpand thanks...see PR #4841

just note that storing a datetime.date in HDF, or using it in pandas at all, is quite inefficient, as the data is pickled.

Even manipulating these in pandas is very inefficient, as they are treated as object dtype. You are basically using python for everything, rather than efficient numpy dtypes.

You should ALWAYS use Timestamp objects instead, which give a dtype of datetime64[ns] and a DatetimeIndex. Storing plain dates doesn't give you any of the features of date-like indexes.

Is there a reason you and @tavistmorph store things this way?
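For readers landing on this thread, the recommended pattern looks like this sketch: construct a DatetimeIndex (datetime64[ns]) rather than an object index of datetime.date values.

```python
import datetime
import pandas as pd

# DatetimeIndex converts plain dates to Timestamps (midnight, naive)
dates = [datetime.date(2013, 9, 10)] * 3
df = pd.DataFrame({'number': [1, 2, 3]}, index=pd.DatetimeIndex(dates))
print(df.index.dtype)  # datetime64[ns]
```

With this dtype the index is stored as int64 nanoseconds internally, so writing and reading the HDF file is timezone-independent for naive values.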

@darindillon
Author

We want to be able to use datetime.date because it's a standard, standard, standard python type that's built in to the language and is needed to interact with hundreds of other non-pandas libraries. No question that datetime64 is more efficient, but for data with less than a million rows, the efficiency gains are outweighed by the convenience of not having to convert datetime64 to datetime.date every time we want to interact with some other library.
In my case, the dataframe is built from dates provided by another library. datetime.date is an official python standard, so it seems odd not to support it.
At the very least, if we are not supporting it, we should raise a NotSupportedException or something rather than giving the wrong behavior described above. We either support it or we don't; there is no middle ground.
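One possible middle ground, sketched here as an illustration rather than anything from the thread: keep datetime64[ns] inside pandas for storage and math, and convert at the boundary when another library needs plain datetime.date objects.

```python
import datetime
import pandas as pd

idx = pd.DatetimeIndex([datetime.date(2013, 9, 10),
                        datetime.date(2013, 9, 11)])
# Timestamp -> datetime.date only at the edge, when calling other libraries
as_dates = [ts.date() for ts in idx]
print(as_dates[0])  # 2013-09-10
```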


@jreback
Contributor

jreback commented Sep 14, 2013

PR was just merged. you are welcome.

@numpand

numpand commented Sep 14, 2013

@jreback thank you! I'll keep in mind the performance penalty and revisit the usage of datetime.date in DataFrames in existing code.
