PyTables dates don't work when you switch to a different time zone #2852
Comments
this is not well tested but will take a look, thanks.
On Feb 12, 2013, at 1:29 PM, tavistmorph wrote:
|
this solution looks reasonable. can you do a PR with this (and some tests)? not sure how to 'fake' the timezone... prob have to do it explicitly. Also, if toordinal returns an int, then the column should be Int32, and then you don't actually need the try: except: on the unconvert; instead you can check |
Not sure what a PR is (I assume that's a Git thing, and I'm new to Git). But yes, we have tested the fix I described above significantly in our environment with our real-world dataframes and it's working fine for us. |
a pull-request |
Action here? |
pushing till 0.12, have to store extra data for the column, so need to think about how general this needs to be (see #2391, @scottkidder wanted to work on this) |
this is closed by #3531 |
This issue is not fixed in pandas 0.12.0. I just reproduced it by following the test case outlined in the description. |
this particular issue is untestable so closed. The store correctly stores/retrieves timezone aware datetimes in 0.12. What exactly are you reproducing? |
I am using dates (not datetimes). I stored dates in a dataframe index and then saved to an hdf store in Eastern timezone, and then retrieved it in Central timezone and got (date - 1) in the index. What exactly do you mean by this issue being untestable? |
|
what's untestable is resetting the default time zone on the computer (well, it IS testable, but the datetimes work fine), so it's not an issue. you are having a different issue, in that |
Well, at the end of the day, datetime.date is "supported" because there is an "if" statement checking for it and doing something based on that (something that's causing a problem when using the hdf store on two different timezones). Is there any harm in making the change suggested by the original submitter? |
can you create a self-contained example for testing? (if so, then no problem) |
https://www.wakari.io/sharing/bundle/johnv/pandas_issue_2852 - shared an IPython notebook on Wakari. Let me know if you have problems accessing it (the code won't work on Windows because tzset is missing there). I'll repeat the code below just in case:
import datetime
import os
import time
import pandas as pd

def setTZ(tz):
    # Switch the process-wide local time zone (Unix only)
    os.environ['TZ'] = tz
    time.tzset()

# Write the frame while "in" the Eastern time zone
setTZ('EST5EDT')
today = datetime.date(2013, 9, 10)
df = pd.DataFrame([1, 2, 3], index=[today, today, today])
print(df)
filename = 'test.hdf5'
store = pd.HDFStore(filename)
store['obj1'] = df
store.close()

# Read it back while "in" the Central time zone
setTZ('CST6CDT')
store = pd.HDFStore(filename)
read_df = store['obj1']
store.close()
print(read_df)
df.index[0] == read_df.index[0]
|
@numpand thanks...see PR #4841 just note that storing a Even manipulating in pandas is very inefficient as these are treated as You should ALWAYS use Is there a reason you and @tavistmorph store things this way? |
We want to be able to use datetime.date because it's a standard, standard, standard Python type that's built in to the language and is needed to interact with hundreds of other non-pandas libraries. No question that datetime64 is more efficient, but for data of less than a million rows, the efficiency gains are outweighed by the convenience of not having to convert datetime64 to datetime.date every time we want to interact with some other library.
On Sep 14, 2013, at 5:12 PM, jreback wrote:
|
PR was just merged. you are welcome. |
@jreback thank you! I'll keep in mind the performance penalty and revisit the usage of |
Create a dataframe with a date (not datetime) column as the index. Save that dataframe to the HDFStore. Now open that same file on a computer in a time zone that's behind yours (i.e., write the file in New York and then read it in Dallas). Result: all the dates are off by one day.
Here is a code snippet to demonstrate the problem, and also a proposed fix to the pandas code. (Someone more qualified than me should review this fix.)
To repro the problem, write a file in the New York time zone:
import pandas
import datetime

def writeToFile(fileName):
    dates = [datetime.date.today(), datetime.date.today(), datetime.date.today()]
    numbers = [1, 2, 3]
    p = pandas.DataFrame({"date": dates, "number": numbers}, columns=['date', 'number'])
    p.set_index('date', inplace=True)
    store = pandas.HDFStore(fileName)
    store['obj1'] = p
    store.close()
Then go to the Dallas time zone and read that file:
def readFromFile(fileName):
    store = pandas.HDFStore(fileName)
    return store['obj1']
Notice that the dates on the dataframe now show YESTERDAY when they ought to show today.
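The shift can also be seen without pandas or an HDF file. The following is a minimal sketch, assuming a Unix system (time.tzset is not available on Windows) and assuming the date is stored as a local-time timestamp produced by time.mktime and read back with date.fromtimestamp. The helper names set_tz, roundtrip_mktime and roundtrip_ordinal are made up for this illustration and are not part of pandas.

```python
import datetime
import os
import time


def set_tz(tz):
    # Change the process-wide local time zone (Unix only).
    os.environ['TZ'] = tz
    time.tzset()


def roundtrip_mktime(d, write_tz, read_tz):
    # Encode the date as a local-midnight timestamp in write_tz,
    # then decode that timestamp as a local date in read_tz.
    set_tz(write_tz)
    stamp = int(time.mktime(d.timetuple()))
    set_tz(read_tz)
    return datetime.date.fromtimestamp(stamp)


def roundtrip_ordinal(d, write_tz, read_tz):
    # Encode the date as a proleptic Gregorian ordinal, which does not
    # depend on the local time zone at all.
    set_tz(write_tz)
    ordinal = d.toordinal()
    set_tz(read_tz)
    return datetime.date.fromordinal(ordinal)


if __name__ == '__main__':
    d = datetime.date(2013, 9, 10)
    print(roundtrip_mktime(d, 'EST5EDT', 'CST6CDT'))   # one day early
    print(roundtrip_ordinal(d, 'EST5EDT', 'CST6CDT'))  # unchanged
```

Midnight in New York is still 11 PM of the previous day in Dallas, so decoding the timestamp in the later zone lands on the day before. The ordinal carries no time-of-day component, so it is unaffected.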
Proposed fix in pandas/io/pytables.py:
def _convert_index(index):
    ...
    elif inferred_type == 'date':
        # OLD LINE: CHANGE THIS LINE
        # converted = np.array([time.mktime(v.timetuple()) for v in values],
        #                      dtype=np.int32)
        # NEW LINE:
        converted = np.array([v.toordinal() for v in values], dtype=np.int32)
and then also change:
def _unconvert_index(data, kind):
    ...
    elif kind == 'date':
        # here we'll try from ordinal first. If the date was saved with the old
        # mktime mechanism it'll throw an exception as it'll be out of bounds.
        # in those cases we'll convert using the old method (with the bug!)
        try:
            index = np.array([date.fromordinal(v) for v in data], dtype=object)
        except ValueError:
            index = np.array([date.fromtimestamp(v) for v in data], dtype=object)
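The fallback above works because date.fromordinal rejects values larger than date.max.toordinal(), and a present-day mktime timestamp is far larger than that. Here is a hedged sketch of the same decision written as an explicit range check rather than try/except. It is one possible reading of the idea, not necessarily what was suggested in the comments, and decode_date and MAX_ORDINAL are hypothetical names, not pandas API. It assumes every stored value is either an ordinal or a mktime timestamp.

```python
import datetime
import time

# Largest value date.fromordinal accepts (the ordinal of 9999-12-31).
MAX_ORDINAL = datetime.date.max.toordinal()


def decode_date(value):
    # Values inside the valid ordinal range are treated as ordinals
    # (new format); anything else is assumed to be an old-style
    # local-time timestamp and decoded with the time-zone-dependent path.
    value = int(value)
    if 1 <= value <= MAX_ORDINAL:
        return datetime.date.fromordinal(value)
    return datetime.date.fromtimestamp(value)


if __name__ == '__main__':
    d = datetime.date(2013, 2, 12)
    print(decode_date(d.toordinal()))                    # new format
    print(decode_date(int(time.mktime(d.timetuple()))))  # old format
```

Either form shares one caveat: an old-format timestamp that happens to fall inside the ordinal range (roughly the first six weeks of 1970) would be misread as an ordinal. Files written with the old encoding also remain time-zone dependent when read back.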