
Timezone info is lost when pickling datetime indices #8367

Closed
edrevo opened this issue Sep 23, 2014 · 8 comments · Fixed by #8370
Labels: Bug, Compat (pandas objects compatibility with Numpy or Python functions), MultiIndex, Timezones (Timezone data dtype)
Milestone: 0.15.0

Comments

@edrevo

edrevo commented Sep 23, 2014

>>> c.head(3)
                                                           end_time
location_id sensor_id view_id start_time
54          1000305   0       2014-07-21 07:00:00+00:00    2014-07-21 07:15:00+00:00
                              2014-07-21 07:15:00+00:00    2014-07-21 07:30:00+00:00
                              2014-07-21 07:30:00+00:00    2014-07-21 07:45:00+00:00

>>> c.to_pickle(r"C:\temp\fff")
>>> pd.read_pickle(r"C:\temp\fff").head(3)
                                                   end_time
location_id sensor_id view_id start_time
54          1000305   0       2014-07-21 07:00:00  2014-07-21 07:15:00+00:00
                              2014-07-21 07:15:00  2014-07-21 07:30:00+00:00
                              2014-07-21 07:30:00  2014-07-21 07:45:00+00:00

Note that the timezone is lost from the start_time index level, while the end_time column keeps it.
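A minimal round-trip check for this symptom, assuming a small hypothetical frame that mirrors the layout above (a tz-aware `start_time` index level plus a tz-aware `end_time` column). The data values and the temporary file path are made up for illustration; only `to_pickle`/`read_pickle` come from the report.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data mirroring the reported layout:
# a 4-level MultiIndex whose last level is tz-aware, plus a tz-aware column.
start = pd.date_range("2014-07-21 07:00", periods=3, freq="15min", tz="UTC")
index = pd.MultiIndex.from_arrays(
    [[54] * 3, [1000305] * 3, [0] * 3, start],
    names=["location_id", "sensor_id", "view_id", "start_time"],
)
df = pd.DataFrame({"end_time": start + pd.Timedelta(minutes=15)}, index=index)

# Round-trip through a pickle file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# On an affected version the index level comes back naive (tz is None),
# while the end_time column keeps its timezone.
level_tz = restored.index.get_level_values("start_time").tz
column_tz = restored["end_time"].dt.tz
print(level_tz, column_tz)
```

If `level_tz` prints as `None` while `column_tz` does not, the installed version still has this bug.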

@edrevo
Author

edrevo commented Sep 23, 2014

I forgot to include my version information:

>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Sep 23, 2014

Can you show how this was created in the first place? Also, please post the output of df.info().

@edrevo
Author

edrevo commented Sep 23, 2014

The DataFrame was created with:

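import pandas as pd
import pytz
from dateutil.parser import parse

def parse_time(v):
    # Parse the timestamp string and mark it as UTC
    return parse(v).replace(tzinfo=pytz.utc)

# Columns 2, 3, 4 and 0 form the MultiIndex; columns 0 and 1 are parsed as dates
df = pd.read_csv(r"C:\counts\raw.data", index_col=[2, 3, 4, 0], date_parser=parse_time,
                 parse_dates=[0, 1], header=0)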
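The `date_parser` argument used above was deprecated in later pandas releases (2.0). A sketch of an equivalent construction that parses the two time columns after loading with `pd.to_datetime(..., utc=True)`, assuming a hypothetical CSV whose column order is inferred from `index_col=[2, 3, 4, 0]` and `parse_dates=[0, 1]` (start_time, end_time, location_id, sensor_id, view_id, then the two counts). The sample rows and count values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample; column order inferred from index_col=[2, 3, 4, 0]
# and parse_dates=[0, 1] in the original call.
raw = io.StringIO(
    "start_time,end_time,location_id,sensor_id,view_id,plus_count,minus_count\n"
    "2014-07-21 07:00:00,2014-07-21 07:15:00,54,1000305,0,3,1\n"
    "2014-07-21 07:15:00,2014-07-21 07:30:00,54,1000305,0,5,2\n"
)

df = pd.read_csv(raw)

# Parse both time columns as tz-aware UTC instead of using a date_parser callback.
for col in ["start_time", "end_time"]:
    df[col] = pd.to_datetime(df[col], utc=True)

# Same index layout as index_col=[2, 3, 4, 0].
df = df.set_index(["location_id", "sensor_id", "view_id", "start_time"])
print(df.index.get_level_values("start_time").tz)
```

Parsing after loading keeps the timezone handling explicit and avoids a per-value Python callback.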

Here's the result of df.info() before serializing:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1931 entries, (54, 1000320, 0, 2014-07-21 08:00:00+00:00) to (54, 1000317, 0, 2014-07-27 19:15:00+00:00)
Data columns (total 3 columns):
end_time       1931 non-null object
plus_count     1931 non-null int64
minus_count    1931 non-null int64
dtypes: int64(2), object(1)

And after deserializing:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1931 entries, (54, 1000320, 0, 2014-07-21 08:00:00) to (54, 1000317, 0, 2014-07-27 19:15:00)
Data columns (total 3 columns):
end_time       1931 non-null object
plus_count     1931 non-null int64
minus_count    1931 non-null int64
dtypes: int64(2), object(1)
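`df.info()` only shows the timezone indirectly, through the first and last index entries. A small hypothetical helper, `level_timezones`, can list the timezone of every datetime level so the before/after states are easier to compare; it is not part of pandas, and the example index is invented.

```python
import pandas as pd

def level_timezones(index: pd.MultiIndex) -> dict:
    """Hypothetical helper: map each datetime level's name to its tz (None if naive)."""
    result = {}
    for name, level in zip(index.names, index.levels):
        if isinstance(level, pd.DatetimeIndex):
            result[name] = level.tz
    return result

# Hypothetical index with one tz-aware level and one naive level.
idx = pd.MultiIndex.from_product(
    [
        [54],
        pd.date_range("2014-07-21 08:00", periods=2, freq="15min", tz="UTC"),
        pd.date_range("2014-07-21 08:00", periods=2, freq="15min"),
    ],
    names=["location_id", "start_time", "naive_time"],
)
print(level_timezones(idx))
```

Running it on the frame before and after `read_pickle` would show `start_time` switching from UTC to `None` on an affected version.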

@jreback
Contributor

jreback commented Sep 23, 2014

Try a pre-release of 0.15.0 and see if the error persists: https://github.com/pydata/pandas/releases

@jreback
Contributor

jreback commented Sep 23, 2014

Hmm, this does seem buggy.

In [16]: df = DataFrame(np.random.randn(12,2),index=pd.MultiIndex.from_product([[1,2],['a','b'],date_range('20130101',periods=3,tz='US/Eastern')]))

In [17]: df
Out[17]: 
                                      0         1
1 a 2013-01-01 00:00:00-05:00 -0.093854  0.181606
    2013-01-02 00:00:00-05:00  1.950901 -1.415333
    2013-01-03 00:00:00-05:00 -1.631278  1.487381
  b 2013-01-01 00:00:00-05:00  0.040334  2.698802
    2013-01-02 00:00:00-05:00 -0.249099  0.212568
    2013-01-03 00:00:00-05:00  0.003727 -0.012983
2 a 2013-01-01 00:00:00-05:00  0.268326 -0.670518
    2013-01-02 00:00:00-05:00 -0.776091 -1.688146
    2013-01-03 00:00:00-05:00 -0.371485 -0.963880
  b 2013-01-01 00:00:00-05:00  0.430981 -1.735434
    2013-01-02 00:00:00-05:00  0.974573 -0.226431
    2013-01-03 00:00:00-05:00 -0.294460 -0.627837

In [18]: df.to_pickle('test.pkl')

In [19]: pd.read_pickle('test.pkl')
Out[19]: 
                                0         1
1 a 2013-01-01 05:00:00 -0.093854  0.181606
    2013-01-02 05:00:00  1.950901 -1.415333
    2013-01-03 05:00:00 -1.631278  1.487381
  b 2013-01-01 05:00:00  0.040334  2.698802
    2013-01-02 05:00:00 -0.249099  0.212568
    2013-01-03 05:00:00  0.003727 -0.012983
2 a 2013-01-01 05:00:00  0.268326 -0.670518
    2013-01-02 05:00:00 -0.776091 -1.688146
    2013-01-03 05:00:00 -0.371485 -0.963880
  b 2013-01-01 05:00:00  0.430981 -1.735434
    2013-01-02 05:00:00  0.974573 -0.226431
    2013-01-03 05:00:00 -0.294460 -0.627837

In [20]: pd.read_pickle('test.pkl').index
Out[20]: 
MultiIndex(levels=[[1, 2], [u'a', u'b'], [2013-01-01 05:00:00, 2013-01-02 05:00:00, 2013-01-03 05:00:00]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

In [21]: pd.read_pickle('test.pkl').index.levels[2]
Out[21]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 05:00:00, ..., 2013-01-03 05:00:00]
Length: 3, Freq: None, Timezone: None
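On a version that still has this bug, one possible workaround is to reattach the timezone to the affected level after unpickling. The output above suggests the naive values are UTC wall times (midnight US/Eastern comes back as 05:00), so this sketch localizes to UTC and then converts. That interpretation and the helper name `restore_level_tz` are assumptions, not part of the pandas API; the helper relies only on `MultiIndex.set_levels`, `tz_localize` and `tz_convert`.

```python
import pandas as pd

def restore_level_tz(df: pd.DataFrame, level: int, tz: str) -> pd.DataFrame:
    """Hypothetical helper: reattach `tz` to a MultiIndex level that lost it.

    Assumes naive values are UTC wall times, as in the unpickled output above.
    """
    values = df.index.levels[level]
    if values.tz is None:
        # Naive values: interpret them as UTC first.
        values = values.tz_localize("UTC")
    values = values.tz_convert(tz)
    out = df.copy()
    out.index = out.index.set_levels(values, level=level)
    return out

# Simulate the buggy result: a naive level holding UTC wall times.
naive = pd.MultiIndex.from_product(
    [[1, 2], ["a", "b"],
     pd.to_datetime(["2013-01-01 05:00", "2013-01-02 05:00", "2013-01-03 05:00"])]
)
broken = pd.DataFrame({"x": range(12)}, index=naive)
fixed = restore_level_tz(broken, level=2, tz="US/Eastern")
print(fixed.index.levels[2])
```

Because the helper checks `values.tz` first, it also works on versions where the level is already tz-aware after unpickling.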

jreback added the Bug, Timezones, MultiIndex, and Compat labels on Sep 23, 2014
jreback added this to the 0.15.0 milestone on Sep 23, 2014
@jreback
Contributor

jreback commented Sep 23, 2014

Thanks for this report. It is now working in master. I will post updated binaries (maybe in a week or so).

@edrevo
Author

edrevo commented Sep 23, 2014

Wow! That was fast! Many thanks for addressing this so quickly!

@jreback
Contributor

jreback commented Sep 23, 2014

Well, thanks for bringing it up.
Some changes in 0.15 were actually causing a regression: pickling a DatetimeIndex with a tz worked, but pickling a MultiIndex that had a DatetimeIndex level with a tz did not.
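A regression test along these lines would cover both cases: a tz-aware DatetimeIndex on its own, and a MultiIndex with a tz-aware DatetimeIndex level. This is a hypothetical sketch built on `pandas.testing` and an in-memory `pickle` round trip; it is not the test that was added in #8370.

```python
import pickle

import numpy as np
import pandas as pd
import pandas.testing as tm

def roundtrip(obj):
    """Pickle and unpickle in memory (to_pickle uses pickle under the hood)."""
    return pickle.loads(pickle.dumps(obj))

def test_pickle_tz_datetimeindex():
    # The case that already worked: a plain tz-aware DatetimeIndex.
    idx = pd.date_range("20130101", periods=3, tz="US/Eastern")
    tm.assert_index_equal(roundtrip(idx), idx)

def test_pickle_multiindex_with_tz_level():
    # The regressed case: a MultiIndex with a tz-aware DatetimeIndex level.
    idx = pd.MultiIndex.from_product(
        [[1, 2], ["a", "b"], pd.date_range("20130101", periods=3, tz="US/Eastern")]
    )
    df = pd.DataFrame(np.random.randn(12, 2), index=idx)
    result = roundtrip(df)
    tm.assert_frame_equal(result, df)
    assert result.index.levels[2].tz is not None

if __name__ == "__main__":
    test_pickle_tz_datetimeindex()
    test_pickle_multiindex_with_tz_level()
    print("ok")
```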

2 participants