BUG: DatetimeIndex.shift(freq=...) raises near DST boundary #8616

Closed
ischwabacher opened this issue Oct 23, 2014 · 3 comments · Fixed by #21491
Labels
Timezones Timezone data dtype
Comments

@ischwabacher
Contributor

xref #5694 #8531 (?)
xref #8817

This is presumably caused by the fact that a pytz tzinfo attached to a timestamp bakes in the UTC offset in effect at that timestamp, so two tzinfos from the same zone can differ.

In [1]: import pandas as pd

In [2]: idx = pd.date_range('2013-11-03', tz='America/Chicago',
   ...:                     periods=6, freq='H')

In [3]: pd.Series(index=idx)
Out[3]: 
2013-11-03 00:00:00-05:00   NaN
2013-11-03 01:00:00-05:00   NaN
2013-11-03 01:00:00-06:00   NaN
2013-11-03 02:00:00-06:00   NaN
2013-11-03 03:00:00-06:00   NaN
2013-11-03 04:00:00-06:00   NaN
Freq: H, dtype: float64

In [4]: pd.Series(index=idx).shift(freq='H')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-19ee418c9aa1> in <module>()
----> 1 pd.Series(index=idx).shift(freq='H')

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in shift(self, periods, freq, axis, **kwds)
   3290             new_data = self._data.shift(periods=periods, axis=block_axis)
   3291         else:
-> 3292             return self.tshift(periods, freq, **kwds)
   3293 
   3294         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in tshift(self, periods, freq, axis, **kwds)
   3386         else:
   3387             new_data = self._data.copy()
-> 3388             new_data.axes[block_axis] = index.shift(periods, offset)
   3389 
   3390         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in shift(self, n, freq)
    855         end = self[-1] + n * self.offset
    856         return DatetimeIndex(start=start, end=end, freq=self.offset,
--> 857                              name=self.name, tz=self.tz)
    858 
    859     def repeat(self, repeats, axis=None):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, **kwds)
    204             return cls._generate(start, end, periods, name, freq,
    205                                  tz=tz, normalize=normalize, closed=closed,
--> 206                                  infer_dst=infer_dst)
    207 
    208         if not isinstance(data, (np.ndarray, ABCSeries)):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in _generate(cls, start, end, periods, name, offset, tz, normalize, infer_dst, closed)
    369         if tz is not None and inferred_tz is not None:
    370             if not inferred_tz == tz:
--> 371                 raise AssertionError("Inferred time zone not equal to passed "
    372                                      "time zone")
    373 

AssertionError: Inferred time zone not equal to passed time zone

Playing with dateutil and pytz timezones makes me despair of repairing that assertion to be anything sane. Is it actually needed, or can we just turn it into a warning?

Observations:

  • two pytz.DstTzInfos of the same zoneinfo zone that have been .normalize()ed to different times may compare different
  • a pytz.*TzInfo has a zone member containing the name of the zoneinfo zone from which it was constructed
  • a dateutil.tzfile has no public members beyond the tzinfo API, which is insufficient to tell whether two time zones are equal
  • a pytz time zone and a dateutil time zone appear to compare different under all circumstances, regardless of whether they represent the same time zone
  • the various dateutil time zone classes have no common base class within the dateutil package
  • pytz zones constructed from different names for the same zoneinfo zone (e.g. UTC and Etc/UTC) compare different

Ugh.
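A minimal sketch of the pytz observations above, assuming pytz is installed. `same_zone_by_name` is a hypothetical helper written only for illustration; it is not what pandas does, and it would not cover dateutil zones or zone aliases such as UTC versus Etc/UTC.

```python
from datetime import datetime

import pytz

chicago = pytz.timezone("America/Chicago")

# localize() attaches a tzinfo instance specific to the offset in effect.
before = chicago.localize(datetime(2013, 11, 3, 0, 0))  # daylight time, UTC-05:00
after = chicago.localize(datetime(2013, 11, 3, 3, 0))   # standard time, UTC-06:00

# Same zoneinfo zone, but the tzinfo objects do not compare equal.
print(before.tzinfo == after.tzinfo)

# Both still carry the zone name they were built from.
print(before.tzinfo.zone, after.tzinfo.zone)

# Different names for the same zone are another comparison to try.
print(pytz.timezone("UTC") == pytz.timezone("Etc/UTC"))


def same_zone_by_name(a, b):
    """Hypothetical looser check: equal objects, or equal pytz zone names."""
    if a == b:
        return True
    name_a = getattr(a, "zone", None)
    name_b = getattr(b, "zone", None)
    return name_a is not None and name_a == name_b


print(same_zone_by_name(before.tzinfo, after.tzinfo))
```

A name-based check like this is only one possible relaxation of the assertion; whether it is strict enough for pandas is exactly the open question in this issue.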

From #8817:

import pandas as pd
import pytz
import datetime

dt = datetime.datetime(2014, 11, 14, 0)
dt_est = pytz.timezone('EST').localize(dt)
s = pd.Series(data=[1], index=[dt_est])

s.shift(0, freq='h')  # 2014-11-14 00:00:00-05:00 (seems okay) 
s.shift(-1, freq='h')  # 2014-11-13 18:00:00-05:00 (expected 2014-11-13 23:00:00)
s.shift(1, freq='h')  # 2014-11-13 20:00:00-05:00 (expected 2014-11-14 01:00:00)

s.shift(-1, freq='s')  # 2014-11-13 18:59:59-05:00 (same with other freq)
@jreback
Contributor

jreback commented Oct 24, 2014

Sounds like you need an `ambiguous` keyword here to figure out what the user wants? Or is this just not well defined? Or is it just implemented incorrectly?

E.g., it seems to me you would convert to UTC, shift, then convert back to the tz.
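The convert, shift, convert-back idea can be sketched with the standard library alone. This is an illustration, not the pandas implementation: it assumes Python 3.9+ for `zoneinfo` (which postdates this thread) and an available tz database, and `shift_via_utc` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def shift_via_utc(stamps, delta, tz):
    """Shift aware datetimes by elapsed time: to UTC, add delta, back to tz."""
    return [(s.astimezone(timezone.utc) + delta).astimezone(tz) for s in stamps]


chicago = ZoneInfo("America/Chicago")

# Six hourly instants starting at local midnight on the 2013 fall-back date,
# built in UTC so the repeated 01:00 hour is represented unambiguously.
start_utc = datetime(2013, 11, 3, 0, tzinfo=chicago).astimezone(timezone.utc)
stamps = [(start_utc + timedelta(hours=i)).astimezone(chicago) for i in range(6)]

shifted = shift_via_utc(stamps, timedelta(hours=1), chicago)
for old, new in zip(stamps, shifted):
    print(old.isoformat(), "->", new.isoformat())
```

Doing the arithmetic in UTC sidesteps the repeated local hour, so no `ambiguous` argument is needed for a shift by elapsed time.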

@rockg
Contributor

rockg commented Oct 24, 2014

Additionally, the same time zone across different versions of pytz may not compare equal. This causes problems when manipulating data stored in pickles/HDF5 from prior versions (see #7620). It may not be a problem in this particular spot, but it is certainly an additional comparison issue to add to your list.

@ischwabacher
Contributor Author

> Sounds like you need an `ambiguous` keyword here to figure out what the user wants? Or is this just not well defined? Or is it just implemented incorrectly?

It's just a vectorized addition of an offset to a bunch of Timestamps, which is well-defined regardless of whether the Timestamps are aware or not. The problem isn't that the computation is wrong; it's that the assertion at the end is too strict.

Trying to fix this pushed PEP 431 several notches up my list of things to be excited about.

@jreback added this to the 0.16.0 milestone Nov 29, 2014
@jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback modified the milestones: Next Major Release, 0.23.2 Jun 15, 2018