API/BUG: Enforce "normalized" pytz timezones for DatetimeIndex #20510

Merged: 24 commits, Apr 11, 2018
Changes from 20 commits
Commits
b878540
DOC: update the Index.isin docstring (#20249)
noemielteto Mar 18, 2018
0b0fb83
MAINT: Remove weird pd file
gfyoung Mar 19, 2018
8316501
BUG: Retain timezone when resampling
mroeschke Mar 21, 2018
0bb1b13
Normalize pytz timezone for Datetimeindexes
mroeschke Mar 22, 2018
4d6f0d1
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 23, 2018
4a202b0
Change construction of tz
mroeschke Mar 26, 2018
29560c4
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 27, 2018
87cacf8
lint and add whatsnew
mroeschke Mar 27, 2018
e8990fc
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 28, 2018
b1f2724
Adjust whatsnew and add additional test
mroeschke Mar 28, 2018
c1241f9
Adjust test
mroeschke Mar 28, 2018
9833f01
Address review and failing CI test
mroeschke Mar 29, 2018
43fab89
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 29, 2018
f1a5ca7
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 30, 2018
bba5da5
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 31, 2018
8b397d4
Remove extra copy
mroeschke Mar 31, 2018
464a91b
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Mar 31, 2018
c1db598
add same tz error to timestamp
mroeschke Mar 31, 2018
bf1ec9e
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Apr 2, 2018
81ccb21
Add description of issue
mroeschke Apr 2, 2018
12f697b
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Apr 3, 2018
867ef19
Merge remote-tracking branch 'upstream/master' into resample_timezone
mroeschke Apr 4, 2018
360c295
Use cache_readonly
mroeschke Apr 4, 2018
67a29d5
standardize ._tz directly
mroeschke Apr 6, 2018
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -767,6 +767,8 @@ Datetimelike API Changes
- :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`)
- For :class:`DatetimeIndex` and :class:`TimedeltaIndex` with ``freq=None``, addition or subtraction of integer-dtyped array or ``Index`` will raise ``NullFrequencyError`` instead of ``TypeError`` (:issue:`19895`)
- :class:`Timestamp` constructor now accepts a `nanosecond` keyword or positional argument (:issue:`18898`)
- :class:`DatetimeIndex` will now raise an ``AttributeError`` when the ``tz`` attribute is set after instantiation (:issue:`3746`)
- :class:`DatetimeIndex` with a ``pytz`` timezone will now return a consistent ``pytz`` timezone (:issue:`18595`)
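
The two :class:`DatetimeIndex` entries above can be illustrated with a short sketch. It assumes a pandas version that includes this change; the variable names are illustrative only.

```python
import pandas as pd

dti = pd.DatetimeIndex(["2010-01-01"], tz="UTC")

# Assigning to tz is rejected instead of silently relabelling the index.
assignment_error = None
try:
    dti.tz = "US/Pacific"
except AttributeError as err:
    assignment_error = err

# tz_convert returns a new index with the same instants in another timezone.
converted = dti.tz_convert("US/Pacific")

# tz_localize(None) returns a new, timezone-naive index.
naive = dti.tz_localize(None)

print(assignment_error)
print(converted.tz, naive.tz)
```

The original index is left untouched in both cases, which is the reason the error message points to ``tz_localize()`` and ``tz_convert()``.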

.. _whatsnew_0230.api.other:

@@ -1122,6 +1124,7 @@ Groupby/Resample/Rolling
- Bug in :func:`DataFrame.resample().aggregate` not raising a ``KeyError`` when aggregating a non-existent column (:issue:`16766`, :issue:`19566`)
- Fixed a performance regression for ``GroupBy.nth`` and ``GroupBy.last`` with some object columns (:issue:`19283`)
- Bug in :func:`DataFrameGroupBy.cumsum` and :func:`DataFrameGroupBy.cumprod` when ``skipna`` was passed (:issue:`19806`)
- Bug in :func:`DataFrame.resample` that dropped timezone information (:issue:`13238`)
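
A minimal sketch of the resample behaviour described in the last entry, assuming a pandas version that includes this fix. The hourly frequency is passed as a ``Timedelta`` only to avoid depending on a particular frequency alias spelling.

```python
import pandas as pd

# 48 hourly points in US/Eastern, i.e. exactly two local days.
index = pd.date_range("2017-01-01", periods=48,
                      freq=pd.Timedelta(hours=1), tz="US/Eastern")
s = pd.Series(2, index=index)

# Downsample to daily means; the result index should stay timezone-aware.
daily = s.resample("D").mean()

print(daily.index.tz)
```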

Sparse
^^^^^^
6 changes: 6 additions & 0 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -700,6 +700,12 @@ class Timestamp(_Timestamp):
"""
        return self.tzinfo

    @tz.setter
    def tz(self, value):
        # GH 3746: Prevent localizing or converting the index by setting tz
        raise AttributeError("Cannot directly set timezone. Use tz_localize() "
                             "or tz_convert() as appropriate")

    def __setstate__(self, state):
        self.value = state[0]
        self.freq = state[1]
38 changes: 38 additions & 0 deletions pandas/_libs/tslibs/timezones.pyx
@@ -314,3 +314,41 @@ cpdef bint tz_compare(object start, object end):
"""
    # GH 18523
    return get_timezone(start) == get_timezone(end)


cpdef tz_standardize(object tz):
Contributor

Does this need to be a cpdef? Can it just be cdef? I don't know why end users would need to be able to do this.

Member Author

This function is called from a Python file (pandas/core/indexes/datetimelike.py), so it can't just be a cdef.

"""
    If the passed tz is a pytz timezone object, "normalize" it to a
    consistent version

    Parameters
    ----------
Contributor

Can you add some examples?

    tz : tz object

    Returns
    -------
    tz object

    Examples
    --------
    >>> tz
    <DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>

    >>> tz_standardize(tz)
    <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

    >>> tz
    <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

    >>> tz_standardize(tz)
    <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

    >>> tz
    dateutil.tz.tz.tzutc

    >>> tz_standardize(tz)
    dateutil.tz.tz.tzutc
    """
    if treat_tz_as_pytz(tz):
        return pytz.timezone(str(tz))
    return tz
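
To illustrate why the lookup by name helps, here is a minimal sketch that uses pytz directly. `standardize_tz` is a hypothetical stand-in for `tz_standardize`, and the `isinstance` check is an assumption standing in for `treat_tz_as_pytz`, whose implementation is not shown in this diff.

```python
from datetime import datetime

import pytz


def standardize_tz(tz):
    """Hypothetical stand-in for tz_standardize: re-look-up pytz zones by name."""
    # Assumption: an isinstance check approximates treat_tz_as_pytz here.
    if isinstance(tz, pytz.tzinfo.BaseTzInfo) and tz.zone is not None:
        return pytz.timezone(str(tz))
    return tz


canonical = pytz.timezone("US/Pacific")

# localize() attaches the tzinfo instance for the offset in effect on that
# date, which is a different object from the one pytz.timezone() returns.
localized_tz = canonical.localize(datetime(2010, 1, 1)).tzinfo

print(localized_tz is canonical)
print(standardize_tz(localized_tz) is canonical)
```

Both objects describe the same zone and have the same name, so converting to `str` and looking the name up again yields one consistent instance regardless of which variant was passed in.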
2 changes: 1 addition & 1 deletion pandas/core/indexes/datetimelike.py
@@ -1005,7 +1005,7 @@ def shift(self, n, freq=None):
        result = self + offset

        if hasattr(self, 'tz'):
Contributor

I generally avoid hasattr in projects that support Python 2.

Contributor

Yeah, this impl is shared by DTI and TDI, so this could be restructured a bit. Please file an issue.

-           result.tz = self.tz
+           result._tz = self.tz

        return result
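
For context on the `hasattr` concern raised in the review: in Python 2, `hasattr` swallows every exception raised during the attribute lookup, not only `AttributeError`, so a failing property can look like a missing attribute. One common alternative, sketched here with hypothetical classes rather than pandas code, is `getattr` with a sentinel default.

```python
_MISSING = object()  # sentinel: distinguishes "no attribute" from tz=None


class NaiveLike:
    """Hypothetical index-like object without a tz attribute."""


class AwareLike:
    """Hypothetical index-like object that carries a timezone."""

    def __init__(self, tz):
        self._tz = tz

    @property
    def tz(self):
        return self._tz


def copy_tz(source, target):
    """Copy the timezone onto target only if source actually has one."""
    tz = getattr(source, "tz", _MISSING)
    if tz is not _MISSING:
        target._tz = tz
    return target


shifted = copy_tz(AwareLike("UTC"), AwareLike(None))
print(shifted.tz)
```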

27 changes: 16 additions & 11 deletions pandas/core/indexes/datetimes.py
@@ -511,13 +511,7 @@ def _generate(cls, start, end, periods, name, offset,
                            'different timezones')

        inferred_tz = timezones.maybe_get_tz(inferred_tz)
-
-       # these may need to be localized
        tz = timezones.maybe_get_tz(tz)
-       if tz is not None:
-           date = start or end
-           if date.tzinfo is not None and hasattr(tz, 'localize'):
-               tz = tz.localize(date.replace(tzinfo=None)).tzinfo

        if tz is not None and inferred_tz is not None:
            if not timezones.tz_compare(inferred_tz, tz):
@@ -654,7 +648,7 @@ def _simple_new(cls, values, name=None, freq=None, tz=None,
        result._data = values
        result.name = name
        result.offset = freq
-       result.tz = timezones.maybe_get_tz(tz)
+       result._tz = timezones.maybe_get_tz(tz)
        result._reset_identity()
        return result

@@ -684,6 +678,17 @@ def _values(self):
        else:
            return self.values

    @property
    def tz(self):
        # GH 18595
Member

same here

Contributor

I believe we now have this in setter / getter versions.

Member Author

Hmm, the only caching decorator I could find was implemented here:

cdef class CachedProperty(object):

    cdef readonly:
        object func, name, __doc__

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = getattr(func, '__doc__', None)

    def __get__(self, obj, typ):
        if obj is None:
            # accessed on the class, not the instance
            return self

        # Get the cache or set a default one if needed
        cache = getattr(obj, '_cache', None)
        if cache is None:
            try:
                cache = obj._cache = {}
            except (AttributeError):
                return self

        if PyDict_Contains(cache, self.name):
            # not necessary to Py_INCREF
            val = <object> PyDict_GetItem(cache, self.name)
        else:
            val = self.func(obj)
            PyDict_SetItem(cache, self.name, val)
        return val

    def __set__(self, obj, value):
        raise AttributeError("Can't set attribute")


cache_readonly = CachedProperty
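
For readers less familiar with Cython, the descriptor quoted above can be approximated in pure Python. This is a sketch of the same idea, not the pandas implementation; `CachedReadonly` and `Example` are hypothetical names.

```python
class CachedReadonly:
    """Pure-Python sketch of a cached, read-only property descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = getattr(func, "__doc__", None)

    def __get__(self, obj, typ=None):
        if obj is None:
            # accessed on the class, not the instance
            return self
        # Get the per-instance cache or create it on first use.
        cache = getattr(obj, "_cache", None)
        if cache is None:
            try:
                cache = obj._cache = {}
            except AttributeError:
                return self
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]

    def __set__(self, obj, value):
        raise AttributeError("Can't set attribute")


class Example:
    calls = 0

    @CachedReadonly
    def expensive(self):
        # Count how often the wrapped function really runs.
        Example.calls += 1
        return 42


e = Example()
print(e.expensive, e.expensive, Example.calls)
```

Because the descriptor defines `__set__`, assignment to the attribute raises, while the wrapped function runs at most once per instance.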

        return timezones.tz_standardize(self._tz)

    @tz.setter
    def tz(self, value):
        # GH 3746: Prevent localizing or converting the index by setting tz
        raise AttributeError("Cannot directly set timezone. Use tz_localize() "
                             "or tz_convert() as appropriate")

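The getter/setter pair explains why the rest of this diff switches internal assignments from `.tz` to `._tz`: once the public setter raises, internal code has to write the private attribute. A minimal pure-Python sketch of the pattern, using a hypothetical `TzHolder` class rather than the real `DatetimeIndex`:

```python
class TzHolder:
    """Hypothetical container mimicking the read-only tz pattern."""

    def __init__(self, tz=None):
        # Internal code writes to the private attribute directly.
        self._tz = tz

    @property
    def tz(self):
        # The getter in the diff also standardizes the value; omitted here.
        return self._tz

    @tz.setter
    def tz(self, value):
        raise AttributeError("Cannot directly set timezone. Use tz_localize() "
                             "or tz_convert() as appropriate")

    def with_tz(self, tz):
        """Return a new holder, the way internal code sets _tz on a result."""
        result = TzHolder()
        result._tz = tz
        return result


holder = TzHolder("UTC")
moved = holder.with_tz("US/Pacific")
print(holder.tz, moved.tz)
```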
    @property
    def tzinfo(self):
        """
@@ -754,7 +759,7 @@ def _cached_range(cls, start=None, end=None, periods=None, offset=None,

            cachedRange = DatetimeIndex._simple_new(arr)
            cachedRange.offset = offset
-           cachedRange.tz = None
+           cachedRange._tz = None
            cachedRange.name = None
            drc[offset] = cachedRange
        else:
@@ -831,7 +836,7 @@ def __setstate__(self, state):

            self.name = own_state[0]
            self.offset = own_state[1]
-           self.tz = own_state[2]
+           self._tz = own_state[2]

            # provide numpy < 1.7 compat
            if nd_state[2] == 'M8[us]':
@@ -1175,7 +1180,7 @@ def union(self, other):
        else:
            result = Index.union(this, other)
            if isinstance(result, DatetimeIndex):
-               result.tz = this.tz
+               result._tz = this.tz
                if (result.freq is None and
                        (this.freq is not None or other.freq is not None)):
                    result.offset = to_offset(result.inferred_freq)
@@ -1223,7 +1228,7 @@ def union_many(self, others):
                tz = this.tz
                this = Index.union(this, other)
                if isinstance(this, DatetimeIndex):
-                   this.tz = tz
+                   this._tz = tz

        if this.freq is None:
            this.offset = to_offset(this.inferred_freq)
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_alter_axes.py
@@ -249,8 +249,8 @@ def test_set_index_cast_datetimeindex(self):
        # convert to utc
        df['C'] = i.to_series().reset_index(drop=True)
        result = df['C']
-       comp = pd.DatetimeIndex(expected.values).copy()
-       comp.tz = None
+       comp = pd.DatetimeIndex(expected.values)
+       comp = comp.tz_localize(None)
        tm.assert_numpy_array_equal(result.values, comp.values)

        # list of datetimes with a tz
28 changes: 28 additions & 0 deletions pandas/tests/indexes/datetimes/test_construction.py
@@ -441,6 +441,34 @@ def test_000constructor_resolution(self):

        assert idx.nanosecond[0] == t1.nanosecond

    def test_disallow_setting_tz(self):
        # GH 3746
        dti = DatetimeIndex(['2010'], tz='UTC')
        with pytest.raises(AttributeError):
            dti.tz = pytz.timezone('US/Pacific')

    @pytest.mark.parametrize('tz', [
        None, 'America/Los_Angeles', pytz.timezone('America/Los_Angeles'),
Contributor

Can you add a similar battery of these tests to Timestamp as well? (Setting is already disabled, but see if we have a test for it.)

        Timestamp('2000', tz='America/Los_Angeles').tz])
    def test_constructor_start_end_with_tz(self, tz):
        # GH 18595
        start = Timestamp('2013-01-01 06:00:00', tz='America/Los_Angeles')
        end = Timestamp('2013-01-02 06:00:00', tz='America/Los_Angeles')
        result = DatetimeIndex(freq='D', start=start, end=end, tz=tz)
        expected = DatetimeIndex(['2013-01-01 06:00:00',
                                  '2013-01-02 06:00:00'],
                                 tz='America/Los_Angeles')
        tm.assert_index_equal(result, expected)
        # Especially assert that the timezone is consistent for pytz
        assert pytz.timezone('America/Los_Angeles') is result.tz

    @pytest.mark.parametrize('tz', ['US/Pacific', 'US/Eastern', 'Asia/Tokyo'])
    def test_constructor_with_non_normalized_pytz(self, tz):
        # GH 18595
        non_norm_tz = Timestamp('2010', tz=tz).tz
        result = DatetimeIndex(['2010'], tz=non_norm_tz)
        assert pytz.timezone(tz) is result.tz


class TestTimeSeries(object):

7 changes: 7 additions & 0 deletions pandas/tests/scalar/timestamp/test_timestamp.py
@@ -521,6 +521,13 @@ def test_today(self):
        assert (abs(ts_from_string_tz.tz_localize(None) -
                    ts_from_method_tz.tz_localize(None)) < delta)

    @pytest.mark.parametrize('tz', [None, pytz.timezone('US/Pacific')])
    def test_disallow_setting_tz(self, tz):
        # GH 3746
        ts = Timestamp('2010')
        with pytest.raises(AttributeError):
            ts.tz = tz


class TestTimestamp(object):

12 changes: 12 additions & 0 deletions pandas/tests/test_resample.py
@@ -2532,6 +2532,18 @@ def test_with_local_timezone_pytz(self):
        expected = Series(1, index=expected_index)
        assert_series_equal(result, expected)

    def test_resample_with_pytz(self):
        # GH 13238
        s = Series(2, index=pd.date_range('2017-01-01', periods=48, freq="H",
                                          tz="US/Eastern"))
        result = s.resample("D").mean()
        expected = Series(2, index=pd.DatetimeIndex(['2017-01-01',
                                                     '2017-01-02'],
                                                    tz="US/Eastern"))
        assert_series_equal(result, expected)
        # Especially assert that the timezone is LMT for pytz
        assert result.index.tz == pytz.timezone('US/Eastern')

    def test_with_local_timezone_dateutil(self):
        # see gh-5430
        local_timezone = 'dateutil/America/Los_Angeles'