Commit 41cc8cc

Merge pull request #7963 from rockg/master
Support is_dst indicators in tz_localize
2 parents abd5333 + 5d32eab commit 41cc8cc

7 files changed: +274 -108 lines changed

doc/source/timeseries.rst

+37 -13
Original file line number | Diff line number | Diff line change
@@ -1357,6 +1357,9 @@ Pandas provides rich support for working with timestamps in different time zones
13571357
``dateutil`` support is new [in 0.14.1] and currently only supported for fixed offset and tzfile zones. The default library is ``pytz``.
13581358
Support for ``dateutil`` is provided for compatibility with other applications e.g. if you use ``dateutil`` in other python packages.
13591359

1360+
Working with Time Zones
1361+
~~~~~~~~~~~~~~~~~~~~~~~
1362+
13601363
By default, pandas objects are time zone unaware:
13611364

13621365
.. ipython:: python
@@ -1488,10 +1491,29 @@ TimeSeries, aligning the data on the UTC timestamps:
14881491
result
14891492
result.index
14901493
1494+
To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``.
1495+
``tz_localize(None)`` will remove timezone holding local time representations.
1496+
``tz_convert(None)`` will remove timezone after converting to UTC time.
1497+
1498+
.. ipython:: python
1499+
1500+
didx = DatetimeIndex(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
1501+
didx
1502+
didx.tz_localize(None)
1503+
didx.tz_convert(None)
1504+
1505+
# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
1506+
didx.tz_convert('UCT').tz_localize(None)
1507+
1508+
.. _timeseries.timezone_ambiguous:
1509+
1510+
Ambiguous Times when Localizing
1511+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1512+
14911513
In some cases, localize cannot determine the DST and non-DST hours when there are
1492-
duplicates. This often happens when reading files that simply duplicate the hours.
1493-
The infer_dst argument in tz_localize will attempt
1494-
to determine the right offset.
1514+
duplicates. This often happens when reading files or database records that simply
1515+
duplicate the hours. Passing ``ambiguous='infer'`` (``infer_dst`` argument in prior
1516+
releases) into ``tz_localize`` will attempt to determine the right offset.
14951517

14961518
.. ipython:: python
14971519
:okexcept:
@@ -1500,21 +1522,23 @@ to determine the right offset.
15001522
'11/06/2011 01:00', '11/06/2011 02:00',
15011523
'11/06/2011 03:00'])
15021524
rng_hourly.tz_localize('US/Eastern')
1503-
rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', infer_dst=True)
1525+
rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
15041526
rng_hourly_eastern.values
15051527
1506-
1507-
To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``. ``tz_localize(None)`` will remove timezone holding local time representations. ``tz_convert(None)`` will remove timezone after converting to UTC time.
1528+
In addition to 'infer', there are several other arguments supported. Passing
1529+
an array-like of bools or 0s/1s where True represents a DST hour and False a
1530+
non-DST hour, allows for distinguishing more than one DST
1531+
transition (e.g., if you have multiple records in a database each with their
1532+
own DST transition). Or passing 'NaT' will fill in transition times
1533+
with not-a-time values. These methods are available in the ``DatetimeIndex``
1534+
constructor as well as ``tz_localize``.
15081535

15091536
.. ipython:: python
1537+
1538+
rng_hourly_dst = np.array([1, 1, 0, 0, 0])
1539+
rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).values
1540+
rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').values
15101541
1511-
didx = DatetimeIndex(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
1512-
didx
1513-
didx.tz_localize(None)
1514-
didx.tz_convert(None)
1515-
1516-
# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
1517-
didx.tz_convert('UCT').tz_localize(None)
15181542
15191543
.. _timeseries.timedeltas:
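
The documentation change above describes four values for ``ambiguous``: ``'infer'``, an array of DST flags, ``'NaT'`` and ``'raise'``. The sketch below is a toy model of those semantics using only the standard library; it is not the pandas implementation. The names ``localize_sketch``, ``EDT``, ``EST`` and the hard-coded fall-back window are assumptions made for illustration, as are the five sample timestamps. Unlike pandas, the sketch raises a plain ``ValueError`` and returns ``None`` in place of NaT.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for US/Eastern around the 2011-11-06 fall-back.
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight saving time
EST = timezone(timedelta(hours=-5), "EST")  # standard time

# Wall-clock times in [01:00, 02:00) occur twice on that day.
AMBIGUOUS_START = datetime(2011, 11, 6, 1)
AMBIGUOUS_END = datetime(2011, 11, 6, 2)


def localize_sketch(naive_times, ambiguous="raise"):
    """Toy model of the ``ambiguous`` options for one fall-back day."""
    seen = set()  # ambiguous wall-clock times already encountered ('infer')
    result = []
    for i, t in enumerate(naive_times):
        if t < AMBIGUOUS_START:
            tz = EDT  # unambiguous: still on daylight time
        elif t >= AMBIGUOUS_END:
            tz = EST  # unambiguous: already on standard time
        elif ambiguous == "raise":
            raise ValueError("ambiguous time: %s" % t)
        elif ambiguous == "NaT":
            result.append(None)  # None plays the role of NaT here
            continue
        elif ambiguous == "infer":
            # Order-based guess: first occurrence is DST, the repeat is not.
            tz = EST if t in seen else EDT
            seen.add(t)
        else:
            # Array-like of flags: truthy means DST, falsy means non-DST.
            tz = EDT if ambiguous[i] else EST
        result.append(t.replace(tzinfo=tz))
    return result


naive = [datetime(2011, 11, 6, h) for h in (0, 1, 1, 2, 3)]
inferred = localize_sketch(naive, ambiguous="infer")
flagged = localize_sketch(naive, ambiguous=[1, 1, 0, 0, 0])
with_nat = localize_sketch(naive, ambiguous="NaT")

utc_hours = [t.astimezone(timezone.utc).hour for t in inferred]
print(utc_hours)  # [4, 5, 6, 7, 8]
```

With the duplicated 01:00 entry, ``'infer'`` and the explicit flags ``[1, 1, 0, 0, 0]`` yield the same strictly increasing UTC instants, which is the behaviour the new docs describe for files that simply repeat the hour.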
15201544

doc/source/v0.15.0.txt

+8 -2
@@ -346,7 +346,6 @@ API changes
346346
- ``Series.to_csv()`` now returns a string when ``path=None``, matching the behaviour of
347347
``DataFrame.to_csv()`` (:issue:`8215`).
348348

349-
350349
.. _whatsnew_0150.index_set_ops:
351350

352351
- The Index set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replaced by ``.union()`` or ``|``, and ``-`` by ``.difference()``. Further the method name ``Index.diff()`` is deprecated and can be replaced by ``Index.difference()`` (:issue:`8226`)
@@ -470,6 +469,10 @@ Deprecations
470469

471470
- The ``convert_dummies`` method has been deprecated in favor of
472471
``get_dummies`` (:issue:`8140`)
472+
- The ``infer_dst`` argument in ``tz_localize`` will be deprecated in favor of
473+
``ambiguous`` to allow for more flexibility in dealing with DST transitions.
474+
Replace ``infer_dst=True`` with ``ambiguous='infer'`` for the same behavior (:issue:`7943`).
475+
See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.
473476

474477
.. _whatsnew_0150.knownissues:
475478

@@ -548,7 +551,10 @@ Enhancements
548551

549552

550553

551-
554+
- ``tz_localize`` now accepts the ``ambiguous`` keyword which allows for passing an array of bools
555+
indicating whether the date belongs in DST or not, 'NaT' for setting transition times to NaT,
556+
'infer' for inferring DST/non-DST, and 'raise' (default) for an AmbiguousTimeError to be raised (:issue:`7943`).
557+
See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.
552558

553559

554560

pandas/core/generic.py

+19 -10
@@ -23,7 +23,7 @@
2323
_maybe_box_datetimelike, ABCSeries,
2424
SettingWithCopyError, SettingWithCopyWarning)
2525
import pandas.core.nanops as nanops
26-
from pandas.util.decorators import Appender, Substitution
26+
from pandas.util.decorators import Appender, Substitution, deprecate_kwarg
2727
from pandas.core import config
2828

2929
# goal is to be able to define the docs close to function, while still being
@@ -3558,8 +3558,11 @@ def _tz_convert(ax, tz):
35583558
result = self._constructor(self._data, copy=copy)
35593559
result.set_axis(axis,ax)
35603560
return result.__finalize__(self)
3561-
3562-
def tz_localize(self, tz, axis=0, level=None, copy=True, infer_dst=False):
3561+
3562+
@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
3563+
mapping={True: 'infer', False: 'raise'})
3564+
def tz_localize(self, tz, axis=0, level=None, copy=True,
3565+
ambiguous='raise'):
35633566
"""
35643567
Localize tz-naive TimeSeries to target time zone
35653568
@@ -3572,16 +3575,22 @@ def tz_localize(self, tz, axis=0, level=None, copy=True, infer_dst=False):
35723575
must be None
35733576
copy : boolean, default True
35743577
Also make a copy of the underlying data
3575-
infer_dst : boolean, default False
3576-
Attempt to infer fall dst-transition times based on order
3577-
3578+
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
3579+
- 'infer' will attempt to infer fall dst-transition hours based on order
3580+
- bool-ndarray where True signifies a DST time, False designates
3581+
a non-DST time (note that this flag is only applicable for ambiguous times)
3582+
- 'NaT' will return NaT where there are ambiguous times
3583+
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
3584+
infer_dst : boolean, default False (DEPRECATED)
3585+
Attempt to infer fall dst-transition hours based on order
3586+
35783587
Returns
35793588
-------
35803589
"""
35813590
axis = self._get_axis_number(axis)
35823591
ax = self._get_axis(axis)
35833592

3584-
def _tz_localize(ax, tz, infer_dst):
3593+
def _tz_localize(ax, tz, ambiguous):
35853594
if not hasattr(ax, 'tz_localize'):
35863595
if len(ax) > 0:
35873596
ax_name = self._get_axis_name(axis)
@@ -3590,19 +3599,19 @@ def _tz_localize(ax, tz, infer_dst):
35903599
else:
35913600
ax = DatetimeIndex([],tz=tz)
35923601
else:
3593-
ax = ax.tz_localize(tz, infer_dst=infer_dst)
3602+
ax = ax.tz_localize(tz, ambiguous=ambiguous)
35943603
return ax
35953604

35963605
# if a level is given it must be a MultiIndex level or
35973606
# equivalent to the axis name
35983607
if isinstance(ax, MultiIndex):
35993608
level = ax._get_level_number(level)
3600-
new_level = _tz_localize(ax.levels[level], tz, infer_dst)
3609+
new_level = _tz_localize(ax.levels[level], tz, ambiguous)
36013610
ax = ax.set_levels(new_level, level=level)
36023611
else:
36033612
if level not in (None, 0, ax.name):
36043613
raise ValueError("The level {0} is not valid".format(level))
3605-
ax = _tz_localize(ax, tz, infer_dst)
3614+
ax = _tz_localize(ax, tz, ambiguous)
36063615

36073616
result = self._constructor(self._data, copy=copy)
36083617
result.set_axis(axis,ax)
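
The ``@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous', mapping={True: 'infer', False: 'raise'})`` decorator used in this diff forwards the old keyword to the new one. The following is a minimal sketch of how such a decorator could behave, not the code in ``pandas.util.decorators``. The names ``deprecate_kwarg_sketch`` and ``tz_localize_stub`` are hypothetical, and the choice of ``FutureWarning`` and the error raised when both keywords are passed are assumptions of this sketch.

```python
import functools
import warnings


def deprecate_kwarg_sketch(old_arg_name, new_arg_name, mapping=None):
    """Illustrative stand-in for a keyword-deprecation decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_arg_name in kwargs:
                old_value = kwargs.pop(old_arg_name)
                # Translate the old value if a mapping was supplied.
                if mapping is not None:
                    new_value = mapping.get(old_value, old_value)
                else:
                    new_value = old_value
                warnings.warn(
                    "the %r keyword is deprecated, use %r instead"
                    % (old_arg_name, new_arg_name),
                    FutureWarning,  # category chosen for this sketch
                    stacklevel=2,
                )
                if new_arg_name in kwargs:
                    raise TypeError(
                        "cannot pass both %r and %r"
                        % (old_arg_name, new_arg_name)
                    )
                kwargs[new_arg_name] = new_value
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwarg_sketch("infer_dst", "ambiguous",
                        mapping={True: "infer", False: "raise"})
def tz_localize_stub(tz, ambiguous="raise"):
    # Hypothetical stand-in that only echoes its arguments.
    return tz, ambiguous
```

Under these assumptions, calling ``tz_localize_stub('US/Eastern', infer_dst=True)`` emits a warning and returns ``('US/Eastern', 'infer')``, which mirrors the replacement advice in the whatsnew entry (``infer_dst=True`` becomes ``ambiguous='infer'``).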

pandas/tseries/index.py

+34 -11
@@ -6,6 +6,8 @@
66

77
import numpy as np
88

9+
import warnings
10+
911
from pandas.core.common import (_NS_DTYPE, _INT64_DTYPE,
1012
_values_from_object, _maybe_box,
1113
ABCSeries)
@@ -18,7 +20,7 @@
1820
from pandas.core.base import DatetimeIndexOpsMixin
1921
from pandas.tseries.offsets import DateOffset, generate_range, Tick, CDay
2022
from pandas.tseries.tools import parse_time_string, normalize_date
21-
from pandas.util.decorators import cache_readonly
23+
from pandas.util.decorators import cache_readonly, deprecate_kwarg
2224
import pandas.core.common as com
2325
import pandas.tseries.offsets as offsets
2426
import pandas.tseries.tools as tools
@@ -145,6 +147,15 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index):
145147
closed : string or None, default None
146148
Make the interval closed with respect to the given frequency to
147149
the 'left', 'right', or both sides (None)
150+
tz : pytz.timezone or dateutil.tz.tzfile
151+
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
152+
- 'infer' will attempt to infer fall dst-transition hours based on order
153+
- bool-ndarray where True signifies a DST time, False signifies
154+
a non-DST time (note that this flag is only applicable for ambiguous times)
155+
- 'NaT' will return NaT where there are ambiguous times
156+
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
157+
infer_dst : boolean, default False (DEPRECATED)
158+
Attempt to infer fall dst-transition hours based on order
148159
name : object
149160
Name to be stored in the index
150161
"""
@@ -180,15 +191,17 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index):
180191
'is_quarter_start','is_quarter_end','is_year_start','is_year_end']
181192
_is_numeric_dtype = False
182193

194+
195+
@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
196+
mapping={True: 'infer', False: 'raise'})
183197
def __new__(cls, data=None,
184198
freq=None, start=None, end=None, periods=None,
185199
copy=False, name=None, tz=None,
186200
verify_integrity=True, normalize=False,
187-
closed=None, **kwargs):
201+
closed=None, ambiguous='raise', **kwargs):
188202

189203
dayfirst = kwargs.pop('dayfirst', None)
190204
yearfirst = kwargs.pop('yearfirst', None)
191-
infer_dst = kwargs.pop('infer_dst', False)
192205

193206
freq_infer = False
194207
if not isinstance(freq, DateOffset):
@@ -214,7 +227,7 @@ def __new__(cls, data=None,
214227
if data is None:
215228
return cls._generate(start, end, periods, name, freq,
216229
tz=tz, normalize=normalize, closed=closed,
217-
infer_dst=infer_dst)
230+
ambiguous=ambiguous)
218231

219232
if not isinstance(data, (np.ndarray, Index, ABCSeries)):
220233
if np.isscalar(data):
@@ -240,7 +253,7 @@ def __new__(cls, data=None,
240253
data.name = name
241254

242255
if tz is not None:
243-
return data.tz_localize(tz, infer_dst=infer_dst)
256+
return data.tz_localize(tz, ambiguous=ambiguous)
244257

245258
return data
246259

@@ -309,7 +322,7 @@ def __new__(cls, data=None,
309322
# Convert tz-naive to UTC
310323
ints = subarr.view('i8')
311324
subarr = tslib.tz_localize_to_utc(ints, tz,
312-
infer_dst=infer_dst)
325+
ambiguous=ambiguous)
313326

314327
subarr = subarr.view(_NS_DTYPE)
315328

@@ -333,7 +346,7 @@ def __new__(cls, data=None,
333346

334347
@classmethod
335348
def _generate(cls, start, end, periods, name, offset,
336-
tz=None, normalize=False, infer_dst=False, closed=None):
349+
tz=None, normalize=False, ambiguous='raise', closed=None):
337350
if com._count_not_none(start, end, periods) != 2:
338351
raise ValueError('Must specify two of start, end, or periods')
339352

@@ -447,7 +460,7 @@ def _generate(cls, start, end, periods, name, offset,
447460

448461
if tz is not None and getattr(index, 'tz', None) is None:
449462
index = tslib.tz_localize_to_utc(com._ensure_int64(index), tz,
450-
infer_dst=infer_dst)
463+
ambiguous=ambiguous)
451464
index = index.view(_NS_DTYPE)
452465

453466
index = cls._simple_new(index, name=name, freq=offset, tz=tz)
@@ -1645,7 +1658,9 @@ def tz_convert(self, tz):
16451658
# No conversion since timestamps are all UTC to begin with
16461659
return self._shallow_copy(tz=tz)
16471660

1648-
def tz_localize(self, tz, infer_dst=False):
1661+
@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
1662+
mapping={True: 'infer', False: 'raise'})
1663+
def tz_localize(self, tz, ambiguous='raise'):
16491664
"""
16501665
Localize tz-naive DatetimeIndex to given time zone (using pytz/dateutil),
16511666
or remove timezone from tz-aware DatetimeIndex
@@ -1656,7 +1671,13 @@ def tz_localize(self, tz, infer_dst=False):
16561671
Time zone for time. Corresponding timestamps would be converted to
16571672
time zone of the TimeSeries.
16581673
None will remove timezone holding local time.
1659-
infer_dst : boolean, default False
1674+
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
1675+
- 'infer' will attempt to infer fall dst-transition hours based on order
1676+
- bool-ndarray where True signifies a DST time, False signifies
1677+
a non-DST time (note that this flag is only applicable for ambiguous times)
1678+
- 'NaT' will return NaT where there are ambiguous times
1679+
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
1680+
infer_dst : boolean, default False (DEPRECATED)
16601681
Attempt to infer fall dst-transition hours based on order
16611682
16621683
Returns
@@ -1671,7 +1692,9 @@ def tz_localize(self, tz, infer_dst=False):
16711692
else:
16721693
tz = tslib.maybe_get_tz(tz)
16731694
# Convert to UTC
1674-
new_dates = tslib.tz_localize_to_utc(self.asi8, tz, infer_dst=infer_dst)
1695+
1696+
new_dates = tslib.tz_localize_to_utc(self.asi8, tz,
1697+
ambiguous=ambiguous)
16751698
new_dates = new_dates.view(_NS_DTYPE)
16761699
return self._shallow_copy(new_dates, tz=tz)
16771700
