Support is_dst indicators in tz_localize #7963

Merged 2 commits, Sep 13, 2014

50 changes: 37 additions & 13 deletions doc/source/timeseries.rst
@@ -1357,6 +1357,9 @@ Pandas provides rich support for working with timestamps in different time zones
``dateutil`` support is new [in 0.14.1] and currently only supported for fixed offset and tzfile zones. The default library is ``pytz``.
Support for ``dateutil`` is provided for compatibility with other applications e.g. if you use ``dateutil`` in other python packages.

Working with Time Zones
~~~~~~~~~~~~~~~~~~~~~~~

By default, pandas objects are time zone unaware:

.. ipython:: python
@@ -1488,10 +1491,29 @@ TimeSeries, aligning the data on the UTC timestamps:
result
result.index

To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``.
``tz_localize(None)`` will remove timezone holding local time representations.
``tz_convert(None)`` will remove timezone after converting to UTC time.

.. ipython:: python

didx = DatetimeIndex(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
didx
didx.tz_localize(None)
didx.tz_convert(None)

# tz_convert(None) is identical to tz_convert('UTC').tz_localize(None)
didx.tz_convert('UTC').tz_localize(None)

.. _timeseries.timezone_ambiguous:

Ambiguous Times when Localizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases, localize cannot determine the DST and non-DST hours when there are
duplicates. This often happens when reading files that simply duplicate the hours.
The infer_dst argument in tz_localize will attempt
to determine the right offset.
duplicates. This often happens when reading files or database records that simply
duplicate the hours. Passing ``ambiguous='infer'`` (``infer_dst`` argument in prior
releases) into ``tz_localize`` will attempt to determine the right offset.

.. ipython:: python
:okexcept:
@@ -1500,21 +1522,23 @@ to determine the right offset.
'11/06/2011 01:00', '11/06/2011 02:00',
'11/06/2011 03:00'])
rng_hourly.tz_localize('US/Eastern')
rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', infer_dst=True)
rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
rng_hourly_eastern.values
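
As a rough illustration of what inferring "based on order" means, the following sketch flags each naive timestamp as DST until the wall clock stops moving forward. It is not pandas' implementation: `infer_dst_flags` is a hypothetical helper, it assumes chronologically ordered input with a single fall-back transition, and the sample values are assumptions modeled on `rng_hourly` (the hunk elides the start of that list).

```python
from datetime import datetime


def infer_dst_flags(wall_times):
    """Return one bool per timestamp: True while on DST, False afterwards.

    Hypothetical helper, not part of pandas. Assumes the input is in
    chronological order and spans a single fall-back transition, so the
    first time the wall clock fails to advance marks the switch from
    DST to standard time.
    """
    flags = []
    on_dst = True
    previous = None
    for current in wall_times:
        if previous is not None and current <= previous:
            # The clock was set back: the repeated hour starts here.
            on_dst = False
        flags.append(on_dst)
        previous = current
    return flags


# Assumed sample data: hourly readings on 2011-11-06 with 01:00 repeated.
rng_hourly = [
    datetime(2011, 11, 6, 0, 0),
    datetime(2011, 11, 6, 1, 0),
    datetime(2011, 11, 6, 1, 0),
    datetime(2011, 11, 6, 2, 0),
    datetime(2011, 11, 6, 3, 0),
]

flags = infer_dst_flags(rng_hourly)
print(flags)  # [True, True, False, False, False]
```

The flags only matter for the repeated hour; for unambiguous timestamps the time zone rules already determine the offset.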


To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``. ``tz_localize(None)`` will remove timezone holding local time representations. ``tz_convert(None)`` will remove timezone after converting to UTC time.
In addition to 'infer', there are several other arguments supported. Passing
an array-like of bools or 0s/1s where True represents a DST hour and False a
non-DST hour, allows for distinguishing more than one DST
transition (e.g., if you have multiple records in a database each with their
own DST transition). Or passing 'NaT' will fill in transition times
with not-a-time values. These methods are available in the ``DatetimeIndex``
constructor as well as ``tz_localize``.

.. ipython:: python

rng_hourly_dst = np.array([1, 1, 0, 0, 0])
rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).values
rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').values

didx = DatetimeIndex(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
didx
didx.tz_localize(None)
didx.tz_convert(None)

# tz_convert(None) is identical to tz_convert('UTC').tz_localize(None)
didx.tz_convert('UTC').tz_localize(None)
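
The `ambiguous` options documented in this hunk (an array of DST flags, `'NaT'`, and the default `'raise'`) can be sketched without pandas using fixed UTC offsets. This is only a model under stated assumptions: `localize_to_utc` is a hypothetical function, it hard-codes the single 2011-11-06 US/Eastern transition, it uses `None` in place of NaT, and it raises a plain `ValueError` where pandas raises an `AmbiguousTimeError`.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # US/Eastern daylight time
EST = timezone(timedelta(hours=-5), "EST")  # US/Eastern standard time

# The wall-clock hour that occurs twice on 2011-11-06 in US/Eastern.
AMBIGUOUS_START = datetime(2011, 11, 6, 1, 0)
AMBIGUOUS_END = datetime(2011, 11, 6, 2, 0)


def localize_to_utc(wall_times, ambiguous="raise"):
    """Convert naive US/Eastern wall times to UTC (hypothetical sketch).

    ambiguous may be 'raise', 'NaT', or a sequence of bools with one
    entry per timestamp (True = DST). Only valid around 2011-11-06.
    """
    results = []
    for i, naive in enumerate(wall_times):
        if AMBIGUOUS_START <= naive < AMBIGUOUS_END:
            if ambiguous == "raise":
                raise ValueError("ambiguous time: %s" % naive)
            if ambiguous == "NaT":
                results.append(None)  # None stands in for NaT
                continue
            tz = EDT if ambiguous[i] else EST
        else:
            # Unambiguous times ignore the flag entirely.
            tz = EDT if naive < AMBIGUOUS_START else EST
        results.append(naive.replace(tzinfo=tz).astimezone(timezone.utc))
    return results


# Assumed sample data: hourly readings with 01:00 repeated.
rng_hourly = [
    datetime(2011, 11, 6, 0, 0),
    datetime(2011, 11, 6, 1, 0),
    datetime(2011, 11, 6, 1, 0),
    datetime(2011, 11, 6, 2, 0),
    datetime(2011, 11, 6, 3, 0),
]

utc = localize_to_utc(rng_hourly, ambiguous=[True, True, False, False, False])
print([t.strftime("%H:%M") for t in utc])
# ['04:00', '05:00', '06:00', '07:00', '08:00']

nat = localize_to_utc(rng_hourly, ambiguous="NaT")
print([t is None for t in nat])
# [False, True, True, False, False]
```

With explicit flags the two 01:00 readings map to distinct UTC instants (05:00 and 06:00), which is what lets an array of flags distinguish records that each carry their own transition.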

.. _timeseries.timedeltas:

10 changes: 8 additions & 2 deletions doc/source/v0.15.0.txt
@@ -344,7 +344,6 @@ API changes
- ``Series.to_csv()`` now returns a string when ``path=None``, matching the behaviour of
``DataFrame.to_csv()`` (:issue:`8215`).


.. _whatsnew_0150.index_set_ops:

- The Index set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replaced by ``.union()`` or ``|``, and ``-`` by ``.difference()``. Further, the method name ``Index.diff()`` is deprecated and can be replaced by ``Index.difference()`` (:issue:`8226`)
@@ -466,6 +465,10 @@ Deprecations

- The ``convert_dummies`` method has been deprecated in favor of
``get_dummies`` (:issue:`8140`)
- The ``infer_dst`` argument in ``tz_localize`` will be deprecated in favor of

Review comment (Contributor): add a link to the new doc section as well

``ambiguous`` to allow for more flexibility in dealing with DST transitions.
Replace ``infer_dst=True`` with ``ambiguous='infer'`` for the same behavior (:issue:`7943`).
See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.

.. _whatsnew_0150.knownissues:

@@ -544,7 +547,10 @@ Enhancements




- ``tz_localize`` now accepts the ``ambiguous`` keyword which allows for passing an array of bools
indicating whether the date belongs in DST or not, 'NaT' for setting transition times to NaT,
'infer' for inferring DST/non-DST, and 'raise' (default) for an AmbiguousTimeError to be raised (:issue:`7943`).
See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.



29 changes: 19 additions & 10 deletions pandas/core/generic.py
@@ -23,7 +23,7 @@
_maybe_box_datetimelike, ABCSeries,
SettingWithCopyError, SettingWithCopyWarning)
import pandas.core.nanops as nanops
from pandas.util.decorators import Appender, Substitution
from pandas.util.decorators import Appender, Substitution, deprecate_kwarg
from pandas.core import config

# goal is to be able to define the docs close to function, while still being
@@ -3558,8 +3558,11 @@ def _tz_convert(ax, tz):
result = self._constructor(self._data, copy=copy)
result.set_axis(axis,ax)
return result.__finalize__(self)

def tz_localize(self, tz, axis=0, level=None, copy=True, infer_dst=False):

@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
mapping={True: 'infer', False: 'raise'})
def tz_localize(self, tz, axis=0, level=None, copy=True,
ambiguous='raise'):
"""
Localize tz-naive TimeSeries to target time zone

@@ -3572,16 +3575,22 @@ def tz_localize(self, tz, axis=0, level=None, copy=True, infer_dst=False):
must be None
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order

ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
- 'infer' will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
infer_dst : boolean, default False (DEPRECATED)
Attempt to infer fall dst-transition hours based on order

Returns
-------
"""
axis = self._get_axis_number(axis)
ax = self._get_axis(axis)

def _tz_localize(ax, tz, infer_dst):
def _tz_localize(ax, tz, ambiguous):
if not hasattr(ax, 'tz_localize'):
if len(ax) > 0:
ax_name = self._get_axis_name(axis)
@@ -3590,19 +3599,19 @@ def _tz_localize(ax, tz, infer_dst):
else:
ax = DatetimeIndex([],tz=tz)
else:
ax = ax.tz_localize(tz, infer_dst=infer_dst)
ax = ax.tz_localize(tz, ambiguous=ambiguous)
return ax

# if a level is given it must be a MultiIndex level or
# equivalent to the axis name
if isinstance(ax, MultiIndex):
level = ax._get_level_number(level)
new_level = _tz_localize(ax.levels[level], tz, infer_dst)
new_level = _tz_localize(ax.levels[level], tz, ambiguous)
ax = ax.set_levels(new_level, level=level)
else:
if level not in (None, 0, ax.name):
raise ValueError("The level {0} is not valid".format(level))
ax = _tz_localize(ax, tz, infer_dst)
ax = _tz_localize(ax, tz, ambiguous)

result = self._constructor(self._data, copy=copy)
result.set_axis(axis,ax)
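
The `deprecate_kwarg` decorator applied to `tz_localize` in this diff keeps `infer_dst` working while steering callers toward `ambiguous`. Its source is not part of this diff, so the following is only a minimal sketch of how such a decorator could behave, assuming it pops the old keyword, translates the value through `mapping`, and emits a `FutureWarning`. The stand-in `tz_localize` function below is likewise hypothetical.

```python
import functools
import warnings


def deprecate_kwarg(old_arg_name, new_arg_name, mapping=None):
    """Sketch of a keyword-renaming decorator (not pandas' actual source)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_arg_name in kwargs:
                if new_arg_name in kwargs:
                    raise TypeError(
                        "Pass only one of %r and %r" % (old_arg_name, new_arg_name)
                    )
                old_value = kwargs.pop(old_arg_name)
                # Translate the old value if a mapping was supplied.
                if mapping is not None:
                    new_value = mapping.get(old_value, old_value)
                else:
                    new_value = old_value
                warnings.warn(
                    "the %r keyword is deprecated, use %r instead"
                    % (old_arg_name, new_arg_name),
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new_arg_name] = new_value
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwarg(old_arg_name="infer_dst", new_arg_name="ambiguous",
                 mapping={True: "infer", False: "raise"})
def tz_localize(tz, ambiguous="raise"):
    # Stand-in for the real method: report the arguments it received.
    return tz, ambiguous


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = tz_localize("US/Eastern", infer_dst=True)

print(old_style)    # ('US/Eastern', 'infer')
print(len(caught))  # 1
```

Because the translation happens in the wrapper, the function body only ever sees `ambiguous`, which matches how the internals in this diff were switched over to the new keyword.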

45 changes: 34 additions & 11 deletions pandas/tseries/index.py
@@ -6,6 +6,8 @@

import numpy as np

import warnings

from pandas.core.common import (_NS_DTYPE, _INT64_DTYPE,
_values_from_object, _maybe_box,
ABCSeries)
@@ -18,7 +20,7 @@
from pandas.core.base import DatetimeIndexOpsMixin
from pandas.tseries.offsets import DateOffset, generate_range, Tick, CDay
from pandas.tseries.tools import parse_time_string, normalize_date
from pandas.util.decorators import cache_readonly
from pandas.util.decorators import cache_readonly, deprecate_kwarg
import pandas.core.common as com
import pandas.tseries.offsets as offsets
import pandas.tseries.tools as tools
@@ -145,6 +147,15 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index):
closed : string or None, default None
Make the interval closed with respect to the given frequency to
the 'left', 'right', or both sides (None)
tz : pytz.timezone or dateutil.tz.tzfile
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
- 'infer' will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False signifies
a non-DST time (note that this flag is only applicable for ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
infer_dst : boolean, default False (DEPRECATED)
Attempt to infer fall dst-transition hours based on order
name : object
Name to be stored in the index
"""
@@ -180,15 +191,17 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index):
'is_quarter_start','is_quarter_end','is_year_start','is_year_end']
_is_numeric_dtype = False


@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
mapping={True: 'infer', False: 'raise'})
def __new__(cls, data=None,
freq=None, start=None, end=None, periods=None,
copy=False, name=None, tz=None,
verify_integrity=True, normalize=False,
closed=None, **kwargs):
closed=None, ambiguous='raise', **kwargs):

dayfirst = kwargs.pop('dayfirst', None)
yearfirst = kwargs.pop('yearfirst', None)
infer_dst = kwargs.pop('infer_dst', False)

freq_infer = False
if not isinstance(freq, DateOffset):
@@ -214,7 +227,7 @@ def __new__(cls, data=None,
if data is None:
return cls._generate(start, end, periods, name, freq,
tz=tz, normalize=normalize, closed=closed,
infer_dst=infer_dst)
ambiguous=ambiguous)

if not isinstance(data, (np.ndarray, Index, ABCSeries)):
if np.isscalar(data):
@@ -240,7 +253,7 @@ def __new__(cls, data=None,
data.name = name

if tz is not None:
return data.tz_localize(tz, infer_dst=infer_dst)
return data.tz_localize(tz, ambiguous=ambiguous)

return data

@@ -309,7 +322,7 @@ def __new__(cls, data=None,
# Convert tz-naive to UTC
ints = subarr.view('i8')
subarr = tslib.tz_localize_to_utc(ints, tz,
infer_dst=infer_dst)
ambiguous=ambiguous)

subarr = subarr.view(_NS_DTYPE)

@@ -333,7 +346,7 @@ def __new__(cls, data=None,

@classmethod
def _generate(cls, start, end, periods, name, offset,
tz=None, normalize=False, infer_dst=False, closed=None):
tz=None, normalize=False, ambiguous='raise', closed=None):
if com._count_not_none(start, end, periods) != 2:
raise ValueError('Must specify two of start, end, or periods')

@@ -447,7 +460,7 @@ def _generate(cls, start, end, periods, name, offset,

if tz is not None and getattr(index, 'tz', None) is None:
index = tslib.tz_localize_to_utc(com._ensure_int64(index), tz,
infer_dst=infer_dst)
ambiguous=ambiguous)
index = index.view(_NS_DTYPE)

index = cls._simple_new(index, name=name, freq=offset, tz=tz)
@@ -1645,7 +1658,9 @@ def tz_convert(self, tz):
# No conversion since timestamps are all UTC to begin with
return self._shallow_copy(tz=tz)

def tz_localize(self, tz, infer_dst=False):
@deprecate_kwarg(old_arg_name='infer_dst', new_arg_name='ambiguous',
mapping={True: 'infer', False: 'raise'})
def tz_localize(self, tz, ambiguous='raise'):
"""
Localize tz-naive DatetimeIndex to given time zone (using pytz/dateutil),
or remove timezone from tz-aware DatetimeIndex
@@ -1656,7 +1671,13 @@ def tz_localize(self, tz, infer_dst=False):
Time zone for time. Corresponding timestamps would be converted to
time zone of the TimeSeries.
None will remove timezone holding local time.
infer_dst : boolean, default False
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
- 'infer' will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False signifies
a non-DST time (note that this flag is only applicable for ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times
infer_dst : boolean, default False (DEPRECATED)
Attempt to infer fall dst-transition hours based on order

Returns
@@ -1671,7 +1692,9 @@ def tz_localize(self, tz, infer_dst=False):
else:
tz = tslib.maybe_get_tz(tz)
# Convert to UTC
new_dates = tslib.tz_localize_to_utc(self.asi8, tz, infer_dst=infer_dst)

new_dates = tslib.tz_localize_to_utc(self.asi8, tz,
ambiguous=ambiguous)
new_dates = new_dates.view(_NS_DTYPE)
return self._shallow_copy(new_dates, tz=tz)
