ENH: Ability to tz localize when index is implicitly in tz #4706

Merged 2 commits on Oct 2, 2013
146 changes: 146 additions & 0 deletions doc/source/api.rst
@@ -868,3 +868,149 @@ Serialization / IO / Conversion
Panel.to_frame
Panel.to_clipboard

.. currentmodule:: pandas.core.index

.. _api.index:

Index
-----

**Many of these methods or variants thereof are available on the objects that contain an index (Series/DataFrame)
and those should most likely be used before calling these methods directly.**

* **values**

Modifying and Computations
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.copy
Index.delete
Index.diff
Index.drop
Index.equals
Index.identical
Index.insert
Index.order
Index.reindex
Index.repeat
Index.set_names
Index.unique

Conversion
~~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.astype
Index.tolist
Index.to_datetime
Index.to_series

Sorting
~~~~~~~
.. autosummary::
:toctree: generated/

Index.argsort
Index.order
Index.sort

Time-specific operations
~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.shift

Combining / joining / merging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.append
Index.intersection
Index.join
Index.union

Selecting
~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.get_indexer
Index.get_indexer_non_unique
Index.get_level_values
Index.get_loc
Index.get_value
Index.isin
Index.slice_indexer
Index.slice_locs

Properties
~~~~~~~~~~
.. autosummary::
:toctree: generated/

Index.is_monotonic
Index.is_numeric

.. currentmodule:: pandas.tseries.index

.. _api.datetimeindex:

DatetimeIndex
-------------

Time/Date Components
~~~~~~~~~~~~~~~~~~~~
* **year**
* **month**
* **day**
* **hour**
* **minute**
* **second**
* **microsecond**
* **nanosecond**

* **weekofyear**
* **week**: Same as weekofyear
* **dayofweek**: (0=Monday, 6=Sunday)
* **weekday**: (0=Monday, 6=Sunday)
* **dayofyear**
* **quarter**

* **date**: Returns date component of Timestamps
* **time**: Returns time component of Timestamps
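
The numbering conventions above match those of the standard library's ``datetime``. As a small illustration only, the sketch below computes stdlib analogues of several components for a single date. It assumes that ``weekofyear`` follows the ISO week number, which this excerpt does not state, and the ``components`` dictionary is a name made up for the example.

```python
from datetime import datetime

ts = datetime(2011, 11, 6, 1, 30)  # a Sunday

# Standard-library analogues of the listed components (illustrative only).
components = {
    "dayofweek": ts.weekday(),            # 0=Monday, 6=Sunday
    "dayofyear": ts.timetuple().tm_yday,  # 1-based day of the year
    "quarter": (ts.month - 1) // 3 + 1,   # 1 through 4
    "weekofyear": ts.isocalendar()[1],    # ISO week number (assumption)
    "date": ts.date(),                    # date part only
    "time": ts.time(),                    # time part only
}
```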


Selecting
~~~~~~~~~
.. autosummary::
:toctree: generated/

DatetimeIndex.indexer_at_time
DatetimeIndex.indexer_between_time


Time-specific operations
~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

DatetimeIndex.normalize
DatetimeIndex.snap
DatetimeIndex.tz_convert
DatetimeIndex.tz_localize


Conversion
~~~~~~~~~~
.. autosummary::
:toctree: generated/

DatetimeIndex.to_datetime
DatetimeIndex.to_period
DatetimeIndex.to_pydatetime


3 changes: 3 additions & 0 deletions doc/source/release.rst
@@ -160,6 +160,9 @@ Improvements to existing features
:issue:`4998`)
- ``to_dict`` now takes ``records`` as a possible outtype. Returns an array
of column-keyed dictionaries. (:issue:`4936`)
- ``tz_localize`` can infer a fall daylight savings transition based on the
structure of unlocalized data (:issue:`4230`)
- DatetimeIndex is now in the API documentation

API Changes
~~~~~~~~~~~
14 changes: 14 additions & 0 deletions doc/source/timeseries.rst
@@ -1108,6 +1108,20 @@ TimeSeries, aligning the data on the UTC timestamps:

In some cases, ``tz_localize`` cannot tell the DST hours from the non-DST hours
when the fall transition produces duplicate timestamps. This often happens when
reading files that simply repeat the hour. The ``infer_dst`` argument of
``tz_localize`` attempts to determine the right offset from the order of the data.

.. ipython:: python

   rng_hourly = DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
                               '11/06/2011 01:00', '11/06/2011 02:00',
                               '11/06/2011 03:00'])
   rng_hourly.tz_localize('US/Eastern')
   rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', infer_dst=True)
   rng_hourly_eastern.values

.. _timeseries.timedeltas:

Time Deltas
-----------

6 changes: 5 additions & 1 deletion doc/source/v0.13.0.txt
@@ -8,7 +8,7 @@ enhancements along with a large number of bug fixes.

.. warning::

In 0.13.0 ``Series`` has internaly been refactored to no longer sub-class ``ndarray``
In 0.13.0 ``Series`` has internally been refactored to no longer sub-class ``ndarray``
but instead subclass ``NDFrame``, similarly to the rest of the pandas containers. This should be
a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`

@@ -481,6 +481,10 @@ Enhancements

:ref:`See the docs<indexing.basics.indexing_isin>` for more.

- ``tz_localize`` can infer a fall daylight savings transition based on the structure
of the unlocalized data (:issue:`4230`), see :ref:`here<timeseries.timezone>`
- DatetimeIndex is now in the API documentation, see :ref:`here<api.datetimeindex>`

.. _whatsnew_0130.experimental:

Experimental
6 changes: 4 additions & 2 deletions pandas/core/generic.py
@@ -2752,7 +2752,7 @@ def tz_convert(self, tz, axis=0, copy=True):

return new_obj

def tz_localize(self, tz, axis=0, copy=True):
def tz_localize(self, tz, axis=0, copy=True, infer_dst=False):
"""
Localize tz-naive TimeSeries to target time zone

@@ -2761,6 +2761,8 @@ def tz_localize(self, tz, axis=0, copy=True):
tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order

Returns
-------
@@ -2778,7 +2780,7 @@ def tz_localize(self, tz, axis=0, copy=True):
new_data = new_data.copy()

new_obj = self._constructor(new_data)
new_ax = ax.tz_localize(tz)
new_ax = ax.tz_localize(tz, infer_dst=infer_dst)

if axis == 0:
new_obj._set_axis(1, new_ax)
6 changes: 4 additions & 2 deletions pandas/core/series.py
@@ -2331,7 +2331,7 @@ def tz_convert(self, tz, copy=True):

return self._constructor(new_values, index=new_index, name=self.name)

def tz_localize(self, tz, copy=True):
def tz_localize(self, tz, copy=True, infer_dst=False):
"""
Localize tz-naive TimeSeries to target time zone
Entries will retain their "naive" value but will be annotated as
@@ -2345,6 +2345,8 @@ def tz_localize(self, tz, copy=True):
tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition hours based on order

Returns
-------
@@ -2358,7 +2360,7 @@ def tz_localize(self, tz, copy=True):

new_index = DatetimeIndex([], tz=tz)
else:
new_index = self.index.tz_localize(tz)
new_index = self.index.tz_localize(tz, infer_dst=infer_dst)

new_values = self.values
if copy:
26 changes: 19 additions & 7 deletions pandas/tseries/index.py
@@ -147,6 +147,7 @@ def __new__(cls, data=None,

dayfirst = kwds.pop('dayfirst', None)
yearfirst = kwds.pop('yearfirst', None)
infer_dst = kwds.pop('infer_dst', False)
warn = False
if 'offset' in kwds and kwds['offset']:
freq = kwds['offset']
@@ -183,7 +184,8 @@ def __new__(cls, data=None,

if data is None:
return cls._generate(start, end, periods, name, offset,
tz=tz, normalize=normalize)
tz=tz, normalize=normalize,
infer_dst=infer_dst)

if not isinstance(data, np.ndarray):
if np.isscalar(data):
@@ -209,7 +211,7 @@ def __new__(cls, data=None,
data.name = name

if tz is not None:
return data.tz_localize(tz)
return data.tz_localize(tz, infer_dst=infer_dst)

return data

@@ -261,7 +263,8 @@ def __new__(cls, data=None,
getattr(data, 'tz', None) is None):
# Convert tz-naive to UTC
ints = subarr.view('i8')
subarr = tslib.tz_localize_to_utc(ints, tz)
subarr = tslib.tz_localize_to_utc(ints, tz,
infer_dst=infer_dst)

subarr = subarr.view(_NS_DTYPE)

@@ -286,7 +289,7 @@ def __new__(cls, data=None,

@classmethod
def _generate(cls, start, end, periods, name, offset,
tz=None, normalize=False):
tz=None, normalize=False, infer_dst=False):
if com._count_not_none(start, end, periods) != 2:
raise ValueError('Must specify two of start, end, or periods')

@@ -375,7 +378,8 @@ def _generate(cls, start, end, periods, name, offset,
index = _generate_regular_range(start, end, periods, offset)

if tz is not None and getattr(index, 'tz', None) is None:
index = tslib.tz_localize_to_utc(com._ensure_int64(index), tz)
index = tslib.tz_localize_to_utc(com._ensure_int64(index), tz,
infer_dst=infer_dst)
index = index.view(_NS_DTYPE)

index = index.view(cls)
@@ -1537,9 +1541,17 @@ def tz_convert(self, tz):
# No conversion since timestamps are all UTC to begin with
return self._simple_new(self.values, self.name, self.offset, tz)

def tz_localize(self, tz):
def tz_localize(self, tz, infer_dst=False):
"""
Localize tz-naive DatetimeIndex to given time zone (using pytz)

Parameters
----------
tz : string or pytz.timezone
Time zone to localize to. The naive timestamps are interpreted as
wall-clock times in this time zone
infer_dst : boolean, default False
Attempt to infer fall DST-transition hours based on order

Returns
-------
@@ -1550,7 +1562,7 @@ def tz_localize(self, tz):
tz = tools._maybe_get_tz(tz)

# Convert to UTC
new_dates = tslib.tz_localize_to_utc(self.asi8, tz)
new_dates = tslib.tz_localize_to_utc(self.asi8, tz, infer_dst=infer_dst)
new_dates = new_dates.view(_NS_DTYPE)

return self._simple_new(new_dates, self.name, self.offset, tz)
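
The inference itself happens inside ``tslib.tz_localize_to_utc``, which this diff does not show. As a rough illustration of the idea only, the sketch below resolves a repeated fall-back hour from ordering using just the standard library. The function name ``infer_fall_back`` and the hard-coded US/Eastern offsets and ambiguous window for November 6, 2011 are assumptions made for the example; this is not the pandas implementation.

```python
from datetime import datetime, timedelta

# Hypothetical constants for US/Eastern on 2011-11-06 (assumed for this sketch).
DST_OFFSET = timedelta(hours=-4)           # EDT, before the clocks fall back
STD_OFFSET = timedelta(hours=-5)           # EST, after the clocks fall back
AMBIG_START = datetime(2011, 11, 6, 1, 0)  # first ambiguous wall-clock time
AMBIG_END = datetime(2011, 11, 6, 2, 0)    # first unambiguous time after it


def in_window(t):
    """Return True if the naive time falls in the ambiguous hour."""
    return AMBIG_START <= t < AMBIG_END


def infer_fall_back(naive_times):
    """Convert ordered naive wall-clock times to naive UTC datetimes.

    Inside the ambiguous hour, times count as DST until the wall clock
    stops moving forward; from then on they count as standard time.
    Raises ValueError when ambiguous times appear without such a repeat,
    because order alone cannot disambiguate them.
    """
    result = []
    seen_repeat = False
    prev = None
    for t in naive_times:
        if in_window(t):
            if prev is not None and in_window(prev) and t <= prev:
                seen_repeat = True
            offset = STD_OFFSET if seen_repeat else DST_OFFSET
        elif t < AMBIG_START:
            offset = DST_OFFSET
        else:
            offset = STD_OFFSET
        result.append(t - offset)  # UTC = local time minus UTC offset
        prev = t
    if any(in_window(t) for t in naive_times) and not seen_repeat:
        raise ValueError("ambiguous times without a repeated hour")
    return result
```

Applied to the five timestamps used in ``test_infer_dst`` (00:00, 01:00, 01:00, 02:00, 03:00), the sketch yields consecutive UTC hours from 04:00 to 08:00. A plain hourly range with a single 01:00 raises, which mirrors the error case the test expects.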
26 changes: 26 additions & 0 deletions pandas/tseries/tests/test_timezones.py
@@ -360,6 +360,32 @@ def test_with_tz_ambiguous_times(self):
dr = date_range(datetime(2011, 3, 13), periods=48,
freq=datetools.Minute(30), tz=pytz.utc)

def test_infer_dst(self):
# November 6, 2011, fall back, repeat 1 AM hour
# With no repeated hours, we cannot infer the transition
tz = pytz.timezone('US/Eastern')
dr = date_range(datetime(2011, 11, 6, 0), periods=5,
freq=datetools.Hour())
self.assertRaises(pytz.AmbiguousTimeError, dr.tz_localize,
tz, infer_dst=True)

# With repeated hours, we can infer the transition
dr = date_range(datetime(2011, 11, 6, 0), periods=5,
freq=datetools.Hour(), tz=tz)
di = DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
'11/06/2011 01:00', '11/06/2011 02:00',
'11/06/2011 03:00'])
localized = di.tz_localize(tz, infer_dst=True)
self.assert_(np.array_equal(dr, localized))

# When there is no dst transition, nothing special happens
dr = date_range(datetime(2011, 6, 1, 0), periods=10,
freq=datetools.Hour())
localized = dr.tz_localize(tz)
localized_infer = dr.tz_localize(tz, infer_dst=True)
self.assert_(np.array_equal(localized, localized_infer))


# test utility methods
def test_infer_tz(self):
eastern = pytz.timezone('US/Eastern')