
ENH: Support dateutil timezones. GH4688. #6968


Merged
merged 1 commit into from
Jun 4, 2014
50 changes: 44 additions & 6 deletions doc/source/timeseries.rst
@@ -1244,21 +1244,53 @@ the quarter end:
Time Zone Handling
------------------

Using ``pytz``, pandas provides rich support for working with timestamps in
different time zones. By default, pandas objects are time zone unaware:
Pandas provides rich support for working with timestamps in different time zones using the ``pytz`` and ``dateutil`` libraries.
``dateutil`` support is new [in 0.14.1] and is currently limited to fixed offset and tzfile zones. The default library is ``pytz``.
Support for ``dateutil`` is provided for compatibility with other applications, e.g. if you use ``dateutil`` in other Python packages.

By default, pandas objects are time zone unaware:

.. ipython:: python

rng = date_range('3/6/2012 00:00', periods=15, freq='D')
print(rng.tz)
rng.tz is None

To supply the time zone, you can use the ``tz`` keyword to ``date_range`` and
other functions:
other functions. ``dateutil`` time zone strings are distinguished from ``pytz``
time zones by starting with ``dateutil/``. In ``pytz`` you can find a list of
common (and less common) time zones using ``from pytz import common_timezones, all_timezones``.
``dateutil`` uses the OS time zones, so there isn't a fixed list available. For
common zones, the names are the same as in ``pytz``.

.. ipython:: python


# pytz
rng_utc = date_range('3/6/2012 00:00', periods=10, freq='D', tz='UTC')
print(rng_utc.tz)
rng_utc.tz

# dateutil
rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
tz='dateutil/UTC')
rng_utc_dateutil.tz
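
As a rough illustration of the naming convention described above, the sketch below splits a time zone string into the library it selects and the zone name. The names ``DATEUTIL_PREFIX`` and ``split_tz_string`` are hypothetical and exist only for this example; this is not how pandas implements the lookup internally.

```python
DATEUTIL_PREFIX = "dateutil/"  # hypothetical constant, for illustration only


def split_tz_string(tz):
    """Return (library, zone_name) for a time zone string.

    Hypothetical helper: strings starting with ``dateutil/`` select
    dateutil, anything else is treated as a pytz zone name (the default).
    """
    if tz.startswith(DATEUTIL_PREFIX):
        return "dateutil", tz[len(DATEUTIL_PREFIX):]
    return "pytz", tz


print(split_tz_string("UTC"))                  # ('pytz', 'UTC')
print(split_tz_string("dateutil/UTC"))         # ('dateutil', 'UTC')
print(split_tz_string("dateutil/US/Eastern"))  # ('dateutil', 'US/Eastern')
```

Only the leading ``dateutil/`` is stripped, so zone names that themselves contain a slash (such as ``US/Eastern``) are passed through unchanged.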

You can also construct the timezone explicitly first, which gives you more control over which
time zone is used:

.. ipython:: python

# pytz
import pytz
tz_pytz = pytz.timezone('UTC')
rng_utc = date_range('3/6/2012 00:00', periods=10, freq='D', tz=tz_pytz)
rng_utc.tz

# dateutil
import dateutil
tz_dateutil = dateutil.tz.gettz('UTC')
rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
tz=tz_dateutil)
rng_utc_dateutil.tz


Timestamps, like Python's ``datetime.datetime`` object, can be either time zone
naive or time zone aware. Naive time series and DatetimeIndex objects can be
@@ -1271,13 +1303,19 @@ naive or time zone aware. Naive time series and DatetimeIndex objects can be
ts_utc = ts.tz_localize('UTC')
ts_utc

Again, you can explicitly construct the timezone object first.
You can use the ``tz_convert`` method to convert tz-aware pandas objects
to another time zone:

.. ipython:: python

ts_utc.tz_convert('US/Eastern')

.. warning::
Review comment (Contributor): Can you add a note about why one would choose to use dateutil over pytz (the default)?

Be very wary of conversions between libraries as ``pytz`` and ``dateutil``
may have different definitions of the time zones. This is more of a problem for
unusual timezones than for 'standard' zones like ``US/Eastern``.

Under the hood, all timestamps are stored in UTC. Scalar values from a
``DatetimeIndex`` with a time zone will have their fields (day, hour, minute)
localized to the time zone. However, timestamps with the same UTC value are
13 changes: 13 additions & 0 deletions doc/source/v0.14.1.txt
@@ -9,6 +9,8 @@ users upgrade to this version.

- Highlights include:

- Support for ``dateutil`` timezones.

- :ref:`Other Enhancements <whatsnew_0141.enhancements>`

- :ref:`API Changes <whatsnew_0141.api>`
@@ -53,6 +55,17 @@ Enhancements
~~~~~~~~~~~~
- Tests for basic reading of public S3 buckets now exist (:issue:`7281`).

- Support for dateutil timezones, which can now be used in the same way as
pytz timezones across pandas. (:issue:`4688`)

.. ipython:: python

rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
tz='dateutil/UTC')
rng_utc_dateutil.tz

See :ref:`the docs <timeseries.timezone>`.

.. _whatsnew_0141.performance:

Performance
6 changes: 3 additions & 3 deletions pandas/index.pyx
@@ -605,11 +605,11 @@ cdef inline _to_i8(object val):
return get_datetime64_value(val)
elif PyDateTime_Check(val):
tzinfo = getattr(val, 'tzinfo', None)
val = _pydatetime_to_dts(val, &dts)
ival = _pydatetime_to_dts(val, &dts) # Save the original date value so we can get the utcoffset from it.
if tzinfo is not None and not _is_utc(tzinfo):
offset = tslib._get_utcoffset(tzinfo, val)
val -= tslib._delta_to_nanoseconds(offset)

ival -= tslib._delta_to_nanoseconds(offset)
return ival
return val

cdef inline bint _is_utc(object tz):
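
The change in this hunk keeps the original datetime in ``val`` so that the UTC offset is looked up from a datetime rather than from the integer that used to overwrite it. A minimal pure-Python sketch of the same idea follows; it is not the Cython code from the PR, and the helper names ``delta_to_ns``, ``wall_clock_ns`` and ``to_utc_ns`` are hypothetical. It uses only the standard library, with a fixed-offset ``datetime.timezone`` standing in for a pytz or dateutil zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)


def delta_to_ns(delta):
    """Convert a timedelta to integer nanoseconds without float rounding."""
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def wall_clock_ns(dt):
    """Nanoseconds since the epoch from the wall-clock fields, ignoring tzinfo."""
    return delta_to_ns(dt.replace(tzinfo=None) - EPOCH)


def to_utc_ns(dt):
    """Nanoseconds since the epoch in UTC for a naive or aware datetime."""
    ival = wall_clock_ns(dt)  # integer result; dt itself is left untouched
    if dt.tzinfo is not None:
        # utcoffset() needs the original datetime, not the integer value
        offset = dt.tzinfo.utcoffset(dt)
        ival -= delta_to_ns(offset)
    return ival


# Stand-in for a UTC-05:00 zone (assumption: fixed offset, no DST handling)
eastern = timezone(timedelta(hours=-5))
aware = datetime(2013, 1, 2, 12, tzinfo=eastern)
assert to_utc_ns(aware) == wall_clock_ns(datetime(2013, 1, 2, 17))
```

Passing the integer to ``utcoffset`` instead of ``dt`` would fail for any zone whose offset depends on the date, which is the situation the renamed ``ival`` variable avoids.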
22 changes: 17 additions & 5 deletions pandas/io/pytables.py
@@ -1729,11 +1729,23 @@ def set_atom(self, block, block_items, existing_col, min_itemsize,
if getattr(rvalues[0], 'tzinfo', None) is not None:

# if this block has more than one timezone, raise
if len(set([r.tzinfo for r in rvalues])) != 1:
raise TypeError(
"too many timezones in this block, create separate "
"data columns"
)
try:
# pytz timezones: compare on zone name (to avoid issues with DST being a different zone to STD).
zones = [r.tzinfo.zone for r in rvalues]
except:
# dateutil timezones: compare on ==
zones = [r.tzinfo for r in rvalues]
if any(zones[0] != zone_i for zone_i in zones[1:]):
raise TypeError(
"too many timezones in this block, create separate "
"data columns"
)
else:
if len(set(zones)) != 1:
raise TypeError(
"too many timezones in this block, create separate "
"data columns"
)

# convert this column to datetime64[ns] utc, and save the tz
index = DatetimeIndex(rvalues)
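
The check in this hunk compares pytz zones by name (so that the DST and STD variants of one zone count as the same zone) and falls back to ``==`` for dateutil zones. The sketch below shows that comparison strategy with standard-library stand-ins only: ``NamedZone`` is a hypothetical class that mimics a pytz-style object with a ``zone`` attribute, and ``datetime.timezone`` plays the role of a zone without one. Unlike the PR code, which picks one strategy for the whole block, this variation falls back per element.

```python
from datetime import timedelta, timezone, tzinfo


class NamedZone(tzinfo):
    """Hypothetical stand-in for a pytz-style zone with a ``zone`` name."""

    def __init__(self, zone, hours):
        self.zone = zone
        self._offset = timedelta(hours=hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self.zone


def zone_key(tz):
    """Use the zone name when there is one, otherwise the tzinfo object itself."""
    return getattr(tz, "zone", tz)


def has_single_zone(tzinfos):
    """True if every tzinfo in the sequence refers to the same zone."""
    keys = [zone_key(tz) for tz in tzinfos]
    return all(key == keys[0] for key in keys[1:])


# Two distinct objects for the STD and DST variants of one named zone
std = NamedZone("US/Eastern", -5)
dst = NamedZone("US/Eastern", -4)
print(has_single_zone([std, dst]))

# Zones without a ``zone`` attribute are compared with ==
print(has_single_zone([timezone(timedelta(hours=-5)), timezone(timedelta(hours=2))]))
```

Comparing with ``==`` rather than building a ``set`` avoids relying on the tzinfo objects being hashable in a way that agrees with their equality.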
5 changes: 4 additions & 1 deletion pandas/io/tests/test_json/test_ujson.py
@@ -25,6 +25,7 @@
assert_array_almost_equal_nulp,
assert_approx_equal)
import pytz
import dateutil
from pandas import DataFrame, Series, Index, NaT, DatetimeIndex
import pandas.util.testing as tm

@@ -361,7 +362,9 @@ def test_encodeTimeConversion(self):
datetime.time(),
datetime.time(1, 2, 3),
datetime.time(10, 12, 15, 343243),
datetime.time(10, 12, 15, 343243, pytz.utc)]
datetime.time(10, 12, 15, 343243, pytz.utc),
# datetime.time(10, 12, 15, 343243, dateutil.tz.gettz('UTC')), # this segfaults! No idea why.
]
for test in tests:
output = ujson.encode(test)
expected = '"%s"' % test.isoformat()
77 changes: 75 additions & 2 deletions pandas/io/tests/test_pytables.py
@@ -1991,7 +1991,7 @@ def test_unimplemented_dtypes_table_columns(self):
# this fails because we have a date in the object block......
self.assertRaises(TypeError, store.append, 'df_unimplemented', df)

def test_append_with_timezones(self):
def test_append_with_timezones_pytz(self):

from datetime import timedelta

@@ -2020,7 +2020,8 @@ def compare(a,b):
compare(store.select('df_tz',where=Term('A>=df.A[3]')),df[df.A>=df.A[3]])

_maybe_remove(store, 'df_tz')
df = DataFrame(dict(A = Timestamp('20130102',tz='US/Eastern'), B = Timestamp('20130103',tz='US/Eastern')),index=range(5))
# ensure we include dates in DST and STD time here.
df = DataFrame(dict(A = Timestamp('20130102',tz='US/Eastern'), B = Timestamp('20130603',tz='US/Eastern')),index=range(5))
store.append('df_tz',df)
result = store['df_tz']
compare(result,df)
@@ -2057,6 +2058,78 @@ def compare(a,b):
result = store.select('df')
assert_frame_equal(result,df)

def test_append_with_timezones_dateutil(self):

from datetime import timedelta

try:
import dateutil
except ImportError:
raise nose.SkipTest

def compare(a, b):
tm.assert_frame_equal(a, b)

# compare the zones on each element
for c in a.columns:
for i in a.index:
a_e = a[c][i]
b_e = b[c][i]
if not (a_e == b_e and a_e.tz == b_e.tz):
raise AssertionError("invalid tz comparison [%s] [%s]" % (a_e, b_e))

# as columns
with ensure_clean_store(self.path) as store:

_maybe_remove(store, 'df_tz')
df = DataFrame(dict(A=[ Timestamp('20130102 2:00:00', tz=dateutil.tz.gettz('US/Eastern')) + timedelta(hours=1) * i for i in range(5) ]))
Review comment (Contributor): Do these tz's actually store differently (except that they use the dateutil tzs, of course)?

Reply (Contributor Author): dateutil doesn't distinguish DST vs STD with different timezone objects, so I think all five times will store with the same tz. pytz definitely does distinguish, though, so the test has value.

This is just a duplicate of the original test (test_append_with_timezones_pytz), but using dateutil instead of pytz to ensure we cover the same areas of the code for both timezone libraries.

store.append('df_tz', df, data_columns=['A'])
result = store['df_tz']
compare(result, df)
assert_frame_equal(result, df)

# select with tz aware
compare(store.select('df_tz', where=Term('A>=df.A[3]')), df[df.A >= df.A[3]])

_maybe_remove(store, 'df_tz')
# ensure we include dates in DST and STD time here.
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130603', tz=dateutil.tz.gettz('US/Eastern'))), index=range(5))
store.append('df_tz', df)
result = store['df_tz']
compare(result, df)
assert_frame_equal(result, df)

_maybe_remove(store, 'df_tz')
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130102', tz=dateutil.tz.gettz('EET'))), index=range(5))
self.assertRaises(TypeError, store.append, 'df_tz', df)

# this is ok
_maybe_remove(store, 'df_tz')
store.append('df_tz', df, data_columns=['A', 'B'])
result = store['df_tz']
compare(result, df)
assert_frame_equal(result, df)

# can't append with diff timezone
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130102', tz=dateutil.tz.gettz('CET'))), index=range(5))
self.assertRaises(ValueError, store.append, 'df_tz', df)

# as index
with ensure_clean_store(self.path) as store:

# GH 4098 example
df = DataFrame(dict(A=Series(lrange(3), index=date_range('2000-1-1', periods=3, freq='H', tz=dateutil.tz.gettz('US/Eastern')))))

_maybe_remove(store, 'df')
store.put('df', df)
result = store.select('df')
assert_frame_equal(result, df)

_maybe_remove(store, 'df')
store.append('df', df)
result = store.select('df')
assert_frame_equal(result, df)

def test_store_timezone(self):
# GH2852
# issue storing datetime.date with a timezone as it resets when read back in a new timezone
23 changes: 22 additions & 1 deletion pandas/tests/test_format.py
@@ -92,6 +92,13 @@ def _skip_if_no_pytz():
except ImportError:
raise nose.SkipTest("pytz not installed")

def _skip_if_no_dateutil():
try:
import dateutil
except ImportError:
raise nose.SkipTest("dateutil not installed")


class TestDataFrameFormatting(tm.TestCase):
_multiprocess_can_split_ = True

@@ -2922,7 +2929,7 @@ def test_no_tz(self):
ts_nanos_micros = Timestamp(1200)
self.assertEqual(str(ts_nanos_micros), "1970-01-01 00:00:00.000001200")

def test_tz(self):
def test_tz_pytz(self):
_skip_if_no_pytz()

import pytz
@@ -2936,6 +2943,20 @@ def test_tz(self):
dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=pytz.utc)
self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us)))

def test_tz_dateutil(self):
_skip_if_no_dateutil()
import dateutil
utc = dateutil.tz.gettz('UTC')

dt_date = datetime(2013, 1, 2, tzinfo=utc)
self.assertEqual(str(dt_date), str(Timestamp(dt_date)))

dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=utc)
self.assertEqual(str(dt_datetime), str(Timestamp(dt_datetime)))

dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=utc)
self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us)))

if __name__ == '__main__':
import nose
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
41 changes: 40 additions & 1 deletion pandas/tests/test_series.py
@@ -47,6 +47,12 @@ def _skip_if_no_pytz():
except ImportError:
raise nose.SkipTest("pytz not installed")

def _skip_if_no_dateutil():
try:
import dateutil
except ImportError:
raise nose.SkipTest("dateutil not installed")

#------------------------------------------------------------------------------
# Series test cases

@@ -4573,7 +4579,7 @@ def test_getitem_setitem_datetimeindex(self):
result["1990-01-02"] = ts[24:48]
assert_series_equal(result, ts)

def test_getitem_setitem_datetime_tz(self):
def test_getitem_setitem_datetime_tz_pytz(self):
_skip_if_no_pytz();
from pytz import timezone as tz

@@ -4608,6 +4614,39 @@ def test_getitem_setitem_datetime_tz(self):
result[date] = ts[4]
assert_series_equal(result, ts)


def test_getitem_setitem_datetime_tz_dateutil(self):
_skip_if_no_dateutil();
from dateutil.tz import gettz as tz

from pandas import date_range
N = 50
# testing with timezone, GH #2785
rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern')
ts = Series(np.random.randn(N), index=rng)

# also test Timestamp tz handling, GH #2789
result = ts.copy()
result["1990-01-01 09:00:00+00:00"] = 0
result["1990-01-01 09:00:00+00:00"] = ts[4]
assert_series_equal(result, ts)

result = ts.copy()
result["1990-01-01 03:00:00-06:00"] = 0
result["1990-01-01 03:00:00-06:00"] = ts[4]
assert_series_equal(result, ts)

# repeat with datetimes
result = ts.copy()
result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0
result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4]
assert_series_equal(result, ts)

result = ts.copy()
result[datetime(1990, 1, 1, 3, tzinfo=tz('US/Central'))] = 0
result[datetime(1990, 1, 1, 3, tzinfo=tz('US/Central'))] = ts[4]
assert_series_equal(result, ts)

def test_getitem_setitem_periodindex(self):
from pandas import period_range
N = 50