Commit e420855

ENH: Add dateutil timezone support GH4688
Adds support for dateutil timezones across pandas. Support conversion between dateutil and pytz timezones. Ensure timezone functionality is tested with both dateutil and pytz.
1 parent 1754bb5 commit e420855

15 files changed (+913 -170 lines changed)

doc/source/timeseries.rst

+44-6
Original file line number | Diff line number | Diff line change
@@ -1244,21 +1244,53 @@ the quarter end:
12441244
Time Zone Handling
12451245
------------------
12461246

1247-
Using ``pytz``, pandas provides rich support for working with timestamps in
1248-
different time zones. By default, pandas objects are time zone unaware:
1247+
Pandas provides rich support for working with timestamps in different time zones using the ``pytz`` and ``dateutil`` libraries.
1248+
``dateutil`` support is new in 0.14.1 and is currently limited to fixed-offset and tzfile zones. The default library is ``pytz``.
1249+
Support for ``dateutil`` is provided for compatibility with other applications, for example if you use ``dateutil`` in other Python packages.
1250+
1251+
By default, pandas objects are time zone unaware:
12491252

12501253
.. ipython:: python
12511254
12521255
rng = date_range('3/6/2012 00:00', periods=15, freq='D')
1253-
print(rng.tz)
1256+
rng.tz is None
12541257
12551258
To supply the time zone, you can use the ``tz`` keyword to ``date_range`` and
1256-
other functions:
1259+
other functions. Dateutil time zone strings are distinguished from ``pytz``
1260+
time zones by starting with ``dateutil/``. In ``pytz`` you can find a list of
1261+
common (and less common) time zones using ``from pytz import common_timezones, all_timezones``.
1262+
``dateutil`` uses the OS timezones so there isn't a fixed list available. For
1263+
common zones, the names are the same as ``pytz``.
12571264

12581265
.. ipython:: python
1259-
1266+
1267+
# pytz
12601268
rng_utc = date_range('3/6/2012 00:00', periods=10, freq='D', tz='UTC')
1261-
print(rng_utc.tz)
1269+
rng_utc.tz
1270+
1271+
# dateutil
1272+
rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
1273+
tz='dateutil/UTC')
1274+
rng_utc_dateutil.tz
1275+
1276+
You can also construct the timezone explicitly first, which gives you more control over which
1277+
time zone is used:
1278+
1279+
.. ipython:: python
1280+
1281+
# pytz
1282+
import pytz
1283+
tz_pytz = pytz.timezone('UTC')
1284+
rng_utc = date_range('3/6/2012 00:00', periods=10, freq='D', tz=tz_pytz)
1285+
rng_utc.tz
1286+
1287+
# dateutil
1288+
import dateutil
1289+
tz_dateutil = dateutil.tz.gettz('UTC')
1290+
rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
1291+
tz=tz_dateutil)
1292+
rng_utc_dateutil.tz
1293+
12621294
12631295
Timestamps, like Python's ``datetime.datetime`` object, can be either time zone
12641296
naive or time zone aware. Naive time series and DatetimeIndex objects can be
@@ -1271,13 +1303,19 @@ naive or time zone aware. Naive time series and DatetimeIndex objects can be
12711303
ts_utc = ts.tz_localize('UTC')
12721304
ts_utc
12731305
1306+
Again, you can explicitly construct the timezone object first.
12741307
You can use the ``tz_convert`` method to convert tz-aware pandas objects
12751308
to another time zone:
12761309

12771310
.. ipython:: python
12781311
12791312
ts_utc.tz_convert('US/Eastern')
12801313
1314+
.. warning::
1315+
Be very wary of conversions between libraries, as ``pytz`` and ``dateutil``
1316+
may have different definitions of the time zones. This is more of a problem for
1317+
unusual timezones than for 'standard' zones like ``US/Eastern``.
1318+
12811319
Under the hood, all timestamps are stored in UTC. Scalar values from a
12821320
``DatetimeIndex`` with a time zone will have their fields (day, hour, minute)
12831321
localized to the time zone. However, timestamps with the same UTC value are
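
The documentation change above can be tried out with a short script. This is only a sketch under a few assumptions: it uses a current pandas installation (which pulls in ``dateutil``), the modern ``pd.`` namespace instead of the bare ``date_range`` used in the 2014 docs, and ``America/New_York``, the canonical tz database name behind ``US/Eastern``.

```python
import dateutil.tz
import pandas as pd

# The "dateutil/" prefix asks pandas to build the zone with dateutil
# rather than with its default time zone library.
rng_dateutil = pd.date_range("2012-03-06", periods=3, freq="D",
                             tz="dateutil/America/New_York")

# Constructing the tzinfo object first and passing it in selects the
# same zone without relying on the string prefix.
tz_explicit = dateutil.tz.gettz("America/New_York")
rng_explicit = pd.date_range("2012-03-06", periods=3, freq="D",
                             tz=tz_explicit)

print(type(rng_dateutil.tz))   # the tzinfo class pandas attached
print(rng_dateutil[0])         # first timestamp, with its UTC offset
```

Passing an explicit tzinfo object leaves no doubt about which library supplies the zone definition, which matters given the warning about ``pytz`` and ``dateutil`` possibly disagreeing on unusual zones.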

doc/source/v0.14.1.txt

+13
@@ -9,6 +9,8 @@ users upgrade to this version.
99

1010
- Highlights include:
1111

12+
- Support for ``dateutil`` timezones.
13+
1214
- :ref:`Other Enhancements <whatsnew_0141.enhancements>`
1315

1416
- :ref:`API Changes <whatsnew_0141.api>`
@@ -53,6 +55,17 @@ Enhancements
5355
~~~~~~~~~~~~
5456
- Tests for basic reading of public S3 buckets now exist (:issue:`7281`).
5557

58+
- Support for dateutil timezones, which can now be used in the same way as
59+
pytz timezones across pandas. (:issue:`4688`)
60+
61+
.. ipython:: python
62+
63+
rng_utc_dateutil = date_range('3/6/2012 00:00', periods=10, freq='D',
64+
tz='dateutil/UTC')
65+
rng_utc_dateutil.tz
66+
67+
See :ref:`the docs <timeseries.timezone>`.
68+
5669
.. _whatsnew_0141.performance:
5770

5871
Performance

pandas/index.pyx

+3-3
@@ -605,11 +605,11 @@ cdef inline _to_i8(object val):
605605
return get_datetime64_value(val)
606606
elif PyDateTime_Check(val):
607607
tzinfo = getattr(val, 'tzinfo', None)
608-
val = _pydatetime_to_dts(val, &dts)
608+
ival = _pydatetime_to_dts(val, &dts) # Save the original date value so we can get the utcoffset from it.
609609
if tzinfo is not None and not _is_utc(tzinfo):
610610
offset = tslib._get_utcoffset(tzinfo, val)
611-
val -= tslib._delta_to_nanoseconds(offset)
612-
611+
ival -= tslib._delta_to_nanoseconds(offset)
612+
return ival
613613
return val
614614

615615
cdef inline bint _is_utc(object tz):
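
The intent of the ``_to_i8`` change, as stated in the added comment, is to keep the original datetime in ``val`` so that the UTC offset lookup still receives a datetime instead of the converted integer. A rough pure-Python stand-in for that branch is sketched below. The names ``wall_clock_ns`` and ``to_i8`` are hypothetical, a fixed-offset ``datetime.timezone`` replaces the pytz/dateutil zones, and the ``_is_utc`` shortcut is omitted because subtracting a zero offset gives the same result.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)
ONE_US = timedelta(microseconds=1)


def wall_clock_ns(dt):
    """Nanoseconds since the epoch for dt's wall-clock fields (tzinfo ignored)."""
    return (dt.replace(tzinfo=None) - EPOCH) // ONE_US * 1000


def to_i8(val):
    """Return UTC nanoseconds for a datetime, following the patched branch."""
    tzinfo = val.tzinfo
    # Store the integer under a new name; `val` must stay a datetime.
    ival = wall_clock_ns(val)
    if tzinfo is not None:
        # The offset of a zone with DST depends on the date, so the lookup
        # needs the original datetime, not the integer computed above.
        offset = tzinfo.utcoffset(val)
        ival -= offset // ONE_US * 1000
    return ival


eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only
aware = datetime(2013, 1, 2, 0, 0, tzinfo=eastern)
naive_utc = datetime(2013, 1, 2, 5, 0)
print(to_i8(aware) == to_i8(naive_utc))
```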

pandas/io/pytables.py

+17-5
@@ -1729,11 +1729,23 @@ def set_atom(self, block, block_items, existing_col, min_itemsize,
17291729
if getattr(rvalues[0], 'tzinfo', None) is not None:
17301730

17311731
# if this block has more than one timezone, raise
1732-
if len(set([r.tzinfo for r in rvalues])) != 1:
1733-
raise TypeError(
1734-
"too many timezones in this block, create separate "
1735-
"data columns"
1736-
)
1732+
try:
1733+
# pytz timezones: compare on zone name (to avoid issues with DST being a different zone to STD).
1734+
zones = [r.tzinfo.zone for r in rvalues]
1735+
except:
1736+
# dateutil timezones: compare on ==
1737+
zones = [r.tzinfo for r in rvalues]
1738+
if any(zones[0] != zone_i for zone_i in zones[1:]):
1739+
raise TypeError(
1740+
"too many timezones in this block, create separate "
1741+
"data columns"
1742+
)
1743+
else:
1744+
if len(set(zones)) != 1:
1745+
raise TypeError(
1746+
"too many timezones in this block, create separate "
1747+
"data columns"
1748+
)
17371749

17381750
# convert this column to datetime64[ns] utc, and save the tz
17391751
index = DatetimeIndex(rvalues)
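
The zone check in ``set_atom`` can be illustrated without PyTables. The sketch below is an assumption-laden simplification, not the code in the diff: ``NamedZone`` is a hypothetical stand-in for a pytz zone whose standard-time and daylight-time variants are distinct objects sharing one ``zone`` name, ``check_single_zone`` is a made-up helper, and the bare ``except`` of the patch is narrowed to ``AttributeError`` with the two comparison paths merged into one.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class NamedZone(tzinfo):
    """Hypothetical pytz-like zone: a shared name, one object per offset."""

    def __init__(self, zone, offset_hours):
        self.zone = zone
        self._offset = timedelta(hours=offset_hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self.zone


def check_single_zone(values):
    """Raise TypeError unless every datetime in `values` shares one zone."""
    try:
        # pytz-style objects: compare by name, so the standard-time and
        # daylight-time variants of one zone count as the same zone.
        zones = [v.tzinfo.zone for v in values]
    except AttributeError:
        # Objects without a `zone` attribute: compare the tzinfo with ==.
        zones = [v.tzinfo for v in values]
    if any(zones[0] != z for z in zones[1:]):
        raise TypeError("too many timezones in this block, "
                        "create separate data columns")


est = NamedZone("US/Eastern", -5)  # winter variant
edt = NamedZone("US/Eastern", -4)  # summer variant, a different object
check_single_zone([datetime(2013, 1, 2, tzinfo=est),
                   datetime(2013, 6, 3, tzinfo=edt)])  # passes: same name
```

Comparing the tzinfo objects directly would reject the winter and summer values above, which is why the updated tests use one date in standard time and one in daylight time.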

pandas/io/tests/test_json/test_ujson.py

+4-1
@@ -25,6 +25,7 @@
2525
assert_array_almost_equal_nulp,
2626
assert_approx_equal)
2727
import pytz
28+
import dateutil
2829
from pandas import DataFrame, Series, Index, NaT, DatetimeIndex
2930
import pandas.util.testing as tm
3031

@@ -361,7 +362,9 @@ def test_encodeTimeConversion(self):
361362
datetime.time(),
362363
datetime.time(1, 2, 3),
363364
datetime.time(10, 12, 15, 343243),
364-
datetime.time(10, 12, 15, 343243, pytz.utc)]
365+
datetime.time(10, 12, 15, 343243, pytz.utc),
366+
# datetime.time(10, 12, 15, 343243, dateutil.tz.gettz('UTC')), # this segfaults! No idea why.
367+
]
365368
for test in tests:
366369
output = ujson.encode(test)
367370
expected = '"%s"' % test.isoformat()

pandas/io/tests/test_pytables.py

+75-2
@@ -1991,7 +1991,7 @@ def test_unimplemented_dtypes_table_columns(self):
19911991
# this fails because we have a date in the object block......
19921992
self.assertRaises(TypeError, store.append, 'df_unimplemented', df)
19931993

1994-
def test_append_with_timezones(self):
1994+
def test_append_with_timezones_pytz(self):
19951995

19961996
from datetime import timedelta
19971997

@@ -2020,7 +2020,8 @@ def compare(a,b):
20202020
compare(store.select('df_tz',where=Term('A>=df.A[3]')),df[df.A>=df.A[3]])
20212021

20222022
_maybe_remove(store, 'df_tz')
2023-
df = DataFrame(dict(A = Timestamp('20130102',tz='US/Eastern'), B = Timestamp('20130103',tz='US/Eastern')),index=range(5))
2023+
# ensure we include dates in DST and STD time here.
2024+
df = DataFrame(dict(A = Timestamp('20130102',tz='US/Eastern'), B = Timestamp('20130603',tz='US/Eastern')),index=range(5))
20242025
store.append('df_tz',df)
20252026
result = store['df_tz']
20262027
compare(result,df)
@@ -2057,6 +2058,78 @@ def compare(a,b):
20572058
result = store.select('df')
20582059
assert_frame_equal(result,df)
20592060

2061+
def test_append_with_timezones_dateutil(self):
2062+
2063+
from datetime import timedelta
2064+
2065+
try:
2066+
import dateutil
2067+
except ImportError:
2068+
raise nose.SkipTest
2069+
2070+
def compare(a, b):
2071+
tm.assert_frame_equal(a, b)
2072+
2073+
# compare the zones on each element
2074+
for c in a.columns:
2075+
for i in a.index:
2076+
a_e = a[c][i]
2077+
b_e = b[c][i]
2078+
if not (a_e == b_e and a_e.tz == b_e.tz):
2079+
raise AssertionError("invalid tz comparison [%s] [%s]" % (a_e, b_e))
2080+
2081+
# as columns
2082+
with ensure_clean_store(self.path) as store:
2083+
2084+
_maybe_remove(store, 'df_tz')
2085+
df = DataFrame(dict(A=[ Timestamp('20130102 2:00:00', tz=dateutil.tz.gettz('US/Eastern')) + timedelta(hours=1) * i for i in range(5) ]))
2086+
store.append('df_tz', df, data_columns=['A'])
2087+
result = store['df_tz']
2088+
compare(result, df)
2089+
assert_frame_equal(result, df)
2090+
2091+
# select with tz aware
2092+
compare(store.select('df_tz', where=Term('A>=df.A[3]')), df[df.A >= df.A[3]])
2093+
2094+
_maybe_remove(store, 'df_tz')
2095+
# ensure we include dates in DST and STD time here.
2096+
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130603', tz=dateutil.tz.gettz('US/Eastern'))), index=range(5))
2097+
store.append('df_tz', df)
2098+
result = store['df_tz']
2099+
compare(result, df)
2100+
assert_frame_equal(result, df)
2101+
2102+
_maybe_remove(store, 'df_tz')
2103+
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130102', tz=dateutil.tz.gettz('EET'))), index=range(5))
2104+
self.assertRaises(TypeError, store.append, 'df_tz', df)
2105+
2106+
# this is ok
2107+
_maybe_remove(store, 'df_tz')
2108+
store.append('df_tz', df, data_columns=['A', 'B'])
2109+
result = store['df_tz']
2110+
compare(result, df)
2111+
assert_frame_equal(result, df)
2112+
2113+
# can't append with diff timezone
2114+
df = DataFrame(dict(A=Timestamp('20130102', tz=dateutil.tz.gettz('US/Eastern')), B=Timestamp('20130102', tz=dateutil.tz.gettz('CET'))), index=range(5))
2115+
self.assertRaises(ValueError, store.append, 'df_tz', df)
2116+
2117+
# as index
2118+
with ensure_clean_store(self.path) as store:
2119+
2120+
# GH 4098 example
2121+
df = DataFrame(dict(A=Series(lrange(3), index=date_range('2000-1-1', periods=3, freq='H', tz=dateutil.tz.gettz('US/Eastern')))))
2122+
2123+
_maybe_remove(store, 'df')
2124+
store.put('df', df)
2125+
result = store.select('df')
2126+
assert_frame_equal(result, df)
2127+
2128+
_maybe_remove(store, 'df')
2129+
store.append('df', df)
2130+
result = store.select('df')
2131+
assert_frame_equal(result, df)
2132+
20602133
def test_store_timezone(self):
20612134
# GH2852
20622135
# issue storing datetime.date with a timezone as it resets when read back in a new timezone

pandas/tests/test_format.py

+22-1
@@ -92,6 +92,13 @@ def _skip_if_no_pytz():
9292
except ImportError:
9393
raise nose.SkipTest("pytz not installed")
9494

95+
def _skip_if_no_dateutil():
96+
try:
97+
import dateutil
98+
except ImportError:
99+
raise nose.SkipTest("dateutil not installed")
100+
101+
95102
class TestDataFrameFormatting(tm.TestCase):
96103
_multiprocess_can_split_ = True
97104

@@ -2922,7 +2929,7 @@ def test_no_tz(self):
29222929
ts_nanos_micros = Timestamp(1200)
29232930
self.assertEqual(str(ts_nanos_micros), "1970-01-01 00:00:00.000001200")
29242931

2925-
def test_tz(self):
2932+
def test_tz_pytz(self):
29262933
_skip_if_no_pytz()
29272934

29282935
import pytz
@@ -2936,6 +2943,20 @@ def test_tz(self):
29362943
dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=pytz.utc)
29372944
self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us)))
29382945

2946+
def test_tz_dateutil(self):
2947+
_skip_if_no_dateutil()
2948+
import dateutil
2949+
utc = dateutil.tz.gettz('UTC')
2950+
2951+
dt_date = datetime(2013, 1, 2, tzinfo=utc)
2952+
self.assertEqual(str(dt_date), str(Timestamp(dt_date)))
2953+
2954+
dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=utc)
2955+
self.assertEqual(str(dt_datetime), str(Timestamp(dt_datetime)))
2956+
2957+
dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=utc)
2958+
self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us)))
2959+
29392960
if __name__ == '__main__':
29402961
import nose
29412962
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],

pandas/tests/test_series.py

+40-1
@@ -47,6 +47,12 @@ def _skip_if_no_pytz():
4747
except ImportError:
4848
raise nose.SkipTest("pytz not installed")
4949

50+
def _skip_if_no_dateutil():
51+
try:
52+
import dateutil
53+
except ImportError:
54+
raise nose.SkipTest("dateutil not installed")
55+
5056
#------------------------------------------------------------------------------
5157
# Series test cases
5258

@@ -4573,7 +4579,7 @@ def test_getitem_setitem_datetimeindex(self):
45734579
result["1990-01-02"] = ts[24:48]
45744580
assert_series_equal(result, ts)
45754581

4576-
def test_getitem_setitem_datetime_tz(self):
4582+
def test_getitem_setitem_datetime_tz_pytz(self):
45774583
_skip_if_no_pytz();
45784584
from pytz import timezone as tz
45794585

@@ -4608,6 +4614,39 @@ def test_getitem_setitem_datetime_tz(self):
46084614
result[date] = ts[4]
46094615
assert_series_equal(result, ts)
46104616

4617+
4618+
def test_getitem_setitem_datetime_tz_dateutil(self):
4619+
_skip_if_no_dateutil();
4620+
from dateutil.tz import gettz as tz
4621+
4622+
from pandas import date_range
4623+
N = 50
4624+
# testing with timezone, GH #2785
4625+
rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern')
4626+
ts = Series(np.random.randn(N), index=rng)
4627+
4628+
# also test Timestamp tz handling, GH #2789
4629+
result = ts.copy()
4630+
result["1990-01-01 09:00:00+00:00"] = 0
4631+
result["1990-01-01 09:00:00+00:00"] = ts[4]
4632+
assert_series_equal(result, ts)
4633+
4634+
result = ts.copy()
4635+
result["1990-01-01 03:00:00-06:00"] = 0
4636+
result["1990-01-01 03:00:00-06:00"] = ts[4]
4637+
assert_series_equal(result, ts)
4638+
4639+
# repeat with datetimes
4640+
result = ts.copy()
4641+
result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0
4642+
result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4]
4643+
assert_series_equal(result, ts)
4644+
4645+
result = ts.copy()
4646+
result[datetime(1990, 1, 1, 3, tzinfo=tz('US/Central'))] = 0
4647+
result[datetime(1990, 1, 1, 3, tzinfo=tz('US/Central'))] = ts[4]
4648+
assert_series_equal(result, ts)
4649+
46114650
def test_getitem_setitem_periodindex(self):
46124651
from pandas import period_range
46134652
N = 50
