Skip to content

BUG/TST: Add more timezone fixtures and use is_utc more consistently #23807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Nov 23, 2018
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1248,6 +1248,8 @@ Timezones
- Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` where an ``AmbiguousTimeError`` or ``NonExistentTimeError`` would raise if a timezone aware timeseries ended on a DST transition (:issue:`19375`, :issue:`10117`)
- Bug in :meth:`DataFrame.drop` and :meth:`Series.drop` when specifying a tz-aware Timestamp key to drop from a :class:`DatetimeIndex` with a DST transition (:issue:`21761`)
- Bug in :class:`DatetimeIndex` constructor where :class:`NaT` and ``dateutil.tz.tzlocal`` would raise an ``OutOfBoundsDatetime`` error (:issue:`23807`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I'm guessing this happens for users for whom tzlocal is UTC-N but not those for whom tzlocal is UTC+N (@jorisvandenbossche can you check this when convenient?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually because NaT wasn't handled in tz_localize_to_utc so NaT was converted to its i8 value which was less than Timestamp.min

https://github.com/pandas-dev/pandas/pull/23807/files#diff-66d289312fd0b02e7721bf45fb281797R911

- Bug in :meth:`DatetimeIndex.tz_localize` and :meth:`Timestamp.tz_localize` with ``dateutil.tz.tzlocal`` near a DST transition that would return an incorrectly localized datetime (:issue:`23807`)

Offsets
^^^^^^^
Expand Down
5 changes: 3 additions & 2 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ from tslibs.parsing import parse_datetime_string

from tslibs.timedeltas cimport cast_from_unit
from tslibs.timezones cimport is_utc, is_tzlocal, get_dst_info
from tslibs.timezones import UTC
from tslibs.conversion cimport (tz_convert_single, _TSObject,
convert_datetime_to_tsobject,
get_datetime64_nanos,
Expand Down Expand Up @@ -211,7 +212,7 @@ def _test_parse_iso8601(object ts):
check_dts_bounds(&obj.dts)
if out_local == 1:
obj.tzinfo = pytz.FixedOffset(out_tzoffset)
obj.value = tz_convert_single(obj.value, obj.tzinfo, 'UTC')
obj.value = tz_convert_single(obj.value, obj.tzinfo, UTC)
return Timestamp(obj.value, tz=obj.tzinfo)
else:
return Timestamp(obj.value)
Expand Down Expand Up @@ -673,7 +674,7 @@ cpdef array_to_datetime(ndarray[object] values, errors='raise',
# dateutil.tz.tzoffset objects
out_tzoffset_vals.add(out_tzoffset * 60.)
tz = pytz.FixedOffset(out_tzoffset)
value = tz_convert_single(value, tz, 'UTC')
value = tz_convert_single(value, tz, UTC)
else:
# Add a marker for naive string, to track if we are
# parsing mixed naive and aware strings
Expand Down
67 changes: 45 additions & 22 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ from numpy cimport int64_t, int32_t, ndarray
cnp.import_array()

import pytz
from dateutil.tz import tzutc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you switching this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No switching, this is just used here for timezone conversion between tzlocal and tzutc (making sure they are both dateutil timezones so they play nice) https://github.com/pandas-dev/pandas/pull/23807/files/de3c83bb946a7791d29a18db5f004228b24c9b02#diff-66d289312fd0b02e7721bf45fb281797R705

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


# stdlib datetime imports
from datetime import time as datetime_time
Expand Down Expand Up @@ -35,6 +36,7 @@ from timedeltas cimport cast_from_unit
from timezones cimport (is_utc, is_tzlocal, is_fixed_offset,
get_utcoffset, get_dst_info,
get_timezone, maybe_get_tz, tz_compare)
from timezones import UTC
from parsing import parse_datetime_string

from nattype import nat_strings, NaT
Expand All @@ -46,8 +48,6 @@ from nattype cimport NPY_NAT, checknull_with_nat
NS_DTYPE = np.dtype('M8[ns]')
TD_DTYPE = np.dtype('m8[ns]')

UTC = pytz.UTC


# ----------------------------------------------------------------------
# Misc Helpers
Expand Down Expand Up @@ -442,7 +442,7 @@ cdef _TSObject convert_str_to_tsobject(object ts, object tz, object unit,
check_dts_bounds(&obj.dts)
if out_local == 1:
obj.tzinfo = pytz.FixedOffset(out_tzoffset)
obj.value = tz_convert_single(obj.value, obj.tzinfo, 'UTC')
obj.value = tz_convert_single(obj.value, obj.tzinfo, UTC)
if tz is None:
check_dts_bounds(&obj.dts)
check_overflows(obj)
Expand Down Expand Up @@ -540,7 +540,8 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):
elif obj.value == NPY_NAT:
pass
elif is_tzlocal(tz):
local_val = _tz_convert_tzlocal_utc(obj.value, tz, to_utc=False)
local_val = _tz_convert_tzlocal_utc(obj.value, tz, to_utc=False,
from_utc=True)
dt64_to_dtstruct(local_val, &obj.dts)
else:
# Adjust datetime64 timestamp, recompute datetimestruct
Expand Down Expand Up @@ -576,7 +577,7 @@ cdef inline datetime _localize_pydatetime(datetime dt, tzinfo tz):
identically, i.e. discards nanos from Timestamps.
It also assumes that the `tz` input is not None.
"""
if tz == 'UTC' or tz is UTC:
if is_utc(tz):
return UTC.localize(dt)
try:
# datetime.replace with pytz may be incorrect result
Expand All @@ -603,7 +604,7 @@ cpdef inline datetime localize_pydatetime(datetime dt, object tz):
elif not PyDateTime_CheckExact(dt):
# i.e. is a Timestamp
return dt.tz_localize(tz)
elif tz == 'UTC' or tz is UTC:
elif is_utc(tz):
return UTC.localize(dt)
try:
# datetime.replace with pytz may be incorrect result
Expand Down Expand Up @@ -642,15 +643,20 @@ cdef inline int64_t[:] _tz_convert_dst(int64_t[:] values, tzinfo tz,
int64_t[:] deltas
int64_t v

trans, deltas, typ = get_dst_info(tz)
if not to_utc:
# We add `offset` below instead of subtracting it
deltas = -1 * np.array(deltas, dtype='i8')
if not is_tzlocal(tz):
# get_dst_info cannot extract offsets from tzlocal because its
# dependent on a datetime
trans, deltas, typ = get_dst_info(tz)
if not to_utc:
# We add `offset` below instead of subtracting it
deltas = -1 * np.array(deltas, dtype='i8')

for i in range(n):
v = values[i]
if v == NPY_NAT:
result[i] = v
elif is_tzlocal(tz):
result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=to_utc)
else:
# TODO: Is it more efficient to call searchsorted pointwise or
# on `values` outside the loop? We are not consistent about this.
Expand All @@ -664,7 +670,8 @@ cdef inline int64_t[:] _tz_convert_dst(int64_t[:] values, tzinfo tz,


cdef inline int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz,
bint to_utc=True):
bint to_utc=True,
bint from_utc=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't from_utc just equal to not to_utc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it appears that way in general. I can try to remove the new keyword arg for not to_utc

"""
Convert the i8 representation of a datetime from a tzlocal timezone to
UTC, or vice-versa.
Expand All @@ -677,6 +684,8 @@ cdef inline int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz,
tz : tzinfo
to_utc : bint
True if converting tzlocal _to_ UTC, False if going the other direction
from_utc: bint
True if val is a UTC timestamp, False if val is a wall/naive timestamp

Returns
-------
Expand All @@ -689,7 +698,12 @@ cdef inline int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz,

dt64_to_dtstruct(val, &dts)
dt = datetime(dts.year, dts.month, dts.day, dts.hour,
dts.min, dts.sec, dts.us, tz)
dts.min, dts.sec, dts.us)
# get_utcoffset (tz.utcoffset under the hood) only makes sense if datetime
# is _wall time_, so if val is a UTC timestamp convert to wall time
if from_utc:
dt = dt.replace(tzinfo=tzutc())
dt = dt.astimezone(tz)
delta = int(get_utcoffset(tz, dt).total_seconds()) * 1000000000

if not to_utc:
Expand All @@ -708,7 +722,7 @@ cdef inline int64_t tz_convert_utc_to_tzlocal(int64_t utc_val, tzinfo tz):
-------
local_val : int64_t
"""
return _tz_convert_tzlocal_utc(utc_val, tz, to_utc=False)
return _tz_convert_tzlocal_utc(utc_val, tz, to_utc=False, from_utc=True)


cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2):
Expand All @@ -735,24 +749,25 @@ cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2):
int64_t arr[1]

# See GH#17734 We should always be converting either from UTC or to UTC
assert (is_utc(tz1) or tz1 == 'UTC') or (is_utc(tz2) or tz2 == 'UTC')
assert is_utc(tz1) or is_utc(tz2)

if val == NPY_NAT:
return val

# Convert to UTC
if is_tzlocal(tz1):
utc_date = _tz_convert_tzlocal_utc(val, tz1, to_utc=True)
elif get_timezone(tz1) != 'UTC':
elif not is_utc(get_timezone(tz1)):
arr[0] = val
utc_date = _tz_convert_dst(arr, tz1, to_utc=True)[0]
else:
utc_date = val

if get_timezone(tz2) == 'UTC':
if is_utc(get_timezone(tz2)):
return utc_date
elif is_tzlocal(tz2):
return _tz_convert_tzlocal_utc(utc_date, tz2, to_utc=False)
return _tz_convert_tzlocal_utc(utc_date, tz2, to_utc=False,
from_utc=True)
else:
# Convert UTC to other timezone
arr[0] = utc_date
Expand All @@ -766,7 +781,7 @@ cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2):
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline int64_t[:] _tz_convert_one_way(int64_t[:] vals, object tz,
bint to_utc):
bint to_utc, bint from_utc=False):
"""
Convert the given values (in i8) either to UTC or from UTC.

Expand All @@ -775,6 +790,9 @@ cdef inline int64_t[:] _tz_convert_one_way(int64_t[:] vals, object tz,
vals : int64 ndarray
tz1 : string / timezone object
to_utc : bint
from_utc : bint
True if vals are UTC timestamps, False if vals are wall timestamps
Only applicable for tz=dateutil.tz.tzlocal()

Returns
-------
Expand All @@ -785,15 +803,16 @@ cdef inline int64_t[:] _tz_convert_one_way(int64_t[:] vals, object tz,
Py_ssize_t i, n = len(vals)
int64_t val

if get_timezone(tz) != 'UTC':
if not is_utc(get_timezone(tz)):
converted = np.empty(n, dtype=np.int64)
if is_tzlocal(tz):
for i in range(n):
val = vals[i]
if val == NPY_NAT:
converted[i] = NPY_NAT
else:
converted[i] = _tz_convert_tzlocal_utc(val, tz, to_utc)
converted[i] = _tz_convert_tzlocal_utc(val, tz, to_utc,
from_utc=from_utc)
else:
converted = _tz_convert_dst(vals, tz, to_utc)
else:
Expand Down Expand Up @@ -826,7 +845,8 @@ def tz_convert(int64_t[:] vals, object tz1, object tz2):

# Convert to UTC
utc_dates = _tz_convert_one_way(vals, tz1, to_utc=True)
converted = _tz_convert_one_way(utc_dates, tz2, to_utc=False)
converted = _tz_convert_one_way(utc_dates, tz2, to_utc=False,
from_utc=True)
return np.array(converted, dtype=np.int64)


Expand Down Expand Up @@ -890,7 +910,10 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
if is_tzlocal(tz):
for i in range(n):
v = vals[i]
result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
if v == NPY_NAT:
result[i] = NPY_NAT
else:
result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
return result

if is_string_object(ambiguous):
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/offsets.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ from conversion cimport tz_convert_single, pydt_to_i8, localize_pydatetime
from nattype cimport NPY_NAT
from np_datetime cimport (npy_datetimestruct,
dtstruct_to_dt64, dt64_to_dtstruct)
from timezones import UTC

# ---------------------------------------------------------------------
# Constants
Expand Down Expand Up @@ -211,7 +212,7 @@ def _to_dt64(dt, dtype='datetime64'):
# Thus astype is needed to cast datetime to datetime64[D]
if getattr(dt, 'tzinfo', None) is not None:
i8 = pydt_to_i8(dt)
dt = tz_convert_single(i8, 'UTC', dt.tzinfo)
dt = tz_convert_single(i8, UTC, dt.tzinfo)
dt = np.int64(dt).astype('datetime64[ns]')
else:
dt = np.datetime64(dt)
Expand Down
9 changes: 5 additions & 4 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ from timedeltas import Timedelta
from timedeltas cimport delta_to_nanoseconds
from timezones cimport (
get_timezone, is_utc, maybe_get_tz, treat_tz_as_pytz, tz_compare)
from timezones import UTC

# ----------------------------------------------------------------------
# Constants
Expand Down Expand Up @@ -416,7 +417,7 @@ cdef class _Timestamp(datetime):
int64_t val
val = self.value
if self.tz is not None and not is_utc(self.tz):
val = tz_convert_single(self.value, 'UTC', self.tz)
val = tz_convert_single(self.value, UTC, self.tz)
return val

cpdef bint _get_start_end_field(self, str field):
Expand Down Expand Up @@ -633,7 +634,7 @@ class Timestamp(_Timestamp):
Return a new Timestamp representing UTC day and time.
"""
return cls.now('UTC')
return cls.now(UTC)

@classmethod
def utcfromtimestamp(cls, ts):
Expand Down Expand Up @@ -1108,7 +1109,7 @@ class Timestamp(_Timestamp):
else:
if tz is None:
# reset tz
value = tz_convert_single(self.value, 'UTC', self.tz)
value = tz_convert_single(self.value, UTC, self.tz)
return Timestamp(value, tz=None)
else:
raise TypeError('Cannot localize tz-aware Timestamp, use '
Expand Down Expand Up @@ -1178,7 +1179,7 @@ class Timestamp(_Timestamp):
_tzinfo = self.tzinfo
value = self.value
if _tzinfo is not None:
value_tz = tz_convert_single(value, _tzinfo, 'UTC')
value_tz = tz_convert_single(value, _tzinfo, UTC)
value += value - value_tz

# setup components
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timezones.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ cpdef inline object get_timezone(object tz):
UJSON/pytables. maybe_get_tz (below) is the inverse of this process.
"""
if is_utc(tz):
return 'UTC'
return tz
else:
if treat_tz_as_dateutil(tz):
if '.tar.gz' in tz._filename:
Expand Down
9 changes: 5 additions & 4 deletions pandas/conftest.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
import importlib
import os

from dateutil.tz import tzutc
from dateutil.tz import tzutc, tzlocal
import hypothesis
from hypothesis import strategies as st
import numpy as np
import pytest
from pytz import utc
from pytz import utc, FixedOffset

from pandas.compat import PY3
import pandas.util._test_decorators as td
Expand Down Expand Up @@ -245,7 +245,7 @@ def datetime_tz_utc():
return timezone.utc


utc_objs = ['utc', utc, tzutc()]
utc_objs = ['utc', 'dateutil/UTC', utc, tzutc()]
if PY3:
from datetime import timezone
utc_objs.append(timezone.utc)
Expand Down Expand Up @@ -354,7 +354,8 @@ def unique_nulls_fixture(request):


TIMEZONES = [None, 'UTC', 'US/Eastern', 'Asia/Tokyo', 'dateutil/US/Pacific',
'dateutil/Asia/Singapore']
'dateutil/Asia/Singapore', tzutc(), tzlocal(), FixedOffset(300),
FixedOffset(0), FixedOffset(-300)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brainstorm:

  • non-round numbers for FixedOffset
  • we have a small number of tests with tzinfos from psycopg2.tz and matplotlib (not sure where in the matplotlib namespace)
  • random subset of pytz.all_timezones (592 of them)
  • equivalent from dateutil.tz (not sure off the top of my head how to access that, but if we figure it out I'll make a PR there)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much additional time do the tests take with these added fixtures? how many more tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add these in a follow up and further discuss.



@td.parametrize_fixture_doc(str(TIMEZONES))
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -790,7 +790,8 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',

if self.tz is not None:
if tz is None:
new_dates = conversion.tz_convert(self.asi8, 'UTC', self.tz)
new_dates = conversion.tz_convert(self.asi8, timezones.UTC,
self.tz)
else:
raise TypeError("Already tz-aware, use tz_convert to convert.")
else:
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/indexes/datetimes/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import numpy as np
import pytest
import pytz
import dateutil

from pandas._libs.tslib import OutOfBoundsDatetime
from pandas._libs.tslibs import conversion
Expand Down Expand Up @@ -527,6 +528,12 @@ def test_construction_with_tz_and_tz_aware_dti(self):
with pytest.raises(TypeError):
DatetimeIndex(dti, tz='Asia/Tokyo')

def test_construction_with_nat_and_tzlocal(self):
tz = dateutil.tz.tzlocal()
result = DatetimeIndex(['2018', 'NaT'], tz=tz)
expected = DatetimeIndex([Timestamp('2018', tz=tz), pd.NaT])
tm.assert_index_equal(result, expected)


class TestTimeSeries(object):

Expand Down
Loading