ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

Closed
wants to merge 12 commits into from
3 changes: 3 additions & 0 deletions doc/source/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -582,6 +582,7 @@ These can be accessed like ``Series.dt.<property>``.
Series.dt.to_pydatetime
Series.dt.tz_localize
Series.dt.tz_convert
Series.dt.is_dst
Series.dt.normalize
Series.dt.strftime
Series.dt.round
Expand Down Expand Up @@ -1778,6 +1779,7 @@ Time-specific operations
DatetimeIndex.snap
DatetimeIndex.tz_convert
DatetimeIndex.tz_localize
DatetimeIndex.is_dst
DatetimeIndex.round
DatetimeIndex.floor
DatetimeIndex.ceil
Expand Down Expand Up @@ -1985,6 +1987,7 @@ Methods
Timestamp.isocalendar
Timestamp.isoformat
Timestamp.isoweekday
Timestamp.is_dst
Timestamp.month_name
Timestamp.normalize
Timestamp.now
Expand Down
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,7 @@ Other Enhancements
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
- :ref:`Series.resample` and :ref:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
- :class:`DatetimeIndex` and :class:`Timestamp` have gained an ``is_dst`` method (:issue:`18885`, :issue:`18946`)

.. _whatsnew_0240.api_breaking:

Expand Down Expand Up @@ -619,6 +620,8 @@ Timezones
- Bug when setting a new value with :meth:`DataFrame.loc` with a :class:`DatetimeIndex` with a DST transition (:issue:`18308`, :issue:`20724`)
- Bug in :meth:`DatetimeIndex.unique` that did not re-localize tz-aware dates correctly (:issue:`21737`)
- Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
- Bug in :meth:`DatetimeIndex.floor` that raised an ``AmbiguousTimeError`` during a DST transition (:issue:`18946`)
- Bug in :func:`merge` when merging ``datetime64[ns, tz]`` data that contained a DST transition (:issue:`18885`)

Offsets
^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ def ints_to_pydatetime(int64_t[:] arr, tz=None, freq=None, box="datetime"):
dt64_to_dtstruct(local_value, &dts)
result[i] = func_create(value, dts, tz, freq)
else:
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if typ not in ['pytz', 'dateutil']:
# static/fixed; in this case we know that len(delta) == 1
Expand Down
13 changes: 6 additions & 7 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,10 @@ from np_datetime import OutOfBoundsDatetime

from util cimport (is_string_object,
is_datetime64_object,
is_integer_object, is_float_object, is_array)
is_integer_object, is_float_object)

from timedeltas cimport cast_from_unit
from timezones cimport (is_utc, is_tzlocal, is_fixed_offset,
treat_tz_as_dateutil, treat_tz_as_pytz,
get_utcoffset, get_dst_info,
get_timezone, maybe_get_tz, tz_compare)
from parsing import parse_datetime_string
Expand Down Expand Up @@ -540,7 +539,7 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):
dt64_to_dtstruct(local_val, &obj.dts)
else:
# Adjust datetime64 timestamp, recompute datetimestruct
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if is_fixed_offset(tz):
# static/fixed tzinfo; in this case we know len(deltas) == 1
Expand Down Expand Up @@ -636,7 +635,7 @@ cdef inline int64_t[:] _tz_convert_dst(int64_t[:] values, tzinfo tz,
int64_t[:] deltas
int64_t v

trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)
if not to_utc:
# We add `offset` below instead of subtracting it
deltas = -1 * np.array(deltas, dtype='i8')
Expand Down Expand Up @@ -888,7 +887,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
"the same size as vals")
ambiguous_array = np.asarray(ambiguous)

trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

tdata = <int64_t*> cnp.PyArray_DATA(trans)
ntrans = len(trans)
Expand Down Expand Up @@ -1150,7 +1149,7 @@ cdef int64_t[:] _normalize_local(int64_t[:] stamps, object tz):
result[i] = _normalized_stamp(&dts)
else:
# Adjust datetime64 timestamp, recompute datetimestruct
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if typ not in ['pytz', 'dateutil']:
# static/fixed; in this case we know that len(delta) == 1
Expand Down Expand Up @@ -1227,7 +1226,7 @@ def is_date_array_normalized(int64_t[:] stamps, tz=None):
if (dts.hour + dts.min + dts.sec + dts.us) > 0:
return False
else:
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if typ not in ['pytz', 'dateutil']:
# static/fixed; in this case we know that len(delta) == 1
Expand Down
14 changes: 14 additions & 0 deletions pandas/_libs/tslibs/nattype.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -260,6 +260,20 @@ class NaTType(_NaT):
def is_year_end(self):
return False

def is_dst(self):
"""
Returns a boolean indicating if the Timestamp is in daylight savings
time. Naive timestamps are considered not to be in daylight savings
time.

Returns
-------
Boolean
True if the Timestamp is in daylight savings time
False if the Timestamp is naive or not in daylight savings time
"""
return False

def __rdiv__(self, other):
return _nat_rdivide_op(self, other)

Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -1516,7 +1516,7 @@ cdef int64_t[:] localize_dt64arr_to_period(int64_t[:] stamps,
result[i] = get_period_ordinal(&dts, freq)
else:
# Adjust datetime64 timestamp, recompute datetimestruct
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if typ not in ['pytz', 'dateutil']:
# static/fixed; in this case we know that len(delta) == 1
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/resolution.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ cdef _reso_local(int64_t[:] stamps, object tz):
reso = curr_reso
else:
# Adjust datetime64 timestamp, recompute datetimestruct
trans, deltas, typ = get_dst_info(tz)
trans, deltas, typ = get_dst_info(tz, False)

if typ not in ['pytz', 'dateutil']:
# static/fixed; in this case we know that len(delta) == 1
Expand Down
14 changes: 14 additions & 0 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -722,6 +722,20 @@ class Timestamp(_Timestamp):
raise AttributeError("Cannot directly set timezone. Use tz_localize() "
"or tz_convert() as appropriate")

def is_dst(self):
"""
Returns a boolean indicating if the Timestamp is in daylight savings
time. Naive timestamps are considered not to be in daylight savings
time.

Returns
-------
Boolean
True if the Timestamp is in daylight savings time
False if the Timestamp is naive or not in daylight savings time
"""
return bool(self.dst())

def __setstate__(self, state):
self.value = state[0]
self.freq = state[1]
Expand Down
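The new `Timestamp.is_dst` is a thin wrapper over the standard `datetime.dst()` protocol. A minimal standard-library sketch of that protocol follows. The `Pacific2018` class is hypothetical: it hard-codes the 2018 US/Pacific rules purely for illustration and is not part of pandas, pytz, or dateutil.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)


class Pacific2018(tzinfo):
    """Hypothetical tzinfo covering only the 2018 US/Pacific rules."""

    # Local wall-clock bounds of daylight saving time in 2018.
    DST_START = datetime(2018, 3, 11, 3)  # first PDT wall time after the gap
    DST_END = datetime(2018, 11, 4, 2)    # PDT wall time at which clocks fall back

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if self.DST_END - HOUR <= wall < self.DST_END:
            # Ambiguous hour: fold=0 is the first (PDT) pass,
            # fold=1 is the second (PST) pass.
            return HOUR if dt.fold == 0 else ZERO
        if self.DST_START <= wall < self.DST_END:
            return HOUR
        return ZERO

    def utcoffset(self, dt):
        return timedelta(hours=-8) + self.dst(dt)

    def tzname(self, dt):
        return "PDT" if self.dst(dt) else "PST"


def is_dst(dt):
    """Same idea as the PR's ``bool(self.dst())``."""
    # A naive datetime returns None from dst(), so this yields False.
    return bool(dt.dst())


tz = Pacific2018()
summer = datetime(2018, 7, 1, 12, tzinfo=tz)
winter = datetime(2018, 12, 1, 12, tzinfo=tz)
naive = datetime(2018, 7, 1, 12)
```

Under these assumptions `is_dst(summer)` is `True`, while `is_dst(winter)` and `is_dst(naive)` are both `False`, which matches the behaviour the docstring describes for naive timestamps.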
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timezones.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,4 @@ cpdef object maybe_get_tz(object tz)
cdef get_utcoffset(tzinfo, obj)
cdef bint is_fixed_offset(object tz)

cdef object get_dst_info(object tz)
cdef object get_dst_info(object tz, bint dst)
84 changes: 76 additions & 8 deletions pandas/_libs/tslibs/timezones.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,8 @@ def _p_tz_cache_key(tz):
return tz_cache_key(tz)


# Timezone data caches, key is the pytz string or dateutil file name.
# Timezone data (UTC offset) caches
# key is the pytz string or dateutil file name.
dst_cache = {}


Expand Down Expand Up @@ -186,16 +187,30 @@ cdef object get_utc_trans_times_from_dateutil_tz(object tz):
return new_trans


cdef int64_t[:] unbox_utcoffsets(object transinfo):
cdef int64_t[:] unbox_utcoffsets(object transinfo, bint dst):
"""
Unpack the offset information from the _transition_info attribute of pytz
timezones

Parameters
----------
transinfo : list of tuples
Each tuple contains (UTC offset, DST offset, tz abbreviation)
dst : boolean
True returns an array of the DST offsets
False returns an array of UTC offsets
"""
cdef:
Py_ssize_t i, sz
int64_t[:] arr
int key

sz = len(transinfo)
arr = np.empty(sz, dtype='i8')

for i in range(sz):
arr[i] = int(transinfo[i][0].total_seconds()) * 1000000000
# If dst == True, extract the DST shift in nanoseconds
# If dst == False, extract the UTC offset in nanoseconds
arr[i] = int(transinfo[i][dst].total_seconds()) * 1000000000

return arr
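
The `transinfo[i][dst]` lookup relies on a boolean being usable as a tuple index: `False` selects element 0 (the UTC offset) and `True` selects element 1 (the DST offset). A rough pure-Python sketch of that selection, using tuples shaped like the pytz `_transition_info` entries quoted in the review thread; the helper name `unbox_offsets_ns` is made up for illustration and is not the Cython function itself:

```python
from datetime import timedelta

# (UTC offset, DST offset, abbreviation), as in the review-thread excerpt.
transition_info = [
    (timedelta(-1, 58020), timedelta(0), "LMT"),
    (timedelta(-1, 57600), timedelta(0), "PST"),
    (timedelta(-1, 61200), timedelta(0, 3600), "PDT"),
]

NS_PER_SECOND = 1_000_000_000


def unbox_offsets_ns(transinfo, dst):
    """Return UTC offsets (dst=False) or DST offsets (dst=True) in nanoseconds."""
    # bool is an int subclass, so it indexes the tuple as 0 or 1.
    return [int(info[bool(dst)].total_seconds()) * NS_PER_SECOND
            for info in transinfo]


utc_offsets = unbox_offsets_ns(transition_info, False)
dst_offsets = unbox_offsets_ns(transition_info, True)
```

Here only the `PDT` entry ends up with a nonzero DST offset, which is the property the new `is_dst` function tests for.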

Expand All @@ -204,9 +219,23 @@ cdef int64_t[:] unbox_utcoffsets(object transinfo):
# Daylight Savings


cdef object get_dst_info(object tz):
cdef object get_dst_info(object tz, bint dst):
"""
return a tuple of :
Return DST info from a timezone

Parameters
----------
tz : object
timezone object
dst : bint
If True, return the DST-specific offsets and do NOT store the result in
dst_cache, which is reserved for caching UTC offsets.
If False, return the UTC offsets.
This flag only applies to pytz timezones.

Returns
-------
tuple
(UTC times of DST transitions,
UTC offsets in nanoseconds corresponding to DST transitions,
string of type of transitions)
Expand All @@ -221,7 +250,7 @@ cdef object get_dst_info(object tz):
np.array([num], dtype=np.int64),
None)

if cache_key not in dst_cache:
if cache_key not in dst_cache or dst:
if treat_tz_as_pytz(tz):
trans = np.array(tz._utc_transition_times, dtype='M8[ns]')
trans = trans.view('i8')
Expand All @@ -230,7 +259,7 @@ cdef object get_dst_info(object tz):
trans[0] = NPY_NAT + 1
except Exception:
pass
deltas = unbox_utcoffsets(tz._transition_info)
deltas = unbox_utcoffsets(tz._transition_info, dst)
typ = 'pytz'

elif treat_tz_as_dateutil(tz):
Expand Down Expand Up @@ -273,11 +302,50 @@ cdef object get_dst_info(object tz):
deltas = np.array([num], dtype=np.int64)
typ = 'static'

if dst:
return trans, deltas, typ
dst_cache[cache_key] = (trans, deltas, typ)

return dst_cache[cache_key]


def is_dst(int64_t[:] values, object tz):
"""
Return a boolean array indicating whether each epoch timestamp is in
daylight savings time with respect to the passed timezone.

Parameters
----------
values : ndarray
i8 representation of the datetimes
tz : object
timezone

Returns
-------
ndarray of booleans
True indicates daylight savings time
"""
cdef:
Py_ssize_t n = len(values)
object typ

result = np.zeros(n, dtype=bool)
if tz is None:
return result
transitions, offsets, typ = get_dst_info(tz, True)
offsets = np.array(offsets)

# Fixed timezone offsets do not have DST transitions
if typ not in {'pytz', 'dateutil'}:
return result
positions = transitions.searchsorted(values, side='right') - 1

# DST has nonzero offset
result = offsets[positions] != 0
return result


def infer_tzinfo(start, end):
if start is not None and end is not None:
tz = start.tzinfo
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -284,7 +284,7 @@ def _ensure_localized(self, result):
if getattr(self, 'tz', None) is not None:
if not isinstance(result, ABCIndexClass):
result = self._simple_new(result)
result = result.tz_localize(self.tz)
result = result.tz_localize(self.tz, ambiguous=self.is_dst())
return result

def _box_values_as_index(self):
Expand Down
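Passing `ambiguous=self.is_dst()` lets repeated wall times at a fall-back transition be re-localized without guessing which occurrence was meant. The arithmetic behind that can be sketched without pandas. The function name `wall_to_utc` and the hard-coded US/Pacific offsets are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

STD_OFFSET = timedelta(hours=-8)  # PST
DST_OFFSET = timedelta(hours=-7)  # PDT


def wall_to_utc(wall, in_dst):
    """Convert a naive local wall time to naive UTC using a DST flag."""
    offset = DST_OFFSET if in_dst else STD_OFFSET
    return wall - offset


# 01:00 on 2018-11-04 occurs twice in US/Pacific.
ambiguous_wall = datetime(2018, 11, 4, 1)
first_pass = wall_to_utc(ambiguous_wall, in_dst=True)    # 08:00 UTC
second_pass = wall_to_utc(ambiguous_wall, in_dst=False)  # 09:00 UTC
```

The same wall time maps to two UTC instants one hour apart, so the flag array is what keeps a floor or merge result from raising `AmbiguousTimeError` or landing on the wrong side of the transition.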
32 changes: 31 additions & 1 deletion pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -266,7 +266,7 @@ def _add_comparison_methods(cls):
_datetimelike_methods = ['to_period', 'tz_localize',
'tz_convert',
'normalize', 'strftime', 'round', 'floor',
'ceil', 'month_name', 'day_name']
'ceil', 'month_name', 'day_name', 'is_dst']

_is_numeric_dtype = False
_infer_as_myclass = True
Expand Down Expand Up @@ -443,6 +443,36 @@ def tz(self, value):
raise AttributeError("Cannot directly set timezone. Use tz_localize() "
"or tz_convert() as appropriate")

def is_dst(self):
"""
Returns an Index of booleans indicating if each corresponding timestamp
is in daylight savings time.

If the DatetimeIndex does not have a timezone, returns an Index
whose values are all False.

Returns
-------
Index
True if the timestamp is in daylight savings time else False

Examples
--------
>>> dti = pd.date_range('2018-11-04', periods=4, freq='H',
...                     tz='US/Pacific')

>>> dti
DatetimeIndex(['2018-11-04 00:00:00-07:00',
'2018-11-04 01:00:00-07:00',
'2018-11-04 01:00:00-08:00',
'2018-11-04 02:00:00-08:00'],
dtype='datetime64[ns, US/Pacific]', freq='H')

>>> dti.is_dst()
Index([True, True, False, False], dtype='object')
"""
return Index(timezones.is_dst(self.asi8, self.tz))

@property
def size(self):
# TODO: Remove this when we have a DatetimeTZArray
Expand Down
16 changes: 16 additions & 0 deletions pandas/tests/indexes/datetimes/test_timezones.py
Original file line number Diff line number Diff line change
Expand Up @@ -1012,6 +1012,22 @@ def test_iteration_preserves_nanoseconds(self, tz):
for i, ts in enumerate(index):
assert ts == index[i]

@pytest.mark.parametrize('arg, expected_arg', [
[[], []],
[date_range('2018-11-04', periods=4, freq='H', tz='US/Pacific'),
Contributor

can you try with a dateutil as well (I think we default to pytz)

Member Author

@pganssle Do dateutil timezones provide a way to access the possible daylight savings time shifts like pytz.timezone._transition_info does?

# I'm interested in the 2nd element of the tuples below
In [4]: tz._transition_info
Out[4]:
[(datetime.timedelta(-1, 58020), datetime.timedelta(0), 'LMT'),
 (datetime.timedelta(-1, 57600), datetime.timedelta(0), 'PST'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0, 3600), 'PDT'),
...

Context: I am trying to create a DatetimeIndex.is_dst method that indicates whether a particular timestamp is in daylight savings time. I know it's possible to do bool(datetime.dst()) on each timestamp of the DatetimeIndex, but I am curious if there's a more efficient way to determine DST for a dateutil timezone than iterating.

Contributor

@mroeschke Yes and no. Yes in the sense that this is Python and obviously dateutil has to store that information somewhere. No in the sense that neither dateutil nor pytz exposes that information publicly. Note that _transition_info is a private member.

That said:

  1. I'm not entirely sure why you want to implement a "fast isdst" - I'm not sure I've ever heard of a case where it was useful to know whether or not a time had or did not have DST.
  2. I don't really see what this would buy you - the member you list looks like what you would get from utcoffset, dst and tzname, respectively. How do you expect to make use of this information?

[True, True, False, False]],
[date_range('2018-11-04', periods=4, freq='H'),
[False] * 4],
[date_range('2018-11-04', periods=4, freq='H', tz=pytz.FixedOffset(3)),
[False] * 4],
[[pd.NaT], [False]]
])
def test_is_dst(self, arg, expected_arg):
dti = DatetimeIndex(arg)
result = dti.is_dst()
expected = Index(expected_arg)
tm.assert_index_equal(result, expected)


class TestDateRange(object):
"""Tests for date_range with timezones"""
Expand Down