-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_libs/tslibs/nattype.pyx
Outdated
@@ -353,7 +353,6 @@ class NaTType(_NaT): | |||
strptime = _make_error_func('strptime', datetime) | |||
strftime = _make_error_func('strftime', datetime) | |||
isocalendar = _make_error_func('isocalendar', datetime) | |||
dst = _make_error_func('dst', datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing?
pandas/_libs/tslibs/timezones.pyx
Outdated
Py_ssize_t n = len(values) | ||
# Cython boolean memoryviews are not supported yet | ||
# https://github.com/cython/cython/issues/2204 | ||
# bint[:] result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the workaround is to use a uint8
pandas/core/indexes/datetimes.py
Outdated
>>> dti.is_dst() | ||
Index([True, True, False, False], dtype='object') | ||
""" | ||
return Index(timezones._is_dst(self.asi8, self.tz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make is_dst public?
Is this related to the “fold” arg added in py3? |
Yup fairly similar. The These |
Codecov Report
@@ Coverage Diff @@
## master #22560 +/- ##
=========================================
Coverage ? 92.04%
=========================================
Files ? 169
Lines ? 50789
Branches ? 0
=========================================
Hits ? 46747
Misses ? 4042
Partials ? 0
Continue to review full report at Codecov.
|
Ready for another look @jbrockmendel when you get the chance. |
pandas/_libs/tslibs/timezones.pyx
Outdated
@@ -186,16 +187,28 @@ cdef object get_utc_trans_times_from_dateutil_tz(object tz): | |||
return new_trans | |||
|
|||
|
|||
cdef int64_t[:] unbox_utcoffsets(object transinfo): | |||
cdef int64_t[:] unbox_utcoffsets(object transinfo, dst): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add bint to type dst
pandas/_libs/tslibs/timezones.pyx
Outdated
|
||
sz = len(transinfo) | ||
arr = np.empty(sz, dtype='i8') | ||
|
||
key = int(dst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may not need this if typing bint
pandas/_libs/tslibs/timezones.pyx
Outdated
|
||
sz = len(transinfo) | ||
arr = np.empty(sz, dtype='i8') | ||
|
||
key = int(dst) | ||
for i in range(sz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment about what you are extracting
pandas/_libs/tslibs/timezones.pyx
Outdated
---------- | ||
tz : object | ||
timezone | ||
dst : bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type with bitn
pandas/_libs/tslibs/timezones.pyx
Outdated
Parameters | ||
---------- | ||
tz : object | ||
timezone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we accept only timezone objects and not strings here?
pandas/_libs/tslibs/timezones.pyx
Outdated
return result | ||
transitions, offsets, typ = get_dst_info(tz, True) | ||
offsets = np.array(offsets) | ||
# Fixed timezone offsets do not have DST transitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line here
pandas/_libs/tslibs/timezones.pyx
Outdated
if typ not in {'pytz', 'dateutil'}: | ||
return result | ||
positions = transitions.searchsorted(values, side='right') - 1 | ||
# DST has nonzero offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -1012,6 +1012,22 @@ def test_iteration_preserves_nanoseconds(self, tz): | |||
for i, ts in enumerate(index): | |||
assert ts == index[i] | |||
|
|||
@pytest.mark.parametrize('arg, expected_arg', [ | |||
[[], []], | |||
[date_range('2018-11-04', periods=4, freq='H', tz='US/Pacific'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try with a dateutil as well (I think we default to pytz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pganssle Do dateutil timezones provide a way to access the possible daylight savings time shifts like pytz.timezone._transition_info
does?
# I'm interested in the 2nd element of the tuples below
In [4]: tz._transition_info
Out[4]:
[(datetime.timedelta(-1, 58020), datetime.timedelta(0), 'LMT'),
(datetime.timedelta(-1, 57600), datetime.timedelta(0), 'PST'),
(datetime.timedelta(-1, 61200), datetime.timedelta(0, 3600), 'PDT'),
...
Context: I am trying to create a DatetimeIndex.is_dst
method that indicates whether a particular timestamp is in daylight savings time. I know it's possible to do bool(datetime.dst())
on each timestamp of the DatetimeIndex
, but I am curious if there's a more efficient way determine DST for a dateutil timezone than iterating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Yes and no. Yes in the sense that this is Python and obviously dateutil
has to store that information somewhere. No in the sense that neither dateutil
nor pytz
exposes that information publicly. Note that _transition_info
is a private member.
That said:
- I'm not entirely sure why you want to implement a "fast isdst" - I'm not sure I've ever heard of a case where it was useful to know whether or not a time had or did not have DST.
- I don't really see what this would buy you - the member you list looks like what you would get from
utcoffset
,dst
andtzname
, respectively. How do you expect to make use of this information?
assert ts_naive.is_dst() is False | ||
|
||
ts_aware = ts_naive.tz_localize('US/Pacific') | ||
assert ts_aware.is_dst() is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you test dateutil
Hello @mroeschke! Thanks for updating the PR.
|
I would recommend against adding an In general DST/not DST is only a proxy for the real information you want in all cases I can think of - the offset, the tzname or which side of an ambiguity you are on. All of those can be independently queried and/or specified without mentioning DST. |
I'm not clear on what problem this is solving. Can you give an example of the internal routines that produce ambiguous times? @pganssle and I don't always see eye-to-eye on how to treat public vs private attributes, but in general if he's got a strong opinion, it is worth listening to. |
I also wanted to mention that we deliberately added an |
When rounding a tz-aware date time to a particular frequency, essentially the steps are:
You can't necessarily convert to UTC first because it can affect the rounding (unless I am missing something). So in #18946, a date that sits on the DST border
I guess #18946 can be solved by added an Using DST information may be a too hand-wavy solution to solve both #18946 and #18885. At minimum, I had also though having a |
@mroeschke I'm guessing that rounding issues in UTC is mainly at the day-or-higher level, right? Since rounding to the nearest minute or hour will cause no problems. I think the correct solution here is to introduce the I see a few good ways forward:
In all cases, I think I am very strongly hoping that within a year I'll be able to get a fast, compiled Rust or C backend for many of |
I think there may be an implicit assumption in here that should be made explicit. Does the user want the rounding to be based on wall time or actual time? (My intuition is actual time)
If I were rounding Is this intuition not applicable to the case you're considering? |
@jbrockmendel I think for time scales of hours or less it's probably actual time, but time scale of days they almost certainly want wall time. Imagine rounding to midnight UTC, then going back to PST/PDT. No one wants to round to 17:00 sometimes and 16:00 others. To be honest it's probably always wall time, but on sufficiently small time scales the two are equivalent. |
I agree with @pganssle and would imagine users would want to round based on wall time in general. Thanks for the feedback guys. I'm convinced I need to surface a |
git diff upstream/master -u -- "*.py" | flake8 --diff
The issues above required a new method to keep track if each timestamp in the
DatetimeIndex
was previously in daylight savings time in order to pass intotz_localize(..., ambiguous=...)
Therefore, added a
is_dst
method toDatetimeIndex
andTimestamp
.