-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
implement _tz_convert_dst for de-duplication #21730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -608,9 +608,66 @@ cpdef inline datetime localize_pydatetime(datetime dt, object tz): | |
# ---------------------------------------------------------------------- | ||
# Timezone Conversion | ||
|
||
cdef inline int64_t[:] _tz_convert_dst(ndarray[int64_t] values, tzinfo tz, | ||
bint to_utc=True): | ||
""" | ||
tz_convert for non-UTC non-tzlocal cases where we have to check | ||
DST transitions pointwise. | ||
|
||
Parameters | ||
---------- | ||
values : ndarray[int64_t] | ||
tz : tzinfo | ||
to_utc : bool | ||
True if converting _to_ UTC, False if converting _from_ utc | ||
|
||
Returns | ||
------- | ||
result : ndarray[int64_t] | ||
""" | ||
cdef: | ||
Py_ssize_t n = len(values) | ||
Py_ssize_t i, j, pos | ||
ndarray[int64_t] result = np.empty(n, dtype=np.int64) | ||
ndarray[int64_t] tt, trans, deltas | ||
ndarray[Py_ssize_t] posn | ||
int64_t v | ||
|
||
trans, deltas, typ = get_dst_info(tz) | ||
if not to_utc: | ||
# We add `offset` below instead of subtracting it | ||
deltas = -1 * deltas | ||
|
||
tt = values[values != NPY_NAT] | ||
if not len(tt): | ||
# if all NaT, return all NaT | ||
return values | ||
|
||
posn = trans.searchsorted(tt, side='right') | ||
|
||
j = 0 | ||
for i in range(n): | ||
v = values[i] | ||
if v == NPY_NAT: | ||
result[i] = v | ||
else: | ||
pos = posn[j] - 1 | ||
j += 1 | ||
if pos < 0: | ||
raise ValueError('First time before start of DST info') | ||
result[i] = v - deltas[pos] | ||
|
||
return result | ||
|
||
|
||
cdef inline int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz, | ||
bint to_utc=True): | ||
""" | ||
Convert the i8 representation of a datetime from a tzlocal timezone to | ||
UTC, or vice-versa. | ||
|
||
Private, not intended for use outside of tslibs.conversion | ||
|
||
Parameters | ||
---------- | ||
val : int64_t | ||
|
@@ -672,6 +729,7 @@ cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2): | |
Py_ssize_t pos | ||
int64_t v, offset, utc_date | ||
pandas_datetimestruct dts | ||
ndarray[int64_t] arr # TODO: Is there a lighter-weight way to do this? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ATM that doesn't work because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, ok to make that change here if you can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not an option yet. To do this we need to get rid of the ATM the options are either wrap in a 1-element array or keep duplicated code. |
||
|
||
# See GH#17734 We should always be converting either from UTC or to UTC | ||
assert (is_utc(tz1) or tz1 == 'UTC') or (is_utc(tz2) or tz2 == 'UTC') | ||
|
@@ -683,29 +741,23 @@ cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2): | |
if is_tzlocal(tz1): | ||
utc_date = _tz_convert_tzlocal_utc(val, tz1, to_utc=True) | ||
elif get_timezone(tz1) != 'UTC': | ||
trans, deltas, typ = get_dst_info(tz1) | ||
pos = trans.searchsorted(val, side='right') - 1 | ||
if pos < 0: | ||
raise ValueError('First time before start of DST info') | ||
offset = deltas[pos] | ||
utc_date = val - offset | ||
arr = np.array([val]) | ||
utc_date = _tz_convert_dst(arr, tz1, to_utc=True)[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Dispatching by wrapping in an I'm hopeful that there's a way to make the argument an |
||
else: | ||
utc_date = val | ||
|
||
if get_timezone(tz2) == 'UTC': | ||
return utc_date | ||
elif is_tzlocal(tz2): | ||
return _tz_convert_tzlocal_utc(utc_date, tz2, to_utc=False) | ||
|
||
# Convert UTC to other timezone | ||
trans, deltas, typ = get_dst_info(tz2) | ||
|
||
pos = trans.searchsorted(utc_date, side='right') - 1 | ||
if pos < 0: | ||
raise ValueError('First time before start of DST info') | ||
|
||
offset = deltas[pos] | ||
return utc_date + offset | ||
else: | ||
# Convert UTC to other timezone | ||
arr = np.array([utc_date]) | ||
# Note: at least with cython 0.28.3, doing a looking `[0]` in the next | ||
# line is sensitive to the declared return type of _tz_convert_dst; | ||
# if it is declared as returning ndarray[int64_t], a compile-time error | ||
# is raised. | ||
return _tz_convert_dst(arr, tz2, to_utc=False)[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you could do an assignment here that then specifies the type liek
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why does that help in this context? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. then the return type is well defined (was reading your comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh gotcha. Fortunately the version here where _tz_convert_dst returns a memoryview is preferred anyway. I could remove the comment, but it did take me a few rounds to figure out that was where the problem was coming from. |
||
|
||
|
||
@cython.boundscheck(False) | ||
|
@@ -746,70 +798,25 @@ def tz_convert(ndarray[int64_t] vals, object tz1, object tz2): | |
else: | ||
utc_dates[i] = _tz_convert_tzlocal_utc(v, tz1, to_utc=True) | ||
else: | ||
trans, deltas, typ = get_dst_info(tz1) | ||
|
||
# all-NaT | ||
tt = vals[vals != NPY_NAT] | ||
if not len(tt): | ||
return vals | ||
|
||
posn = trans.searchsorted(tt, side='right') | ||
j = 0 | ||
for i in range(n): | ||
v = vals[i] | ||
if v == NPY_NAT: | ||
utc_dates[i] = NPY_NAT | ||
else: | ||
pos = posn[j] - 1 | ||
j = j + 1 | ||
if pos < 0: | ||
raise ValueError('First time before start of DST info') | ||
offset = deltas[pos] | ||
utc_dates[i] = v - offset | ||
utc_dates = np.array(_tz_convert_dst(vals, tz1, to_utc=True)) | ||
else: | ||
utc_dates = vals | ||
|
||
if get_timezone(tz2) == 'UTC': | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The conditions are in a different order for the to_utc block above vs this from_utc block, but with |
||
return utc_dates | ||
|
||
result = np.zeros(n, dtype=np.int64) | ||
if is_tzlocal(tz2): | ||
elif is_tzlocal(tz2): | ||
result = np.zeros(n, dtype=np.int64) | ||
for i in range(n): | ||
v = utc_dates[i] | ||
if v == NPY_NAT: | ||
result[i] = NPY_NAT | ||
else: | ||
result[i] = _tz_convert_tzlocal_utc(v, tz2, to_utc=False) | ||
return result | ||
|
||
# Convert UTC to other timezone | ||
trans, deltas, typ = get_dst_info(tz2) | ||
|
||
# use first non-NaT element | ||
# if all-NaT, return all-NaT | ||
if (result == NPY_NAT).all(): | ||
return result | ||
|
||
# if all NaT, return all NaT | ||
tt = utc_dates[utc_dates != NPY_NAT] | ||
if not len(tt): | ||
return utc_dates | ||
|
||
posn = trans.searchsorted(tt, side='right') | ||
|
||
j = 0 | ||
for i in range(n): | ||
v = utc_dates[i] | ||
if vals[i] == NPY_NAT: | ||
result[i] = vals[i] | ||
else: | ||
pos = posn[j] - 1 | ||
j = j + 1 | ||
if pos < 0: | ||
raise ValueError('First time before start of DST info') | ||
offset = deltas[pos] | ||
result[i] = v + offset | ||
return result | ||
else: | ||
# Convert UTC to other timezone | ||
return np.array(_tz_convert_dst(utc_dates, tz2, to_utc=False)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have to explicitly wrap in |
||
|
||
|
||
# TODO: cdef scalar version to call from convert_str_to_tsobject | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other places where
get_dst_info
is used don't do thistt
thing._normalize_local
,period.localize_dt64_arr_to_period
,resolution._reso_local
all do:is_date_array_normalized
andtslib.ints_to_pydatetime
makessearchsorted
calls pointwiseI'm checking to see if anything breaks if we change it here to match the others.
Perf Considerations:
tt
thing here presumably improves perf in cases with manyNaT
svalues != NPY_NAT
check we could passvalues
as anint64_t[:]
which I think would be more performant.cnp.PY_ArraySearchSorted
that could be used to avoid reaching into python-space, but apparently there is a bug in cython's numpy.pxd that makes this non-trivial (https://stackoverflow.com/questions/28184211/calling-pyarray-searchsorted-from-cython-3-or-4-arguments?rq=1)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? NPY_NAT is i8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why you can search an array of timestamps to searchsorted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT
int64_t[:]
doesn't support__eq__
. You can basically iterate through it and little else.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question.
trans.searchsorted(x)
assumes (doesn't check) thattrans
is sorted, doesn't care aboutx
(which can be array or scalar)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no what i mean is that why can you do
trans.searchsorted(array_of_time_statmps)
rather than iteratiing?