-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
implement _tz_convert_dst for de-duplication #21730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
offset = deltas[pos] | ||
utc_date = val - offset | ||
arr = np.array([val]) | ||
utc_date = _tz_convert_dst(arr, tz1, to_utc=True)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dispatching by wrapping in an ndarray
is not my favorite, but I consider the de-duplication worth it.
I'm hopeful that there's a way to make the argument an int64_t[:]
which should have less overhead, but that is not yet an option (see comments above on _tz_convert_dst
)
raise ValueError('First time before start of DST info') | ||
offset = deltas[pos] | ||
utc_dates[i] = v - offset | ||
utc_dates = np.array(_tz_convert_dst(vals, tz1, to_utc=True)) | ||
else: | ||
utc_dates = vals | ||
|
||
if get_timezone(tz2) == 'UTC': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditions are in a different order for the to_utc block above vs this from_utc block, but with _tz_convert_dst
and _tz_convert_tzlocal_utc
, the blocks are now identical up to an argument change to_utc
True/False. After this PR, both tz_convert and tz_convert_single can be cut in half.
return result | ||
else: | ||
# Convert UTC to other timezone | ||
return np.array(_tz_convert_dst(utc_dates, tz2, to_utc=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to explicitly wrap in np.array
bc _tz_convert_dst
returns an int64_t[:]
.
@@ -672,6 +729,7 @@ cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2): | |||
Py_ssize_t pos | |||
int64_t v, offset, utc_date | |||
pandas_datetimestruct dts | |||
ndarray[int64_t] arr # TODO: Is there a lighter-weight way to do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int64_t[:] arr
is a memory view, should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM that doesn't work because _tz_convert_dst
takes an ndarray[int64_t]
and not a memoryview. I'm hopeful that down the road that can be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, ok to make that change here if you can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, ok to make that change here if you can
Not an option yet. To do this we need to get rid of the values != NPY_NAT
check in _tz_convert_dst
, which entails a non-trivial redesign of that function. Need to a) make sure it doesn't break things and b) check perf.
ATM the options are either wrap in a 1-element array or keep duplicated code.
# line is sensitive to the declared return type of _tz_convert_dst; | ||
# if it is declared as returning ndarray[int64_t], a compile-time error | ||
# is raised. | ||
return _tz_convert_dst(arr, tz2, to_utc=False)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do an assignment here that then specifies the type
liek
arr = _tz_convert.....
cdef int64_t result = arr[0]
return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does that help in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then the return type is well defined (was reading your comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh gotcha. Fortunately the version here where _tz_convert_dst returns a memoryview is preferred anyway. I could remove the comment, but it did take me a few rounds to figure out that was where the problem was coming from.
# if all NaT, return all NaT | ||
return values | ||
|
||
posn = trans.searchsorted(tt, side='right') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other places where get_dst_info
is used don't do this tt
thing.
_normalize_local
, period.localize_dt64_arr_to_period
, resolution._reso_local
all do:
trans, deltas, typ = get_dst_info(tz)
_pos = trans.searchsorted(stamps, side='right') - 1
if _pos.dtype != np.int64:
_pos = _pos.astype(np.int64)
pos = _pos
is_date_array_normalized
and tslib.ints_to_pydatetime
makes searchsorted
calls pointwise
trans, deltas, typ = get_dst_info(tz)
for i in range(n):
pos = trans.searchsorted(stamps[i]) - 1
I'm checking to see if anything breaks if we change it here to match the others.
Perf Considerations:
- The
tt
thing here presumably improves perf in cases with manyNaT
s - Without the
values != NPY_NAT
check we could passvalues
as anint64_t[:]
which I think would be more performant. - There is a
cnp.PY_ArraySearchSorted
that could be used to avoid reaching into python-space, but apparently there is a bug in cython's numpy.pxd that makes this non-trivial (https://stackoverflow.com/questions/28184211/calling-pyarray-searchsorted-from-cython-3-or-4-arguments?rq=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the values != NPY_NAT check we could pass values as an int64_t[:] which I think would be more performant.
why? NPY_NAT is i8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why you can search an array of timestamps to searchsorted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? NPY_NAT is i8
AFAICT int64_t[:]
doesn't support __eq__
. You can basically iterate through it and little else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why you can search an array of timestamps to searchsorted here?
I don't understand the question. trans.searchsorted(x)
assumes (doesn't check) that trans
is sorted, doesn't care about x
(which can be array or scalar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no what i mean is that why can you do trans.searchsorted(array_of_time_statmps)
rather than iteratiing?
Codecov Report
@@ Coverage Diff @@
## master #21730 +/- ##
=======================================
Coverage 91.92% 91.92%
=======================================
Files 158 158
Lines 49705 49705
=======================================
Hits 45693 45693
Misses 4012 4012
Continue to review full report at Codecov.
|
Analogous to #21727, this handles the
get_dst_info
case whereas that handled thetzlocal
case.Fleshes out docstring as discussed in #21727
All the other uses of
get_dst_info
are handled slightly differently. Need to figure out whether those are intentional to see if more de-duplication (or perf) is available.