-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: Localizer class to de-duplicate tzconversion code #46397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
can u make just a function to avoid repeating w/o creating the localizer obj? |
similar perf hit |
Any insights in checking the generated C code before/after? My best guess is that Cython may be generating extra checks for class attribute access beyond what normal struct member access would do |
This is the generated C-code for the old version:
and this is the generated C-code for the new version:
Nothing that would catch my eye... Do we know which of the branches is responsible for the slowdown or do all of them become slower? |
I don't see any clear pattern, no. |
Yea that makes sense to me. Unless it gets optimized away (compiler will warn for this as unused variable) |
I don't suppose that was in reference to my "compiler takes the checks outside the loop hypothesis?" OK, I tried just creating Localizer but never actually using it. Recorded samples over 4 runs to try to de-noise the whole thing.
We should be seeing pure O(1) overhead, so in this exercise the most-affected cases should be the one with small N, which is almost exactly what we're seeing. We don't see any changes of more than 10 microseocnds until the last 4 entries, and of those 3 are clearly-just-noise speedups. Disregarding those last 4, the mean slowdown was 1394ns. |
One last approach I can think of would be to scrap the class-based approach and go purely functional, with something taking pointer arguments to the Struct based approach looks pretty hard given the amount of Python objects that are currently in the mix, including the return value of |
I've thought of this w/r/t get_dst_info and think we may get some improvement by returning a struct instead of a string for the 3rd return value (since we later do e.g. Aside from trial/error and "just look at the generated code", are there any profiling tools that would help identify the problem(s)? Other alternatives that come to mind:
|
so the question here is sure some of these microbenchmarks are slower, but can you show the effect of a real operation (e.g. localizing a sizable DTI). what kind of perf is lost there? |
localizing doesn't use the path(s) in this PR yet. Observation someone may be able to make something of. The most-affected benchmark (from a few minutes ago) was With this change, the formerly-worst benchmark is now at 1.09x, and the new worst one is 536ns vs 592ns. Dies this produce an "a-ha!" moment for anyone on the thread? |
Marking as ready for review, the perf impact is now an improvement for big-array cases and a sub-microsecond hit for the super-small cases:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think this is a good consolidation of logic despite the small perf hit.
Also this might set up for abstracting library dependent tz offset implementations?
Not sure what you have in mind here. |
I remember a some feature discussion a while back about user's being able to provide their own timezone offset implementations. This
in the future? |
So the two main possibilities for next steps either look like a) what you wrote with pytz/dateutil/zoneinfo/fallback-specific implementations or b) GH#46511. My initial intuition was that the subclass-specific route would be more performant, but the last time I experimented with that it didn't pan out that way. It'll be worth re-checking.
if you mean support for arbitrary tzinfo subclasses, then absolutely. That actually becomes really easy as a follow-up to #46425, as the general case will end up going through the tzlocal/zoneinfo path. If you mean allowing users to implement Localizer subclasses, that sounds like a pretty big hassle. |
Cool. Yeah having user implementing |
very nice @jbrockmendel |
Second attempt at #46246. Perf impact isn't as bad, but is still ugly, see asvs posted below.
The lines:
are repeated 5 times in this file, once more with small edits in tzconversion.pyx, and potentially a couple others. So I'd really like to turn them into either a Localizer method or just a module-level function. Doing so causes even bigger performance hits.
Also these lines are liable to change as we add zoneinfo and non-nano support, so making them a function would be helpful for keeping them in sync.
The only idea I have for troubleshooting perf is to move the
info.use_foo
checks (or the info.delta and info.deltas lookups) outside of the loop, but that doesn't seem to make any difference.Asking for help from the big guns cc @jreback @WillAyd @realead @da-woods