-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: typing and cdefs for tslibs.resolution #21452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_libs/tslibs/resolution.pyx
Outdated
return us % mult == 0 | ||
|
||
|
||
def _maybe_add_count(base, count): | ||
cdef inline str _maybe_add_count(str base, count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a int64_t type for count
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ill double-check, but I'm pretty sure past-me had a reason for leaving that one out. Maybe there are cases when None
is passed
pandas/_libs/tslibs/resolution.pyx
Outdated
""" | ||
Not sure if I can avoid the state machine here | ||
""" | ||
cdef public: | ||
index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need types here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really viable since we can't specify Index
as a type. We just have to make the name public for wheels to turn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just specify them as object to make this explict
bint calendar_start = True | ||
bint business_start = True | ||
bint cal | ||
int32_t[:] years |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are memoryviews, right? Any reason you are choosing these instead of the ndarray
declarations throughout the rest of the code base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that memoryviews are encouraged because they are lighter weight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel : That is indeed correct. I agree with this design choice.
Might be something to look into for other parts of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, one of the main problems with memoryviews is with read-only arrays (and for that reason we do/should not use them solely in certain algos), but I don't think that's relevant in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make cdefs readonly when it makes sense FYI
pandas/_libs/tslibs/resolution.pyx
Outdated
return us % mult == 0 | ||
|
||
|
||
def _maybe_add_count(base, count): | ||
cdef inline str _maybe_add_count(str base, count): | ||
if count != 1: | ||
return '{count}{base}'.format(count=int(count), base=base) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If above comment is correct shouldn't need the int cast here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're right. Just pushed a commit that added the suggested typing and removed the int(
. (and hopefully fixed the test errors)
Codecov Report
@@ Coverage Diff @@
## master #21452 +/- ##
==========================================
+ Coverage 91.89% 91.9% +<.01%
==========================================
Files 153 153
Lines 49600 49606 +6
==========================================
+ Hits 45580 45589 +9
+ Misses 4020 4017 -3
Continue to review full report at Codecov.
|
@jbrockmendel : I feel like the performance help is relatively obvious given that you are moving things underneath the Python level. Is there any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments. just run asv's to confirm no regressions. of course if improvements then shout it out!
ping on green after updating for comments.
pandas/_libs/tslibs/resolution.pyx
Outdated
""" | ||
Not sure if I can avoid the state machine here | ||
""" | ||
cdef public: | ||
index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just specify them as object to make this explict
bint calendar_start = True | ||
bint business_start = True | ||
bint cal | ||
int32_t[:] years |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make cdefs readonly when it makes sense FYI
Status quo results for PR: Note that in 0.23.0 it looks like these come back closer to 832. No idea what changed in the interim. |
thanks! |
Moving things lower-level will help improve performance (due in part to better Cython compilation).