-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: optimize NaT lookups in cython modules #24008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Also moved some NaTType properties from the python class to the cython class, as those are supposed to be marginally more efficient |
Travis failures are pickle plotting, not clearly related. |
Codecov Report
@@ Coverage Diff @@
## master #24008 +/- ##
==========================================
- Coverage 42.46% 42.46% -0.01%
==========================================
Files 161 161
Lines 51557 51554 -3
==========================================
- Hits 21892 21890 -2
+ Misses 29665 29664 -1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24008 +/- ##
=========================================
Coverage ? 42.44%
=========================================
Files ? 161
Lines ? 51559
Branches ? 0
=========================================
Hits ? 21886
Misses ? 29673
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the introduction of NAT
extra-confusing. What actually does this buy?
A ton of dict lookups. Consider
In the PR the C code is:
In the status quo it has to look up "NaT" in the module-level namespace dict (and +/- refcount). In the PR we avoid that dict lookup. It adds up because it is a dict lookup we do a lot. |
this adds IMHO a huge amount of mental overhead. I would either call this |
Agreed that would be ideal, but AFAICT cython won't allow
Can you elaborate on what you have in mind with "cdef accessor function"? I'm broadly indifferent between NAT vs c_NaT vs _NaT in terms of what we call the cdef'd version within |
I meant this, but this is not going to help the problem.
how about if you rename what you are calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments remain, otherwise lgtm.
cdef readonly: | ||
int64_t value | ||
object freq | ||
# cdef readonly: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or do we do this elsewhere to remind of the attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the pattern we used for _TSObject
lgtm. do we have sufficient asv's for this? |
The %timeit results in the OP are all that's available ATM. I'm poking at an idea that would make it easier to identify what asvs to run for any particular commit/PR, will see how that pans out. |
thanks! |
By making
NaT
a cdef'd object that we can cimport, we take a module-level lookup out of each check ofif obj is NaT
. Since we tend to do this check a lot, avoiding these global lookups can get us some mileage:master:
PR