-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Make DateOffset (partially) immutable, cache _params for __eq__ speedup #17137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17137 +/- ##
==========================================
- Coverage 91.03% 90.95% -0.08%
==========================================
Files 161 161
Lines 49405 49411 +6
==========================================
- Hits 44974 44943 -31
- Misses 4431 4468 +37
Continue to review full report at Codecov.
|
|
||
# default for prior pickles | ||
normalize = False | ||
|
||
def __setattr__(self, key, value): | ||
if key in ['n', '_offset'] and hasattr(self, key): | ||
raise TypeError('%s is intended to be immutable; "%s" ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test to hit this TypeError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually. First there are some lingering corner cases in which the current implementation can be incorrect.
see if any asv's are needed for this (I don't think so actually), and show a representative speedup (not the profile), just a before / after timeit. if it looks good, pls add a whatsnew in perf. |
Will do. There are some cases I want to take a closer look at before calling this ready; parking some timeit results here in the meantime:
Old Results
New Results
|
There may be a problem. A lot of cross-design appears to have built up over the years including at least two approaches to caching. It looks like the original author had caching in mind, but it got dropped here. There is a AFAICT the idea in The motivation is that
(There is a TODO note further down suggesting this should be changed:
) Making I'm still not sure if this approach would play nicely with the |
closing in favor of #17181 |
Any time two
pd.Period
objects are compared, it starts off by checkingself.freq != other.freq
. Thesefreq
attributes areDateOffset
objects defined intseries.offsets
.DateOffset.__ne__
callsself._params()
, which is the bottleneck in a lot of Period-based work.Results under the status quo:
Results under this PR:
Note that in this example the periods were already ordered, so the number of comparisons (20013) is pretty much best-case. After shuffling them (sample size=1) there were 35181 comparisons.
There are some things I'd like to touch up to move towards actual-immutability, but the caching here is a big win.