cache DateOffset attrs now that they are immutable #21582
Codecov Report
@@ Coverage Diff @@
## master #21582 +/- ##
==========================================
- Coverage 91.9% 91.9% -0.01%
==========================================
Files 153 153
Lines 49549 49547 -2
==========================================
- Hits 45539 45537 -2
Misses 4010 4010
Looks like the follow-up (short-circuit a bunch of isinstance calls, lose |
Can you add a whatsnew note?
Do we have sufficient asvs to specifically test this? If not, can you add some?
Done.
thanks!
TL;DR ~6x speedup in `set_index` for a `PeriodIndex`-like column.

Alright! Now that DateOffset objects are immutable (#21341), we can start caching stuff. This was pretty much the original motivation that brought me here, so I'm pretty psyched to finally make this happen.
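To illustrate why immutability makes caching safe, here is a minimal sketch. `ToyOffset` and `_params_cache` are hypothetical names made up for this example, not the pandas implementation; only the `_params` name is borrowed from the discussion in this PR. The idea is that once an object can never change, a derived value can be computed on first access and reused for every later comparison or hash.

```python
class ToyOffset:
    """Hypothetical stand-in for an immutable offset (not the pandas class)."""

    __slots__ = ("_n", "_kwds", "_params_cache")

    def __init__(self, n=1, **kwds):
        # Bypass our own __setattr__, which blocks all later mutation.
        object.__setattr__(self, "_n", n)
        object.__setattr__(self, "_kwds", dict(kwds))
        object.__setattr__(self, "_params_cache", None)

    def __setattr__(self, name, value):
        raise AttributeError("ToyOffset is immutable")

    @property
    def _params(self):
        # Compute the derived tuple once; safe because state never changes.
        cached = self._params_cache
        if cached is None:
            cached = (
                type(self).__name__,
                self._n,
                tuple(sorted(self._kwds.items())),
            )
            object.__setattr__(self, "_params_cache", cached)
        return cached

    def __eq__(self, other):
        if not isinstance(other, ToyOffset):
            return NotImplemented
        return self._params == other._params

    def __hash__(self):
        return hash(self._params)


if __name__ == "__main__":
    a = ToyOffset(2, normalize=True)
    b = ToyOffset(2, normalize=True)
    print(a == b, a._params is a._params)
```

If the object were mutable, the cached tuple could go stale after an attribute changed, which is why this kind of caching only became an option after the immutability change.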
The motivating super-slow operation is `df.set_index`. Profiling before/after with:

Total Runtime Before: 32.708 seconds
Total Runtime After: 5.340 seconds
pstats output (truncated) before:
pstats output (truncated) after:
The `_params` calls that make up half of the runtime in the before version don't even make the cut for the pstats output in the after version.

There is some more tweaking around the edges we can do for perf, but this is the big one. (Also another big one when columns can have PeriodDtype).
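For readers who want to see the same effect on a small scale, the following is a rough sketch using only the standard library's `cProfile` and `pstats`. It is not the profiling script used for the numbers above; `build_params`, `cached_params`, and the two `compare_*` functions are hypothetical stand-ins. It shows how an expensive helper that dominates the call counts without caching nearly vanishes from the profile once its result is cached.

```python
import cProfile
import pstats


def build_params(n):
    # Hypothetical stand-in for an expensive derived-attribute computation.
    return tuple(sorted({"n": n, "normalize": False}.items()))


_cache = {}


def cached_params(n):
    # Only call build_params the first time a given n is seen.
    if n not in _cache:
        _cache[n] = build_params(n)
    return _cache[n]


def compare_uncached(count):
    first = build_params(1)
    return sum(build_params(1) == first for _ in range(count))


def compare_cached(count):
    first = cached_params(1)
    return sum(cached_params(1) == first for _ in range(count))


def calls_to(func_name, func, *args):
    """Profile func(*args) and return how often func_name was called."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tt, ct, callers).
    return sum(
        value[1]
        for (_, _, name), value in stats.stats.items()
        if name == func_name
    )


if __name__ == "__main__":
    print("uncached:", calls_to("build_params", compare_uncached, 1000))
    print("cached:  ", calls_to("build_params", compare_cached, 1000))
```

The same pattern (profile, then look at which functions drop out of the top of the listing) is what the before/after pstats comparison in this PR is describing.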
`git diff upstream/master -u -- "*.py" | flake8 --diff`