-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Vectorized Timedelta property access (#18092) #18225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
can u post the asv results? |
See below. I was not able to consistently reproduce the one increase for
|
9003258
to
1be6911
Compare
your new asv did not appear to run |
you don't need to post the entire run, just the changed benches. |
Codecov Report
@@ Coverage Diff @@
## master #18225 +/- ##
==========================================
- Coverage 91.42% 91.38% -0.05%
==========================================
Files 163 163
Lines 50068 50065 -3
==========================================
- Hits 45776 45753 -23
- Misses 4292 4312 +20
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18225 +/- ##
=========================================
Coverage ? 91.38%
=========================================
Files ? 163
Lines ? 50065
Branches ? 0
=========================================
Hits ? 45753
Misses ? 4312
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. some comments.
asv_bench/benchmarks/timedelta.py
Outdated
|
||
def timedelta_seconds(self): | ||
self.td.seconds | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these need to have time_ leading the name
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -70,7 +70,7 @@ Performance Improvements | |||
~~~~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- Indexers on Series or DataFrame no longer create a reference cycle (:issue:`17956`) | |||
- | |||
- Vectorized Timedelta property access (:issue:`18092`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be a bit more informative here
pandas/core/indexes/timedeltas.py
Outdated
@@ -39,16 +39,10 @@ | |||
|
|||
def _field_accessor(name, alias, docstring=None): | |||
def f(self): | |||
values = self.asi8 | |||
result = libts.get_timedelta_field(values, alias) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import this from tslibs.timedeltas directly
pandas/_libs/tslib.pyx
Outdated
@@ -92,7 +92,7 @@ from tslibs.timezones cimport ( | |||
get_dst_info) | |||
from tslibs.fields import ( | |||
get_date_name_field, get_start_end_field, get_date_field, | |||
build_field_sarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this import, see below
Changes made. See below for sample benchmark run. As far as the increase, on some runs microseconds failed, while other runs nanoseconds failed. There was always one or the other. I think it has to do with some overhead in moving the computations out of timedeltas.pyx into np_datetimes.c but will report back with what I find
|
This is nice. Is the datetime.c bit the same as the numpy version? Can any of it be |
35cb876
to
fb543ec
Compare
I didn't see any method for converting a timedelta to a timedeltastruct in numpy's datetime.c FWIW I noticed some of the Timedelta properties defined in timedelta.pyx (namely days, seconds and microseconds) are superfluous as they already defined in Cython's source. Do you think it's worth removing them from pandas as part of this? |
this looks good, just rebase on master.
I think you could do this (in a followup). The reason to leave them is that we have a consistent doc-string, and then future readers are not wondering where the 'missing' methods are. It might be a slight perf boost (though not as much as @jbrockmendel got with microseconds on Timestamp). So if you want to attack that would be great. (but in a followup). |
@jreback rebased |
thanks @WillAyd very nice. keep em coming! |
np_datetime.c has a |
I see it dating back to the start of 2012 when I believe the file was created in 098ce73 |
closes #18092
git diff upstream/master -u -- "*.py" | flake8 --diff