-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ASV/CLN: cleanup period benchmarks #18275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASV/CLN: cleanup period benchmarks #18275
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18275 +/- ##
==========================================
- Coverage 91.4% 91.36% -0.05%
==========================================
Files 164 164
Lines 49878 49878
==========================================
- Hits 45590 45569 -21
- Misses 4288 4309 +21
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18275 +/- ##
==========================================
- Coverage 91.35% 91.34% -0.02%
==========================================
Files 163 163
Lines 49714 49691 -23
==========================================
- Hits 45415 45388 -27
- Misses 4299 4303 +4
Continue to review full report at Codecov.
|
No, but you can add |
asv_bench/benchmarks/period.py
Outdated
|
||
def time_quarter(self): | ||
self.per.quarter | ||
params = ['M', 'min'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this have to be double brackets ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in my admittedly-limited experience. Though the only examples that use both params
and param_names
do have 2+ param_names, so do have double double-brackets for params
.
asv_bench/benchmarks/period.py
Outdated
df['col'] = rng | ||
|
||
|
||
class PeriodIndexAlgorithms(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also param this one (I would just call it Algorithms, the 'period' is already in the file name) for ['series', 'index']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that but decided against it because the two cases treat the data differently, would need a if cls is Series: ...
in the setup method. But heck, let's go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do a data = ...; if typ == 'series': data = pd.Series(data)
asv_bench/benchmarks/period.py
Outdated
self.series.value_counts() | ||
|
||
|
||
class PeriodStandardIndexing(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just do 'Indexing' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name used to be period_standard_indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and the one in your PR is certainly an improvement (since it follows PEP8), but I think the 'period' and 'standard' are just redundant (unless there is another class with non-standard indexing)
asv_bench/benchmarks/period.py
Outdated
@@ -101,7 +28,7 @@ def time_second(self): | |||
self.per.second | |||
|
|||
def time_is_leap_year(self): | |||
self.per.is_leap_year | |||
self.per.is_leapyear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is_leap_year
was actually correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right. One of the two existing benchmarks used each and I guessed wrong.
asv_bench/benchmarks/period.py
Outdated
class Algorithms(object): | ||
goal_time = 0.2 | ||
|
||
params = [PeriodIndex, Series] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this give in the asv run output as name for the benchmark? (just 'PeriodIndex' or the full path? in the last case I would just do ['index', 'series']
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this give in the asv run output as name for the benchmark?
I'm not sure. Unfortunately I'm not aware of a quick-feedback way of running these.
Going to put a pin in this PR for a bit, much rather focus on offsets where current PRs need to get resolved before I can do actual bugfixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check 'asv dev', you can select only this file and specify a quick run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quickly checked this (with asv dev -b period
and it used the full path, so please use ['index', 'series']
as the params
lgtm. ping when you have updated the comments to @jorisvandenbossche quesiton. |
asv_bench/benchmarks/period.py
Outdated
self.rng = date_range('1985', periods=1000) | ||
self.rng2 = date_range('1985', periods=1000).to_pydatetime() | ||
|
||
def time_from_date_range(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also need to pass the param in the actual time_
functions as well. So time_from_date_range(self, freq)
in this case (and also for all other benchmarks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in addition to or instead of setting self.freq=freq
in setup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesn't matter, it is just needed to let the code actually run.
But since you have to do that, the self.freq = freq
is a bit superfluous as you can also do PeriodIndex(.. freq=freq)
. So to conclude: instead
can you update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run the benchmarks (eg asv dev -b period
does not take a long time) to make sure they at least work.
asv_bench/benchmarks/period.py
Outdated
def setup(self): | ||
self.per = Period('2017-09-06 08:28', freq='min') | ||
def time_now(self, freq): | ||
self.per.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is failing (needs a freq)
asv_bench/benchmarks/period.py
Outdated
def time_to_timestamp(): | ||
self.per.to_timestamp() | ||
if typ == 'index': | ||
self.vector = PeriodIndex(data, freq='M') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do here data * 1000
as well like you did for series (to have more data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to make that change, but the *1000 for series was there when I got here. Im implicitly assuming someone had a reason for doing it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there was a reason
Fix one benchmark that wasnt getting run because of a name typo
Remove redundant Period benchmarks.
Cleanup a few benchmarks using
params
.#noci
(is that how that works?)