Skip to content

ASV/CLN: cleanup period benchmarks #18275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 23, 2017

Conversation

jbrockmendel
Copy link
Member

Fix one benchmark that wasnt getting run because of a name typo

Remove redundant Period benchmarks.

Cleanup a few benchmarks using params.

#noci
(is that how that works?)

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18275 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18275      +/-   ##
==========================================
- Coverage    91.4%   91.36%   -0.05%     
==========================================
  Files         164      164              
  Lines       49878    49878              
==========================================
- Hits        45590    45569      -21     
- Misses       4288     4309      +21
Flag Coverage Δ
#multiple 89.16% <ø> (-0.03%) ⬇️
#single 39.41% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69472f9...b6551f7. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18275 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18275      +/-   ##
==========================================
- Coverage   91.35%   91.34%   -0.02%     
==========================================
  Files         163      163              
  Lines       49714    49691      -23     
==========================================
- Hits        45415    45388      -27     
- Misses       4299     4303       +4
Flag Coverage Δ
#multiple 89.14% <ø> (ø) ⬆️
#single 39.67% <ø> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/io/sql.py 94.79% <0%> (-0.01%) ⬇️
pandas/core/reshape/reshape.py 100% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.49% <0%> (ø) ⬆️
pandas/io/formats/format.py 96.01% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 96.41% <0%> (ø) ⬆️
pandas/core/dtypes/missing.py 90.74% <0%> (+0.11%) ⬆️
pandas/tseries/offsets.py 97% <0%> (+0.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 103ea6f...27d75c3. Read the comment docs.

@jorisvandenbossche jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Nov 14, 2017
@jorisvandenbossche
Copy link
Member

#noci (is that how that works?)

No, but you can add [ci skip] in the commit message (https://docs.travis-ci.com/user/customizing-the-build/#Skipping-a-build)


def time_quarter(self):
self.per.quarter
params = ['M', 'min']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have to be double brackets ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in my admittedly-limited experience. Though the only examples that use both params and param_names do have 2+ param_names, so do have double double-brackets for params.

df['col'] = rng


class PeriodIndexAlgorithms(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also param this one (I would just call it Algorithms, the 'period' is already in the file name) for ['series', 'index']

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that but decided against it because the two cases treat the data differently, would need a if cls is Series: ... in the setup method. But heck, let's go for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather do a data = ...; if typ == 'series': data = pd.Series(data)

self.series.value_counts()


class PeriodStandardIndexing(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just do 'Indexing' ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name used to be period_standard_indexing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the one in your PR is certainly an improvement (since it follows PEP8), but I think the 'period' and 'standard' are just redundant (unless there is another class with non-standard indexing)

@@ -101,7 +28,7 @@ def time_second(self):
self.per.second

def time_is_leap_year(self):
self.per.is_leap_year
self.per.is_leapyear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is_leap_year was actually correct ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right. One of the two existing benchmarks used each and I guessed wrong.

class Algorithms(object):
goal_time = 0.2

params = [PeriodIndex, Series]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this give in the asv run output as name for the benchmark? (just 'PeriodIndex' or the full path? in the last case I would just do ['index', 'series'])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this give in the asv run output as name for the benchmark?

I'm not sure. Unfortunately I'm not aware of a quick-feedback way of running these.

Going to put a pin in this PR for a bit, much rather focus on offsets where current PRs need to get resolved before I can do actual bugfixes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check 'asv dev', you can select only this file and specify a quick run

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly checked this (with asv dev -b period and it used the full path, so please use ['index', 'series'] as the params

@jreback jreback added this to the 0.22.0 milestone Nov 15, 2017
@jreback
Copy link
Contributor

jreback commented Nov 15, 2017

lgtm. ping when you have updated the comments to @jorisvandenbossche quesiton.

self.rng = date_range('1985', periods=1000)
self.rng2 = date_range('1985', periods=1000).to_pydatetime()

def time_from_date_range(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also need to pass the param in the actual time_ functions as well. So time_from_date_range(self, freq) in this case (and also for all other benchmarks)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in addition to or instead of setting self.freq=freq in setup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't matter, it is just needed to let the code actually run.
But since you have to do that, the self.freq = freq is a bit superfluous as you can also do PeriodIndex(.. freq=freq). So to conclude: instead

@jreback
Copy link
Contributor

jreback commented Nov 22, 2017

can you update

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run the benchmarks (eg asv dev -b period does not take a long time) to make sure they at least work.

def setup(self):
self.per = Period('2017-09-06 08:28', freq='min')
def time_now(self, freq):
self.per.now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is failing (needs a freq)

def time_to_timestamp():
self.per.to_timestamp()
if typ == 'index':
self.vector = PeriodIndex(data, freq='M')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do here data * 1000 as well like you did for series (to have more data)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to make that change, but the *1000 for series was there when I got here. Im implicitly assuming someone had a reason for doing it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there was a reason

@jorisvandenbossche jorisvandenbossche changed the title ASV cleanup ASV/CLN: cleanup period benchmarks Nov 23, 2017
@jorisvandenbossche jorisvandenbossche merged commit c634352 into pandas-dev:master Nov 23, 2017
@jbrockmendel jbrockmendel deleted the asv_cleanup branch December 8, 2017 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Benchmark Performance (ASV) benchmarks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants