Skip to content

More speedups for Period comparisons #21606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 26, 2018
Merged

Conversation

jbrockmendel
Copy link
Member

Following #21582, the biggest avoidable overheads in Period comparisons are in a) isinstance calls that can be short-circuited and b) __ne__ call overhead. This PR moves __eq__ and __ne__ to the cython file, removing the __ne__ overhead, and changes an isinstance check to a try/except.

Using the same profile code from #21582, this brings the set_index runtime from 6.603 seconds down to 2.165 seconds.

@codecov
Copy link

codecov bot commented Jun 23, 2018

Codecov Report

Merging #21606 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21606      +/-   ##
==========================================
- Coverage    91.9%   91.89%   -0.01%     
==========================================
  Files         154      154              
  Lines       49562    49543      -19     
==========================================
- Hits        45549    45530      -19     
  Misses       4013     4013
Flag Coverage Δ
#multiple 90.29% <100%> (-0.01%) ⬇️
#single 41.84% <100%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.11% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36422a8...92e6fdd. Read the comment docs.

@jreback jreback added Performance Memory or execution speed performance Period Period data type labels Jun 25, 2018
@jreback jreback added this to the 0.24.0 milestone Jun 25, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this issue to your existing perf issue (always do that on follow ons)

@@ -313,6 +313,30 @@ class _BaseOffset(object):
def __setattr__(self, name, value):
raise AttributeError("DateOffset objects are immutable.")

def __eq__(self, other):
if is_string_object(other):
from pandas.tseries.frequencies import to_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you want to define a function to return this import, we use this in a number of places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this just be trading one one-liner for another?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no then then its a function which we can import at the top (and of course if we eventually are able to actually move to_offset to cython its a big win). but its 'cleaner' I think. (certainly can followup on this in new PR). I just don't really like the imports hidden in the functions (as we must do currently), and rather centrailize in a single place.


@property
def _params(self):
from pandas.errors import AbstractMethodError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you import this at the top?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor comment. ping on green.

@@ -313,6 +313,30 @@ class _BaseOffset(object):
def __setattr__(self, name, value):
raise AttributeError("DateOffset objects are immutable.")

def __eq__(self, other):
if is_string_object(other):
from pandas.tseries.frequencies import to_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no then then its a function which we can import at the top (and of course if we eventually are able to actually move to_offset to cython its a big win). but its 'cleaner' I think. (certainly can followup on this in new PR). I just don't really like the imports hidden in the functions (as we must do currently), and rather centrailize in a single place.

@@ -182,6 +182,8 @@ def __add__(date):

Since 0 is a bit weird, we suggest avoiding its use.
"""
_params = cache_readonly(BaseOffset._params.fget)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this blank line


@property # NB: non-cython subclasses override with cache_readonly
def _params(self):
all_paras = self.__dict__.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an expl of what this is doing (and you can move the comment inside the function too)

@jreback jreback merged commit 838e8da into pandas-dev:master Jun 26, 2018
@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

thanks @jbrockmendel !

@jbrockmendel jbrockmendel deleted the pcache2 branch July 1, 2018 01:27
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants