Skip to content

PERF: typing and cdefs for tslibs.resolution #21452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 14, 2018

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Jun 12, 2018

Moving things lower-level will help improve performance (due in part to better Cython compilation).

return us % mult == 0


def _maybe_add_count(base, count):
cdef inline str _maybe_add_count(str base, count):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a int64_t type for count here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill double-check, but I'm pretty sure past-me had a reason for leaving that one out. Maybe there are cases when None is passed

"""
Not sure if I can avoid the state machine here
"""
cdef public:
index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need types here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really viable since we can't specify Index as a type. We just have to make the name public for wheels to turn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just specify them as object to make this explict

bint calendar_start = True
bint business_start = True
bint cal
int32_t[:] years
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are memoryviews, right? Any reason you are choosing these instead of the ndarray declarations throughout the rest of the code base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that memoryviews are encouraged because they are lighter weight.

Copy link
Member

@gfyoung gfyoung Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel : That is indeed correct. I agree with this design choice.

Might be something to look into for other parts of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, one of the main problems with memoryviews is with read-only arrays (and for that reason we do/should not use them solely in certain algos), but I don't think that's relevant in this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make cdefs readonly when it makes sense FYI

return us % mult == 0


def _maybe_add_count(base, count):
cdef inline str _maybe_add_count(str base, count):
if count != 1:
return '{count}{base}'.format(count=int(count), base=base)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If above comment is correct shouldn't need the int cast here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're right. Just pushed a commit that added the suggested typing and removed the int(. (and hopefully fixed the test errors)

@codecov
Copy link

codecov bot commented Jun 13, 2018

Codecov Report

Merging #21452 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21452      +/-   ##
==========================================
+ Coverage   91.89%    91.9%   +<.01%     
==========================================
  Files         153      153              
  Lines       49600    49606       +6     
==========================================
+ Hits        45580    45589       +9     
+ Misses       4020     4017       -3
Flag Coverage Δ
#multiple 90.3% <ø> (ø) ⬆️
#single 41.89% <ø> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.8% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.62% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.06% <0%> (+0.03%) ⬆️
pandas/tseries/offsets.py 97.24% <0%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab668b0...bf67f17. Read the comment docs.

@gfyoung gfyoung added Datetime Datetime data dtype Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance labels Jun 13, 2018
@gfyoung gfyoung changed the title typing and cdefs for tslibs.resolution PERF: typing and cdefs for tslibs.resolution Jun 13, 2018
@gfyoung
Copy link
Member

gfyoung commented Jun 13, 2018

@jbrockmendel : I feel like the performance help is relatively obvious given that you are moving things underneath the Python level. Is there any asv that could illustrate this by any chance (not a blocker, but just curious)? If so, that might be worthwhile to include in a whatsnew.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments. just run asv's to confirm no regressions. of course if improvements then shout it out!
ping on green after updating for comments.

"""
Not sure if I can avoid the state machine here
"""
cdef public:
index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just specify them as object to make this explict

bint calendar_start = True
bint business_start = True
bint cal
int32_t[:] years
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make cdefs readonly when it makes sense FYI

@jreback jreback added this to the 0.24.0 milestone Jun 13, 2018
@jbrockmendel
Copy link
Member Author

%timeit results are insignificantly sped up:

now = pd.Timestamp.now()
dti = pd.DatetimeIndex([now + pd.Timedelta(seconds=19*n) for n in range(10**5)])
dti.inferred_freq  # creates _cache attr

def infer():
    dti._cache.clear()
    return dti.inferred_freq

Status quo results for %timeit infer(), in microseconds:
958, 957, 961, 964, 956, 959, 964

PR:
962, 958, 954, 953, 954, 956, 957

Note that in 0.23.0 it looks like these come back closer to 832. No idea what changed in the interim.

@jreback jreback merged commit 6028a5b into pandas-dev:master Jun 14, 2018
@jreback
Copy link
Contributor

jreback commented Jun 14, 2018

thanks!

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
@jbrockmendel jbrockmendel deleted the cyres branch June 22, 2018 03:27
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants