-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Implement date_range for business day in terms of daily #16463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
pd.date_range is just a convenience wrapper around DatetimeIndex, and so the same problem affects resamples to B as well because of the call to DatetimeIndex it makes along the way when building a binner. There will be lots of secondary benefits as well. |
xref: #11214 - slightly more generic issue. In general any kind of bin generation larger than |
@chris-b1: wow, when you're right, you're right. This is all kinds of crazysauce.
That's a wild ratio. I think it should be possible to provide a vectorized (not even cythonized) version for each of the some-filter-on-daily offsets, both anchored and not, without too much trouble. But even if not, we should at least fix the big ones. I grepped through just one of my codebases and found hundreds of D->B resamples on either Series or few-column frames where this is apparently the bottleneck. :-/ |
I can try giving this a go! |
@chris-b1 this is actually not that surprising, all this is doing is this an |
From what I can tell, cythonizing |
I think that's the base location which is slow, but I don't know how much of the slowness is due to the Python-level looping in generate_range or the Python-level operations in the offset's Whether that's a better direction than just taking advantage of preexisting fast code, I don't know. |
Yeah @dsm054 is right, there is a whole chain of pure-python stuff behind the offsets, so not that much benefit to cythonizing the existing code without a major rewrite. Much easier would be to define some fastpaths for "easy" offsets, using existing vectorized datetime methods like suggested at the top.
Hardest part is probably getting the number of periods/start/end logic correct. |
@rohanp: are you still interested in looking at this even though it may not involve cython? ;-) If not, I have a prototype which looks promising although my testing of it has been pretty shallow so far (and as usual, it'll probably take longer to test than to implement..) |
I am still interested, but feel free to share your prototype.
…On Thu, May 25, 2017 at 7:34 AM DSM ***@***.***> wrote:
@rohanp <https://github.com/rohanp>: are you still interested in looking
at this even though it may not involve cython? ;-) If not, I have a
prototype which looks promising although my testing of it has been pretty
shallow so far (and as usual, it'll probably take longer to test than to
implement..)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#16463 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGxpcLJn7liouFzY61c_vE7SMlQYhHDAks5r9ZFwgaJpZM4NkNK2>
.
|
generate_range and the relevant DateOffset.apply code have all been moved into cython, so some of this has improved. Generating with date_range(..., freq="D") and then masking as in the OP is fine for "B" where you're not discarding much, it would get expensive as you discard more. Instead I think we can now use period_range
|
Code Sample, a copy-pastable example if possible
@dsm054 noticed that business day performance is slow relative to daily
These two are equivalent
If we do these, we'll have to special case the handling when
periods
is passed to expand it by enough weekends. But that's a pretty nice performance boost for a pretty common operation.The text was updated successfully, but these errors were encountered: