Custom-business-days offsets very slow #6584
Comments
this was contributed several pandas releases ago and has not had many comments... you always want to work in … Pls profile and see if you can figure out where, and submit a PR! |
#5148 might have an impact on this as well |
Ok thanks, I'll have a look at it. |
And I think we shouldn't have a UsBday offset, but rather start implementing calendars that can be added to the BusinessDay offset. This way we don't have a ton of date classes lying around. Further, it would be nice to have some rule factory that would take generic rules and be able to develop the offset from that. For example, the below are rules that I have for US holidays (Code is relative to a date composed of Month, Day). I implemented a different offset scheme where 0, -1, +1 actually have some meaning and 3 day means to make it a 3 day weekend where Saturday means a Friday holiday and Sunday means a Monday holiday. So -1d+3Mon will be interpreted as the 3rd Monday from the reference date. From this it's very easy to create a holiday curve by just going through each rule and each year. This way there aren't many holiday functions, but rules in a simple table. What do you think?
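To make the rule-table idea concrete, here is a minimal sketch using only the standard library. It is not the offset scheme described above: the helper names (`nth_weekday`, `observe_three_day`, `RULES`, `holidays`) are hypothetical, and the rule strings such as `-1d+3Mon` are replaced by plain Python functions for illustration.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday (Mon=0 ... Sun=6) of a month."""
    first = date(year, month, 1)
    shift = (weekday - first.weekday()) % 7
    return first + timedelta(days=shift + 7 * (n - 1))


def observe_three_day(d):
    """3-day-weekend rule: Saturday -> Friday, Sunday -> Monday."""
    if d.weekday() == 5:
        return d - timedelta(days=1)
    if d.weekday() == 6:
        return d + timedelta(days=1)
    return d


# Hypothetical rule table: holiday name -> function of the year.
RULES = {
    "New Year's Day": lambda y: observe_three_day(date(y, 1, 1)),
    "MLK Day": lambda y: nth_weekday(y, 1, 0, 3),  # 3rd Monday of January
    "Independence Day": lambda y: observe_three_day(date(y, 7, 4)),
}


def holidays(years):
    """Build a holiday curve by applying every rule to every year."""
    return sorted(rule(y) for y in years for rule in RULES.values())
```

Calling `holidays(range(2010, 2020))` would then give the sorted list of dates that a calendar object could hand to a business-day offset.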
|
It's actually a bit more complicated; there are some holidays that are year-dependent, e.g. presidents' deaths. This could certainly be implemented via a |
That could be easily supported by having an optional start/end date parameter (for example, if a holiday changes date or for the example you describe) or if it's only in a given year, have an optional year field. |
sure...you can do lots with holiday! need someone to write the Holidays class (and supporting machinery), to integrate with BusinessDay/CustomBusinessDay. A BusinessDay is really just a set of custom holidays (though impl makes it easier to use a weekday filter), but same idea. |
I have most of this done. I can commit what I have and we can go from there. @bjonen is that okay? |
@rockg awesome! can you show an example? |
Here are some:
ApplyOffsetRule is what parses the date rules to create both the holidays and can apply holidays to other rules like +1CustomBusinessDay ('+1b' above). HolidayCalendar('US') is a stored object that just contains the rules as an attribute. We can very easily pass this into CustomBusinessDay(calendar=HolidayCalendar('US')) or CustomBusinessDay(calendar=USHolidayCalendar). |
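For comparison, later pandas releases ship a `pandas.tseries.holiday` module whose calendars can be passed to `CustomBusinessDay` in much the way proposed here. The sketch below uses that released API (`USFederalHolidayCalendar`), not the `HolidayCalendar('US')` or `ApplyOffsetRule` objects from the comment above, and assumes a pandas version that includes it.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# A business-day offset that skips weekends and US federal holidays.
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# 2014-07-04 (a Friday) is a holiday, so the next business day
# after Thursday 2014-07-03 is Monday 2014-07-07.
start = pd.Timestamp("2014-07-03")
next_day = start + us_bday
print(next_day)
```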
I investigated the issue a bit further. We have to deal with two cases. Either increment
|
I never got why the back-and-forth between np.datetime64 and datetimes here... I didn't really look into it in detail. But if datetimes are faster then use them. Just make sure it passes the current tests! Pls submit a PR when ready |
The reason is |
ahh ok.....so maybe convert to/from that (that's what it's doing, I guess, then)..... |
not 100% sure what conversions are happening low-level, but you can |
@rockg Regarding the holiday calendar: your approach works for me. I looked around a bit for existing calendars that we could use (e.g. http://www.mozilla.org/en-US/projects/calendar/holidays/). We could extract the holidays from the .ical files without having to worry about the exact rules. However, most of the calendars do not go back very far, and for most of my use cases that is important. |
Not in an ideal format, but this data goes back a long time: http://www.nyse.com/pdfs/closings.pdf |
Custom business day offsets are currently significantly slower (by a factor of around 4) than pd.offsets.BusinessDay(), even when no custom holidays are specified:
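A comparison along these lines can be sketched with `timeit`. This is an illustrative setup rather than the original benchmark: the start date and iteration count are arbitrary, and the ratio measured on a current pandas may well differ from the factor reported here.

```python
import timeit

import pandas as pd

start = pd.Timestamp("2014-01-02")  # a Thursday
bday = pd.offsets.BusinessDay()
cbday = pd.offsets.CustomBusinessDay()  # no holidays, default weekmask

# With no custom holidays, both offsets should land on the same date.
assert start + bday == start + cbday

# Time the same addition with each offset.
t_bday = timeit.timeit(lambda: start + bday, number=1000)
t_cbday = timeit.timeit(lambda: start + cbday, number=1000)
print(f"BusinessDay: {t_bday:.4f}s  CustomBusinessDay: {t_cbday:.4f}s")
```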
Profiling pd.offsets.CustomBusinessDay.apply shows that only around 13% of the time is spent in np.busday_offset. The majority of time is spent casting the dates from datetime to datetime64 etc.
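The round trip in question can be illustrated with a small sketch. It is not the pandas implementation; the function name `add_busdays_via_numpy` is hypothetical, and it assumes the start date is itself a business day, because `np.busday_offset` raises on other dates with its default `roll`.

```python
from datetime import datetime

import numpy as np


def add_busdays_via_numpy(dt, n, holidays=()):
    """Shift a datetime by n business days via np.busday_offset."""
    # datetime -> datetime64[D]: the time of day is dropped here.
    day64 = np.datetime64(dt.date())
    hol64 = np.array(list(holidays), dtype="datetime64[D]")
    shifted = np.busday_offset(day64, n, holidays=hol64)
    # datetime64[D] -> date -> datetime, restoring the time of day.
    return datetime.combine(shifted.item(), dt.time())


result = add_busdays_via_numpy(
    datetime(2014, 7, 3, 9, 30), 1, holidays=["2014-07-04"]
)
print(result)
```

Only the `np.busday_offset` call does the calendar arithmetic; the lines around it are pure conversion overhead.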
I'm not so familiar with the code but one idea would be to work in datetime by default and try to stick with it as much as possible. The method could then look something like this:
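As a rough illustration of staying in `datetime` throughout (a minimal sketch, not the actual patch to `CustomBusinessDay.apply`), a pure-Python version could step day by day and skip weekends and holidays. The function name `add_busdays` and its parameters are assumptions made for this example.

```python
from datetime import date, datetime, timedelta


def add_busdays(dt, n, holidays=frozenset(), weekend=(5, 6)):
    """Shift dt by n business days without leaving datetime.

    holidays is a set of datetime.date objects; n may be negative.
    With n == 0 the input is returned unchanged.
    """
    step = timedelta(days=1 if n >= 0 else -1)
    remaining = abs(n)
    while remaining:
        dt += step
        # Count the day only if it is neither a weekend nor a holiday.
        if dt.weekday() not in weekend and dt.date() not in holidays:
            remaining -= 1
    return dt


us_holidays = {date(2014, 7, 4)}
print(add_busdays(datetime(2014, 7, 3, 9, 30), 1, us_holidays))
```

The loop is linear in `n`, so this trades NumPy's vectorised calendar logic for zero conversion cost; that is only likely to pay off for the small increments typical of offset arithmetic.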
While this might not be a perfect comparison because I left out some conversion code, the changes yield a sizable speedup.
Ultimately I would like to have a UsBday offset. The code looks like this:

My first intuition when I noticed that custom business days are slower was that this is due to the large list of holidays passed to numpy. The timings at the end of the code block, however, show that adding a custom business day with realistic holidays does not alter the performance by much. The main speed difference results from interfacing with numpy and is therefore a Pandas issue.
I know that CustomBusinessDays is in experimental mode. I hope this feedback can help improve it because I think it is an important feature in Pandas. Also perhaps it would be nice to ship certain custom calendars, for example for the US, directly with Pandas.