
Handling a CustomBusinessDay in time-based .rolling() #13969


Closed
chrisaycock opened this issue Aug 11, 2016 · 6 comments
Labels
Enhancement Frequency DateOffsets Window rolling, ewma, expanding

Comments

@chrisaycock
Contributor

Starting with 10 business days following Christmas Eve:

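import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business days that skip weekends and US federal holidays
days = CustomBusinessDay(calendar=USFederalHolidayCalendar())

n = 10  # number of business days to generate
df = pd.DataFrame({'value': np.arange(n)},
                  index=pd.date_range('2015-12-24', periods=n, freq=days))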

I can compute the three-day sum of the values with just:

In [21]: df.rolling('3d').sum()
Out[21]:
            value
2015-12-24    0.0
2015-12-28    1.0
2015-12-29    3.0
2015-12-30    6.0
2015-12-31    9.0
2016-01-04    5.0
2016-01-05   11.0
2016-01-06   18.0
2016-01-07   21.0
2016-01-08   24.0

But this is purely in terms of Gregorian calendar days, not the business calendar days that I had created the DataFrame with.

I can easily compute

df.index - 3*days

though I get a performance warning:

Non-vectorized DateOffset being applied to Series or DatetimeIndex

But I can't just pass this offset directly:

In [23]: df.rolling(3*days).sum()
...
ValueError: <3 * CustomBusinessDays> is a non-fixed frequency

I would like to be able to handle a CustomBusinessDay in .rolling(). (Because the DataFrame may come from any source, it would be easier to just pass the DateOffset object to .rolling() instead of using anything specific to the DataFrame's index.)

I know that the freq parameter was deprecated in 0.18, though 0.19 kinda brings this back in window. Is there an intrinsic reason window can't handle a CustomBusinessDay?

@chris-b1
Contributor

You could probably build a second version of this class that handles an arbitrary offset object rather than a fixed window, although it wouldn't be performant, since there would have to be a ton of calls back into Python space. Unless there's some trick I'm not thinking of.
https://github.com/pydata/pandas/blob/master/pandas/window.pyx#L252

There was an issue about cythonized offsets, #11214, and I started a proof of concept, but never went any further with it.
https://github.com/chris-b1/pandas/tree/cythonize-offset

@agraboso
Contributor

agraboso commented Aug 11, 2016

Is there an intrinsic reason window can't handle a CustomBusinessDay?

As of now, because offset-based rolling windows work by converting the offset to nanoseconds (as we are discussing in #13968), and non-fixed frequencies do not support that conversion; that conversion is where the ValueError comes from.

This reaffirms my impression that the whole architecture around DateOffset is somewhat broken: we call fundamentally different things by the same name, with regularity and anchoring being the critical concepts. I think this is something to think about long and hard for pandas 2.0.
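To make the fixed vs. non-fixed distinction concrete, here is a minimal sketch that only exercises the offset objects themselves, not the rolling code path that consumes them. It assumes a reasonably recent pandas: a `Tick` offset such as `Hour` reports a fixed length through its `nanos` attribute, while a multiplied `CustomBusinessDay` raises `ValueError` when asked the same question.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# A Tick offset has a fixed length, so it converts cleanly to nanoseconds.
fixed = pd.offsets.Hour(72)
fixed_nanos = fixed.nanos  # 72 hours expressed in nanoseconds

# A CustomBusinessDay spans a variable number of calendar days
# (weekends and holidays get skipped), so it has no fixed length.
days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
try:
    (3 * days).nanos
    variable_error = None
except ValueError as err:
    variable_error = str(err)

print(fixed_nanos)
print(variable_error)
```

Any code path that needs a single nanosecond width for the window hits this same wall for non-fixed offsets.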

@chris-b1
Contributor

Actually, this isn't as bad as I was thinking. It'd be almost like resampling; the one expensive op you'd have to do is generate the range of anchored dates.
rng = pd.date_range(df.index.min(), df.index.max(), freq=days)

Then you'd just need a cython function that generates window edges from that range and the frame's index.
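The edge-generation step can be prototyped in NumPy before writing any Cython. This is a sketch under stated assumptions: `anchored_window_bounds` is a hypothetical helper (not a pandas API), the index is assumed to be monotonically increasing, and windows are left-open/right-closed like the `'3d'` example. Each row's window starts just after the anchored date `periods` steps back, and once the bounds exist a rolling sum is just a difference of prefix sums.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay


def anchored_window_bounds(index, offset, periods):
    """Hypothetical helper: row bounds [start, end) for windows spanning the
    last `periods` anchored dates of `offset`, left-open and right-closed.
    Assumes `index` is a monotonically increasing DatetimeIndex."""
    # The one expensive step: build the anchored range, padded on the left
    # so every timestamp has at least `periods` anchors before it.
    rng = pd.date_range(index.min() - (periods + 1) * offset,
                        index.max(), freq=offset)
    # Position of the last anchor at or before each timestamp.
    pos = rng.searchsorted(index, side="right") - 1
    # The window's (exclusive) left edge is `periods` anchors back.
    left_edges = rng[pos - periods]
    # First row strictly after the left edge; each window ends at its own row.
    start = index.searchsorted(left_edges, side="right")
    end = np.arange(1, len(index) + 1)
    return start, end


days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
df = pd.DataFrame({"value": np.arange(10)},
                  index=pd.date_range("2015-12-24", periods=10, freq=days))
start, end = anchored_window_bounds(df.index, days, periods=3)

# With the bounds in hand, a rolling sum needs no per-row Python loop.
csum = np.concatenate([[0.0], np.cumsum(df["value"].to_numpy(dtype=float))])
rolling_sum = csum[end] - csum[start]
print(rolling_sum)
```

Rows that fall on a non-business date are attached to the last anchor at or before them; that is one reasonable convention, not the only one.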

@sinhrks
Member

sinhrks commented Aug 11, 2016

xref #12351, #12377.

@sinhrks sinhrks added Frequency DateOffsets Resample resample method labels Aug 11, 2016
@jreback
Contributor

jreback commented Aug 11, 2016

This is actually pretty easy. The point, as @chris-b1 indicated, is to build the indexers for the window, which by definition are generally variable (they don't have to be, though, and the non-variable case can be handled more efficiently by a fixed offset window).

Once you have this window indexer (which is pretty cheap to create), rolling works the same way. The current logic assumes a fixed differential indexer (i.e. one with a constant period regardless of the point in time).
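A minimal sketch of "build the indexer, then rolling is the same", using the `BaseIndexer` extension point from `pandas.api.indexers` (which postdates this comment). Assumptions: pandas 1.5 or later, where `get_window_bounds` takes a `step` argument and pandas checks that subclasses match that signature; `OffsetWindowIndexer` is a hypothetical name. Extra keyword arguments passed to `BaseIndexer` become attributes, so `index` and `offset` are available as `self.index` and `self.offset`. The left edges come from `index - offset`, the same non-vectorized shift from the original report, so its `PerformanceWarning` is silenced locally.

```python
import warnings

import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer
from pandas.errors import PerformanceWarning
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay


class OffsetWindowIndexer(BaseIndexer):
    """Hypothetical indexer: each window covers (t - offset, t] for every
    timestamp t in a monotonically increasing DatetimeIndex."""

    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        index = self.index[:num_values]
        with warnings.catch_warnings():
            # Shifting by a CustomBusinessDay is not vectorized and warns.
            warnings.simplefilter("ignore", PerformanceWarning)
            left_edges = index - self.offset
        # First row strictly after each left edge (left-open window).
        start = index.searchsorted(left_edges, side="right").astype(np.int64)
        # Right-closed: each window ends just past its own row.
        end = np.arange(1, num_values + 1, dtype=np.int64)
        return start, end


days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
df = pd.DataFrame({"value": np.arange(10)},
                  index=pd.date_range("2015-12-24", periods=10, freq=days))

indexer = OffsetWindowIndexer(index=df.index, offset=3 * days)
result = df.rolling(indexer).sum()
print(result)
```

This sketch ignores `center`, `closed`, and `step`; a real implementation would need to honor them.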

@jreback jreback added Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode Difficulty Advanced labels Aug 11, 2016
@jreback jreback added this to the Next Major Release milestone Aug 11, 2016
@mroeschke mroeschke added Window rolling, ewma, expanding and removed Resample resample method Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Oct 27, 2019
@mroeschke
Member

I think this issue can be closed as the VariableOffsetWindowIndexer was added to handle non-fixed offsets. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.indexers.VariableOffsetWindowIndexer.html
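Applied to the example from the top of this issue, a usage sketch might look like the following (assuming a pandas version that ships `pandas.api.indexers.VariableOffsetWindowIndexer`). Each row's window covers the preceding three business days rather than three calendar days, passing the `DateOffset` object directly as the original request asked.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import VariableOffsetWindowIndexer
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
df = pd.DataFrame({"value": np.arange(10)},
                  index=pd.date_range("2015-12-24", periods=10, freq=days))

# Window for each row t covers (t - 3 business days, t].
indexer = VariableOffsetWindowIndexer(index=df.index, offset=3 * days)
result = df.rolling(indexer).sum()
print(result)
```

Unlike the `'3d'` output, the row for 2016-01-04 now sums 2015-12-30, 2015-12-31 and 2016-01-04 instead of only 2016-01-04, because the holiday and the weekend no longer count toward the window.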
