Skip to content

Exact matches in time-based .rolling() #13965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chrisaycock opened this issue Aug 11, 2016 · 9 comments
Closed

Exact matches in time-based .rolling() #13965

chrisaycock opened this issue Aug 11, 2016 · 9 comments
Labels
Enhancement Resample resample method Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@chrisaycock
Copy link
Contributor

I would like to allow_exact_matches in time-based .rolling(). Using an example similar to the one found in the documentation:

df = pd.DataFrame({'B': [1, 1, 2, np.nan, 4]},
                  index = [pd.Timestamp('20130101 09:00:00'),
                           pd.Timestamp('20130101 09:00:02'),
                           pd.Timestamp('20130101 09:00:03'),
                           pd.Timestamp('20130101 09:00:05'),
                           pd.Timestamp('20130101 09:00:06')])

I can compute the three-second rolling sum:

In [200]: df
Out[200]:
                       B
2013-01-01 09:00:00  1.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

In [201]: df.rolling('3s').sum()
Out[201]:
                       B
2013-01-01 09:00:00  1.0
2013-01-01 09:00:02  2.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  2.0
2013-01-01 09:00:06  4.0

But note that 09:00:03 does not include 09:00:00, 09:00:05 does not include 09:00:02, and 09:00:06 does not include 09:00:03. If I were to include these timestamps, then I would have:

                       B
2013-01-01 09:00:00  1.0
2013-01-01 09:00:02  2.0
2013-01-01 09:00:03  4.0
2013-01-01 09:00:05  3.0
2013-01-01 09:00:06  6.0

A quick-and-dirty way of getting these values is with:

from itertools import starmap

window = pd.Timedelta('3s')

begins = np.searchsorted(df.index, df.index - window)
ends = np.arange(len(df)) + 1
ranges = starmap(range, zip(begins, ends))

pd.concat([df.ix[r].sum() for r in ranges])

pd.merge_asof() has the parameter allow_exact_matches to permit this. Would it be possible to add this to .rolling()?

I can try this one myself.

@chris-b1
Copy link
Contributor

chris-b1 commented Aug 11, 2016

For similarity with resampling, would this make sense as a parameter closed={'right', 'left', 'both'}, where IIUC, 'right' is the default and 'both' is the option you're adding?

@chris-b1
Copy link
Contributor

Although closed='left' maybe doesn't make any sense.

@chrisaycock
Copy link
Contributor Author

@chris-b1 The choice is rather binary: either we want all timestamps strictly greater than the lowest bound, or we want all timestamps greater than or equal to the lowest bound. .rolling() always defines the last row as the current one under consideration.

@chris-b1
Copy link
Contributor

Yep, fair point. I might be off base, but I find allow_exact_matches hard to parse in this context, since here it's about including the earliest date in a window, where in merge_asof it's about including the latest date (right?).

That said, regardless of how it's spelled, I do think it'd be a nice addition!

@chrisaycock
Copy link
Contributor Author

@chris-b1 Fair point about first in .rolling() vs last in pd.merge_asof(). I do like reusing the parameter name since it's already around, but I'm open to anything that makes sense.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Aug 11, 2016

Agreed with @chris-b1 that it would be a nice addition, but for me allow_exact_matches is not a very clear name as well (in the sense that I wouldn't directly think of that use case when I see the name).

I think it is actually about an open vs closed interval? So maybe using closed=True/False would also be an option, given that 'right'/'left' is not the correct meaning here (as it is always right closed, and it's only the left that can be open or closed). Although that name is also not directly meaningful from its name alone.

@agraboso
Copy link
Contributor

How about left_closed? #13968

@sinhrks sinhrks added the Resample resample method label Aug 11, 2016
@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Difficulty Intermediate labels Aug 11, 2016
@jreback jreback added this to the Next Major Release milestone Aug 11, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Feb 27, 2017
@jreback
Copy link
Contributor

jreback commented Mar 15, 2017

@chrisaycock re-reading this, is there a suggestion to change allow_exact_matches to a closed='left'|'both' parameter in merge_asof as well? (we can create a separate issue for that).

@chrisaycock
Copy link
Contributor Author

chrisaycock commented Mar 15, 2017

@jreback Since there are only two possible values for that parameter, I'd rather that it be a boolean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Resample resample method Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
6 participants