-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Rolling window endpoints inclusion #15795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rolling window endpoints inclusion #15795
Conversation
pandas/core/window.pyx
Outdated
|
||
|
||
cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp, | ||
object index, bint is_max): | ||
object index, str closed, bint is_max): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the behavior of str here (cdef)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
Here is a quick example of what this feature looks like
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly minor points
- add a whatsnew entry (Other Enhancements) in whatsnew (0.20.0), with a link to the computation.rst docs (where you have a new section showing the use of closed; you can use your test examples, e.g. showing all 4 cases).
pandas/core/window.py
Outdated
@@ -101,6 +103,9 @@ def validate(self): | |||
if self.min_periods is not None and not \ | |||
is_integer(self.min_periods): | |||
raise ValueError("min_periods must be an integer") | |||
if self.closed not in ['right', 'both', 'left', 'neither']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing this?
pandas/core/window.pyx
Outdated
@@ -161,7 +161,8 @@ cdef class MockFixedWindowIndexer(WindowIndexer): | |||
|
|||
""" | |||
def __init__(self, ndarray input, int64_t win, int64_t minp, | |||
object index=None, object floor=None): | |||
object index=None, object floor=None, | |||
bint l_closed=False, bint r_closed=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a chance, can you add a mini-doc strings to things.
@@ -307,18 +322,35 @@ def get_window_indexer(input, win, minp, index, floor=None, | |||
compat Indexer that allows us to use a standard | |||
code path with all of the indexers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you augment the doc-string here (with closed)
pandas/core/window.pyx
Outdated
bint l_closed = False | ||
bint r_closed = False | ||
|
||
if closed not in ['right', 'left', 'both', 'neither']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do an assert here (this is an internal routine that has certain guarantees)
pandas/core/window.pyx
Outdated
@@ -327,7 +359,7 @@ def get_window_indexer(input, win, minp, index, floor=None, | |||
|
|||
|
|||
def roll_count(ndarray[double_t] input, int64_t win, int64_t minp, | |||
object index): | |||
object index, closed='right'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this required
object closed
pandas/core/window.pyx
Outdated
@@ -1111,7 +1148,7 @@ cdef inline numeric calc_mm(int64_t minp, Py_ssize_t nobs, | |||
|
|||
|
|||
def roll_max(ndarray[numeric] input, int64_t win, int64_t minp, | |||
object index): | |||
object index, str closed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the rest of these: str-> object
pandas/core/window.pyx
Outdated
|
||
|
||
cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp, | ||
object index, bint is_max): | ||
object index, str closed, bint is_max): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/tests/test_window.py
Outdated
@@ -3356,6 +3356,34 @@ def test_min_periods(self): | |||
result = df.rolling('2s', min_periods=1).sum() | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_closed(self): | |||
df = DataFrame({'A': [1]*5}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this issue number as a comment
pandas/tests/test_window.py
Outdated
df = DataFrame({'A': [1]*5}, | ||
index = [pd.Timestamp('20130101 09:00:01'), | ||
pd.Timestamp('20130101 09:00:02'), | ||
pd.Timestamp('20130101 09:00:03'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test passing invalid closed
expected["A"] = [np.nan, 1.0, 2, 2, 1] | ||
result = df.rolling('2s', closed='left').sum() | ||
tm.assert_frame_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisaycock how do these tests look? what corners do we need to test here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the DataFrame is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the DataFrame is empty, the behavior is exactly the same as in the current implementation. For time-based windows, it throws "ValueError: window must be an integer". Note that this feature only affects the window indexer's start and end vectors. In fixed windows, empty DataFrames always return empty (the result is constructed as np.empty(N) where N is the length of the input)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separately I think [4] should just return an empty frame for consistency.
In [4]: DataFrame().rolling('1s').sum()
ValueError: window must be an integer
In [5]: DataFrame().rolling(1).sum()
Out[5]:
Empty DataFrame
Columns: []
Index: []
@carlosdanielcsantos #15795 (comment) are basically the docs (with a bit of prose) and nice formatting. Laying out the possibilities (in a sub-section of computation.rst). Note you also might want to have a 'real-world' uses case sentence for some of these. |
you can see pep errors: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#python-pep8 use |
Codecov Report
@@ Coverage Diff @@
## master #15795 +/- ##
==========================================
- Coverage 91.01% 90.99% -0.03%
==========================================
Files 143 143
Lines 49400 49404 +4
==========================================
- Hits 44963 44956 -7
- Misses 4437 4448 +11
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #15795 +/- ##
==========================================
+ Coverage 91% 91.03% +0.02%
==========================================
Files 145 145
Lines 49581 49592 +11
==========================================
+ Hits 45122 45146 +24
+ Misses 4459 4446 -13
Continue to review full report at Codecov.
|
@jreback How should I add this to computations.rst? A new section inside Window Functions or just add the examples inside the existing Window Functions -> Rolling Windows? |
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -268,6 +268,7 @@ To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you | |||
Other enhancements | |||
^^^^^^^^^^^^^^^^^^ | |||
|
|||
- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window endpoint closedness. (:issue:`13965`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that a word?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some discussion going on about that online, but its a useful word to refer the property of being closed. But I am open to suggestions, no pun intended :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahah yep that is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe closed-ness
? (though I don't think that is actually a word)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add a :ref:
to the new sub-section that you are creating in computation.rst
pandas/core/window.py
Outdated
@@ -374,8 +379,12 @@ class Window(_Window): | |||
on : string, optional | |||
For a DataFrame, column on which to calculate | |||
the rolling window, rather than the index | |||
closed : 'right', 'left', 'both', 'neither' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the default setting (right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this with a bulleted list
pandas/core/window.pyx
Outdated
@@ -158,10 +158,15 @@ cdef class MockFixedWindowIndexer(WindowIndexer): | |||
index of the input | |||
floor: optional | |||
unit for flooring | |||
l_closed: bint | |||
left endpoint closedness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write these out l_closed
-> left_closed
easier to read
pandas/core/window.pyx
Outdated
@@ -229,10 +239,14 @@ cdef class VariableWindowIndexer(WindowIndexer): | |||
min number of obs in a window to consider non-NaN | |||
index: ndarray | |||
index of the input | |||
l_closed: bint | |||
left endpoint closedness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can add a bit of explication here (1 line), just for future readers
@@ -261,7 +276,11 @@ cdef class VariableWindowIndexer(WindowIndexer): | |||
N = self.N | |||
|
|||
start[0] = 0 | |||
end[0] = 1 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the comments above the code
if r_closed:
# .....
end[0] = 1
else:
# ....
end[0] 0
pandas/core/window.pyx
Outdated
@@ -271,6 +290,9 @@ cdef class VariableWindowIndexer(WindowIndexer): | |||
end_bound = index[i] | |||
start_bound = index[i] - win | |||
|
|||
if l_closed: # left endpoint is closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of can do (I use this if only an if statement)
# left endpoint ....
if left_closed:
....
@@ -286,9 +308,13 @@ cdef class VariableWindowIndexer(WindowIndexer): | |||
else: | |||
end[i] = end[i - 1] | |||
|
|||
# right endpoint is open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep excatly like this one
pandas/core/window.pyx
Outdated
@@ -299,6 +325,8 @@ def get_window_indexer(input, win, minp, index, floor=None, | |||
minp: integer, minimum periods | |||
index: 1d ndarray, optional | |||
index to the input array | |||
closed: 'right', 'left', 'both', 'neither' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closed: string, default 'right'
{'right',......}
@carlosdanielcsantos lgtm. some more doc-string nitpicks. |
ping when all green. |
@jreback I was going to start writing tests for the FixedWindowIndexer but there are no explicit rolling computing test going on in TestRolling (they are done in TestMoments, I think). Where should I add these new tests? |
exactly, a new sub-section that pretty much shows the example with all 4 closed. maybe right before Centering Windows? |
We don't have explicit tests for the cython routines, these are covered by the integration tests (top-level routines). Though certainly could add them if you want / can. You could add a new class in test_windows that explicity exercises these. |
@jreback @chrisaycock |
@carlosdanielcsantos can you show a comparison of these 2, to show more what you mean? |
Look how these results come from the application of sum() to every element inside the valid window. This resembles Nonetheless, I think that the possibility of ignoring the current index in computations is valuable (that was my initial problem, after all). Some solutions for this are:
|
This a bit orthogonal though. We always want to actually use the index. Since using a fixed window size, it looks like we are ignoring the index :>. So a possiblity is to actually check the index whether its equiv to a RangeIndex. e.g.
Let's deal with this in a separate issue. A way to disambiguate on this ATM is to allow I think we should fix this for Fixed intervals, but I agree this is a problem because we essentially have |
@carlosdanielcsantos if you can update would be great (docs and such), ignoring the issue about fixed windows for the moment. |
@carlosdanielcsantos can you update |
@jreback I'll do that in the next hours. Should I just leave the fixed-window as it is? |
I think make the We have to fix this, but this PR merely opens up the possibilities of change here. So I think ok to merge this. |
@carlosdanielcsantos can you update |
@jreback made some corrections but circleci is not passing in test_append_index - pandas.tests.test_multilevel.TestMultiLevel, but passing locally. Any thoughts? |
try pushing again (make a modification) |
@jreback Still not passing. Could I have broken this with the changes in window? |
‘Closed’ feature is working as expected in time-based windows. Sum tests are passing.
Adding test of assert for closed parameter Adding assert for closed parameter in get_window_indexer
Also, adding whatsnew entry
Adding computation.rst section (still not written)
Not allowing closed parameter for fixed windows Passing tests again Updating docs
Style corrections
6ca4949
to
568c12f
Compare
@carlosdanielcsantos I rebased you against current master. This should pass cleanly. If you'd make those other changes and ping. |
@jreback what changes? I believe I updated everything that you requested. Can you take a look at AppVeyor build that failed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments.
doc/source/computation.rst
Outdated
.. _stats.rolling_window.endpoints: | ||
|
||
Rolling window endpoint inclusion | ||
~~~~~~~~~~~~~~~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this underline needs to be the same length as the text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this Rolling Window Endpoints
doc/source/computation.rst
Outdated
The inclusion of the interval endpoints in rolling window calculations can be specified with the ``closed`` | ||
parameter: | ||
|
||
- ``right`` : close right endpoint (default for time-based windows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a csv-table (3 columns: closed
, description
, default
) or something like that
|
||
df | ||
|
||
Currently, this feature is only implemented for time-based windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expand this a touch to say that only closed='both'
is accepted for fixed windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only accepting closed=None for fixed windows, as previously requested by you
For the fixed window document as well (you can say its closed='both', but ATM only allow None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what i mean just make sure the docs are clear on that ; e.g. None -> 'both' for fixed (and assert that as well )
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -319,6 +319,7 @@ To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you | |||
Other Enhancements | |||
^^^^^^^^^^^^^^^^^^ | |||
|
|||
- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window endpoint closedness. See :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See :ref:`the documentation ....
@@ -101,6 +103,10 @@ def validate(self): | |||
if self.min_periods is not None and not \ | |||
is_integer(self.min_periods): | |||
raise ValueError("min_periods must be an integer") | |||
if self.closed is not None and self.closed not in \ | |||
['right', 'both', 'left', 'neither']: | |||
raise ValueError("closed must be 'right', 'left', 'both' or " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there validation for a fixed window when closed is NOT both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, there is a validation for when closed is not None in Rolling.validate()
@@ -431,6 +431,12 @@ def test_numpy_compat(self): | |||
tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, | |||
getattr(r, func), dtype=np.float64) | |||
|
|||
def test_closed(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback here is the piece that tests that closed must be None for fixed-window. Well, actually it only tests that it should not be 'neither', but I do not think replication is necessary.
@@ -1044,6 +1058,10 @@ def validate(self): | |||
elif self.window < 0: | |||
raise ValueError("window must be non-negative") | |||
|
|||
if not self.is_datetimelike and self.closed is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback here we enforce that closed
must be None
for windows that are not datetimelike
thanks @carlosdanielcsantos nice PR. as you can see, the code changes are the 'easy' part, its the docs & testing that are the details.! keem em coming! |
git diff upstream/master --name-only -- '*.py' | flake8 --diff