ENH: add expanding window time support #16425

Open
jreback opened this issue May 22, 2017 · 6 comments

Labels
Enhancement, Window (rolling, ewma, expanding)

Comments

@jreback
Contributor

jreback commented May 22, 2017

xref #16431

In [5] below should work easily; it is just disabled at the moment.

In [1]: dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
   ...:                    index=pd.Index([pd.Timestamp('20130101 09:00:00'),
   ...:                                    pd.Timestamp('20130101 09:00:02'),
   ...:                                    pd.Timestamp('20130101 09:00:03'),
   ...:                                    pd.Timestamp('20130101 09:00:05'),
   ...:                                    pd.Timestamp('20130101 09:00:06')],
   ...:                                   name='foo'))

In [2]: dft
Out[2]: 
                       B
foo                     
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

In [4]: dft.rolling('2s').sum()
Out[4]: 
                       B
foo                     
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

In [5]: dft.expanding('2s').sum()
ValueError: min_periods must be an integer

jreback added the Difficulty Intermediate, Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Datetime (Datetime data dtype) labels May 22, 2017
jreback added this to the Interesting Issues milestone May 22, 2017
jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
@WillAyd
Member

WillAyd commented Apr 10, 2018

What is the expected behavior here? Doesn't expanding just provide a window containing all values up to that point in time?

It's not possible to pass a window size to the expanding constructor, hence the error: the '2s' is inadvertently passed to the min_periods argument instead. So is this a non-issue?
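A minimal sketch of the point above, assuming a recent pandas release in which `expanding()` takes `min_periods` as its first parameter: the frequency string ends up in `min_periods` whether it is passed positionally or by keyword, and construction fails during validation.

```python
import numpy as np
import pandas as pd

# Same data as In [1] above.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

messages = []
# expanding() has no window-size parameter, so "2s" is bound to min_periods
# in both calls below; the constructor validates it and raises ValueError.
for build in (lambda: dft.expanding("2s"), lambda: dft.expanding(min_periods="2s")):
    try:
        build()
    except ValueError as exc:
        messages.append(str(exc))

print(messages)
```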

@jorisvandenbossche
Member

I am also not sure what the original intent was, but if it is to specify min_periods using a frequency string (since min_periods is the only keyword argument apart from center/axis), I personally don't think that makes sense for expanding.
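To check the keyword list mentioned above against whatever pandas version is installed, a small sketch can use the standard-library `inspect` module. The exact parameter list is version-dependent (later releases dropped `center` and added `method`), so the sketch only relies on `min_periods` coming first and on there being no window-size parameter.

```python
import inspect

import pandas as pd

# Parameters of DataFrame.expanding, excluding `self`. The exact list varies
# between pandas versions, but none of the entries is a window size.
params = list(inspect.signature(pd.DataFrame.expanding).parameters)[1:]
print(params)
```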

@jreback
Contributor Author

jreback commented Apr 10, 2018

I'll clarify: it is useful to run an expanding window with a min_periods. For rolling, min_periods defaults to the window length (for an integer window).
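A minimal sketch of the integer `min_periods` behavior referred to above, reusing the data from In [1] and assuming a recent pandas: an expanding sum with `min_periods=3` stays NaN until three non-NaN values have been seen, an integer rolling window defaults `min_periods` to the window length, and an offset window such as `'2s'` defaults it to 1 (which is why Out[4] has values in the first two rows).

```python
import numpy as np
import pandas as pd

# Same data as In [1] above.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Expanding window: NaN until at least 3 non-NaN observations have been seen.
exp_sum = dft["B"].expanding(min_periods=3).sum()

# Integer rolling window: min_periods defaults to the window length (2 here).
roll_int = dft["B"].rolling(2).sum()

# Offset rolling window: min_periods defaults to 1.
roll_off = dft["B"].rolling("2s").sum()

print(pd.DataFrame({"expanding(3)": exp_sum, "rolling(2)": roll_int, "rolling('2s')": roll_off}))
```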

In [7]: dft.expanding(min_periods='2s').sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-597db88bedaf> in <module>()
----> 1 dft.expanding(min_periods='2s').sum()

~/pandas/pandas/core/generic.py in expanding(self, min_periods, center, axis)
   8721             axis = self._get_axis_number(axis)
   8722             return rwindow.expanding(self, min_periods=min_periods,
-> 8723                                      center=center, axis=axis)
   8724 
   8725         cls.expanding = expanding

~/pandas/pandas/core/window.py in expanding(obj, **kwds)
   2396         raise TypeError('invalid type: %s' % type(obj))
   2397 
-> 2398     return Expanding(obj, **kwds)
   2399 
   2400 

~/pandas/pandas/core/window.py in __init__(self, obj, min_periods, center, axis, **kwargs)
   1686                  **kwargs):
   1687         super(Expanding, self).__init__(obj=obj, min_periods=min_periods,
-> 1688                                         center=center, axis=axis)
   1689 
   1690     @property

~/pandas/pandas/core/window.py in __init__(self, obj, window, min_periods, center, win_type, axis, on, closed, **kwargs)
     78         self.win_freq = None
     79         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 80         self.validate()
     81 
     82     @property

~/pandas/pandas/core/window.py in validate(self)
    101         if self.min_periods is not None and not \
    102            is_integer(self.min_periods):
--> 103             raise ValueError("min_periods must be an integer")
    104         if self.closed is not None and self.closed not in \
    105            ['right', 'both', 'left', 'neither']:

ValueError: min_periods must be an integer

Note that this doesn't work for rolling either:

In [8]: dft.rolling('2s', min_periods='2s').sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-9001c1afc49b> in <module>()
----> 1 dft.rolling('2s', min_periods='2s').sum()

~/pandas/pandas/core/generic.py in rolling(self, window, min_periods, center, win_type, on, axis, closed)
   8713                                    min_periods=min_periods,
   8714                                    center=center, win_type=win_type,
-> 8715                                    on=on, axis=axis, closed=closed)
   8716 
   8717         cls.rolling = rolling

~/pandas/pandas/core/window.py in rolling(obj, win_type, **kwds)
   2386         return Window(obj, win_type=win_type, **kwds)
   2387 
-> 2388     return Rolling(obj, **kwds)
   2389 
   2390 

~/pandas/pandas/core/window.py in __init__(self, obj, window, min_periods, center, win_type, axis, on, closed, **kwargs)
     78         self.win_freq = None
     79         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 80         self.validate()
     81 
     82     @property

~/pandas/pandas/core/window.py in validate(self)
   1373 
   1374     def validate(self):
-> 1375         super(Rolling, self).validate()
   1376 
   1377         # we allow rolling on a datetimelike index

~/pandas/pandas/core/window.py in validate(self)
    101         if self.min_periods is not None and not \
    102            is_integer(self.min_periods):
--> 103             raise ValueError("min_periods must be an integer")
    104         if self.closed is not None and self.closed not in \
    105            ['right', 'both', 'left', 'neither']:

ValueError: min_periods must be an integer
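For comparison, a short sketch (assuming a recent pandas) showing that an offset window does accept an integer `min_periods`, since `min_periods` counts observations rather than time. With `min_periods=2`, only the window ending at 09:00:03 holds two valid values.

```python
import numpy as np
import pandas as pd

# Same data as In [1] above.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Each '2s' window is (t - 2s, t]; require at least 2 non-NaN observations in it.
result = dft.rolling("2s", min_periods=2).sum()
print(result)
```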

@WillAyd
Member

WillAyd commented Apr 11, 2018

What would you expect in the above examples, though? Perhaps I'm misreading this, but it sounds like a way of requiring a minimum distance from a given point in the window before calculating. So in the above you'd ignore any entry that wasn't at least 2s away from the window location (?).

If that's the case, do we need to restrict this to frequencies, or could we apply the same type of logic to numeric indices as well?
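One possible reading of a time-valued `min_periods`, sketched purely as an assumption (this is not a pandas feature): emit an expanding result only once the window covers at least a given span of the index, measured from the first row. The helper `expanding_sum_min_span` is hypothetical and is built from the existing integer API plus a mask on the index; as the question above suggests, the same mask works on a numeric index with a numeric span.

```python
import numpy as np
import pandas as pd


def expanding_sum_min_span(series: pd.Series, min_span) -> pd.Series:
    """Hypothetical helper: expanding sum, masked to NaN until the index
    span covered since the first row reaches `min_span`."""
    result = series.expanding(min_periods=1).sum()
    # Elapsed span from the first index value: a TimedeltaIndex for a
    # DatetimeIndex (compare with a Timedelta), plain numbers for a numeric index.
    elapsed = series.index - series.index[0]
    return result.where(elapsed >= min_span)


dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)
timed = expanding_sum_min_span(dft["B"], pd.Timedelta("2s"))

numeric = pd.Series([1.0, 2.0, 3.0], index=[0.0, 1.5, 4.0])
numeric_result = expanding_sum_min_span(numeric, 2.0)

print(timed)
print(numeric_result)
```

Whether the span should be measured from the first row, as here, or relative to each window's end is exactly the open question in this thread; the sketch picks one option only to make it concrete.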

@jorisvandenbossche
Member

Yes, I also don't directly see how it would work.

Let's take an example:

In [114]: s = pd.Series(range(8), index=pd.date_range("2018-01-01", periods=8))

In [115]: s.iloc[3] = np.nan

In [116]: s
Out[116]: 
2018-01-01    0.0
2018-01-02    1.0
2018-01-03    2.0
2018-01-04    NaN
2018-01-05    4.0
2018-01-06    5.0
2018-01-07    6.0
2018-01-08    7.0
Freq: D, dtype: float64

In [117]: s.rolling('3D', min_periods=3).sum()
Out[117]: 
2018-01-01     NaN
2018-01-02     NaN
2018-01-03     3.0
2018-01-04     NaN
2018-01-05     NaN
2018-01-06     NaN
2018-01-07    15.0
2018-01-08    18.0
Freq: D, dtype: float64

The min_periods=3 ensures that where there is the original NaN gap, we get NaNs in the result.

But what would a min_periods='3D' do? Suppose you get the window from 2018-01-03 to 2018-01-05: it spans 3 days but contains only 2 valid values.
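A runnable version of the example above, assuming a recent pandas (the series is created as float64 up front so that assigning NaN does not rely on dtype upcasting). `rolling("3D").count()` shows how many valid values each trailing 3-day window holds: the window ending 2018-01-05 spans three calendar days but contains only two valid values, which is where the integer `min_periods=3` yields NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(8), index=pd.date_range("2018-01-01", periods=8), dtype="float64")
s.iloc[3] = np.nan

# Sum over each trailing 3-day window, requiring 3 non-NaN observations.
sums = s.rolling("3D", min_periods=3).sum()

# Number of non-NaN observations in each trailing 3-day window.
counts = s.rolling("3D").count()

print(pd.DataFrame({"sum": sums, "valid_count": counts}))
```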

@jreback
Contributor Author

jreback commented Jan 1, 2020

cc @mroeschke

mroeschke added the Window (rolling, ewma, expanding) label and removed the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Datetime (Datetime data dtype) labels Jan 2, 2020
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

5 participants