regression in 0.23.0 on rolling max with DatetimeIndex. #21096

Closed
lexual opened this issue May 17, 2018 · 5 comments · Fixed by #29761
Labels: good first issue; Needs Tests (unit test(s) needed to prevent regressions); Resample (resample method)

@lexual
Contributor

lexual commented May 17, 2018

Behaviour of .rolling() with a DatetimeIndex appears to have changed (regressed) between pandas 0.22.0 and 0.23.0.

See below for a cut-down test case exhibiting the behaviour.

Also see below for the different output the same code produces on the same data with the two pandas versions: the final rolling max of 2.0 is reported under 0.22.0 but comes out as 0.0 under 0.23.0.

import numpy as np
import pandas as pd


def main():
    print('pandas', pd.__version__)
    n = 10
    index = pd.date_range(
        start='2018-1-1 01:00:00',
        freq='1min',
        periods=n,
    )
    s = pd.Series(
        data=0,
        index=index,
    )
    s.iloc[1] = np.nan
    s.iloc[-1] = 2
    print(s)
    maxes = s.rolling(window=f'{n}min').max()
    print(maxes.value_counts(dropna=False))


if __name__ == '__main__':
    main()

ACTUAL OUTPUT

pandas 0.22.0      
2018-01-01 01:00:00    0.0             
2018-01-01 01:01:00    NaN             
2018-01-01 01:02:00    0.0             
2018-01-01 01:03:00    0.0             
2018-01-01 01:04:00    0.0             
2018-01-01 01:05:00    0.0             
2018-01-01 01:06:00    0.0             
2018-01-01 01:07:00    0.0             
2018-01-01 01:08:00    0.0             
2018-01-01 01:09:00    2.0             
Freq: T, dtype: float64                
0.0    9           
2.0    1           
dtype: int64

pandas 0.23.0
2018-01-01 01:00:00    0.0             
2018-01-01 01:01:00    NaN             
2018-01-01 01:02:00    0.0             
2018-01-01 01:03:00    0.0             
2018-01-01 01:04:00    0.0             
2018-01-01 01:05:00    0.0             
2018-01-01 01:06:00    0.0             
2018-01-01 01:07:00    0.0             
2018-01-01 01:08:00    0.0             
2018-01-01 01:09:00    2.0             
Freq: T, dtype: float64                
0.0    10          
dtype: int64       
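
To make the expected result explicit, here is a minimal brute-force sketch of a time-based rolling max. It assumes the default semantics for offset windows (each window covers `(t - window, t]` and NaN values are skipped); the helper name `brute_force_rolling_max` is hypothetical and not part of pandas. On the data above it yields 0.0 nine times and 2.0 once, which matches the 0.22.0 output.

```python
import numpy as np
import pandas as pd


def brute_force_rolling_max(s, window):
    """Reference rolling max over a time window, computed one row at a time.

    Assumes each window is (t - window, t] and that NaN values are skipped.
    """
    offset = pd.Timedelta(window)
    out = []
    for t in s.index:
        # Select every observation that falls inside the window ending at t.
        in_window = s[(s.index > t - offset) & (s.index <= t)]
        # Series.max() skips NaN by default.
        out.append(in_window.max())
    return pd.Series(out, index=s.index)


n = 10
index = pd.date_range(start="2018-01-01 01:00:00", freq="1min", periods=n)
s = pd.Series(0.0, index=index)  # float data, so NaN can be assigned without upcasting
s.iloc[1] = np.nan
s.iloc[-1] = 2.0

expected = brute_force_rolling_max(s, f"{n}min")
print(expected.value_counts(dropna=False))
```

Only the last window contains the 2.0, so a correct implementation has to report 2.0 for the final row.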

@jreback
Contributor

jreback commented May 17, 2018

can you create the input programmatically here (and update the top)

jreback added the Resample label May 17, 2018
@lexual
Contributor Author

lexual commented May 17, 2018

done @jreback, updated with a simpler test case, completely in code, in the 1st comment.

@lexual
Contributor Author

lexual commented May 19, 2018

$ git bisect bad 
39e7b6916b07982240bac87132848fb2665806a2 is the first bad commit
commit 39e7b6916b07982240bac87132848fb2665806a2
Author: Matt Kirk <[email protected]>
Date:   Wed Feb 14 18:13:19 2018 +0700

    Performance increase rolling min max (#19549)

:040000 040000 6543b93964c04b7f71c14a9992c8f70606e02c77 a7f4578db8843fdb0f9dbe02322584ced5628dd9 M      asv_bench
:040000 040000 3f68b43b26722b2f7a2874433e88965ca5d8bfda 2eaff46202589c76a8d85251cfdb052dd38660f1 M      doc
:040000 040000 e28eb2f13f13846afe64b82ce492a5a877b98be7 18784432eb0d9655cca3de34a02f5f514c44eaff M      pandas
:100755 100755 2332503e558ed7ee9d57e3b3153ae674c1d292ad c66979dd19ef039cd4b5172c65276d1340507631 M      setup.py
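
A bisect like the one above can be automated with `git bisect run`, which treats exit code 0 as good and 1 as bad. The following is only a sketch of such a check, not the script that was actually used here; the function name `bisect_exit_code` is hypothetical, and it assumes pandas has been rebuilt in place at each bisect step.

```python
import numpy as np
import pandas as pd


def bisect_exit_code() -> int:
    """Return 0 (good) if the last rolling max is 2.0, else 1 (bad)."""
    n = 10
    index = pd.date_range(start="2018-01-01 01:00:00", freq="1min", periods=n)
    s = pd.Series(0.0, index=index)
    s.iloc[1] = np.nan
    s.iloc[-1] = 2.0
    maxes = s.rolling(window=f"{n}min").max()
    # The final window contains the 2.0, so that is the only correct answer.
    return 0 if maxes.iloc[-1] == 2.0 else 1


# When used as a standalone script for `git bisect run`, finish with:
# raise SystemExit(bisect_exit_code())
```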

@lexual
Contributor Author

lexual commented May 19, 2018

cc @hexgnu

@mroeschke
Member

This looks fixed on master, but could still use a regression test.

In [9]: pd.__version__
Out[9]: '0.24.0.dev0+911.g24ab22f75'

In [10]: s
Out[10]:
2018-01-01 01:00:00    0.0
2018-01-01 01:01:00    NaN
2018-01-01 01:02:00    0.0
2018-01-01 01:03:00    0.0
2018-01-01 01:04:00    0.0
2018-01-01 01:05:00    0.0
2018-01-01 01:06:00    0.0
2018-01-01 01:07:00    0.0
2018-01-01 01:08:00    0.0
2018-01-01 01:09:00    2.0
Freq: T, dtype: float64

In [11]: maxes = s.rolling(window=f'{n}min').max()

In [12]: maxes.value_counts(dropna=False)
Out[12]:
0.0    9
2.0    1
dtype: int64
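
A regression test along these lines might look like the sketch below. It is an illustration only, not necessarily the test that was eventually merged; the test name is hypothetical, and the expected values are taken from the 0.22.0 output in the first comment.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_series_equal


def test_rolling_max_datetimeindex_with_nan():
    # Hypothetical regression test for the behaviour reported in this issue.
    n = 10
    index = pd.date_range(start="2018-01-01 01:00:00", freq="1min", periods=n)
    s = pd.Series(0.0, index=index)
    s.iloc[1] = np.nan
    s.iloc[-1] = 2.0

    result = s.rolling(window=f"{n}min").max()

    # NaN is skipped inside each window; only the last window sees the 2.0.
    expected = pd.Series([0.0] * (n - 1) + [2.0], index=index)
    assert_series_equal(result, expected)
```

Run under pytest, this should fail on a build that has the regression and pass once the last window's maximum is reported correctly.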
