BUG: Rolling.count modifies 'min_periods' inplace since 1.2.0 #39554

Closed
3 tasks done
dchigarev opened this issue Feb 2, 2021 · 0 comments · Fixed by #39604
Labels: Bug; Needs Triage (issue that has not been reviewed by a pandas team member); Window (rolling, ewma, expanding)


dchigarev commented Feb 2, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas
from pandas.testing import assert_frame_equal

data = {
    "col1": [np.nan, 1, 2]
}

pd_df = pandas.DataFrame(data)
pd_rolled = pd_df.rolling(window=2, min_periods=None)

res1 = pd_rolled.sum()
pd_rolled.count()
res2 = pd_rolled.sum()

assert_frame_equal(res1, res2) # AssertionError
Output
AssertionError: DataFrame.iloc[:, 0] (column name="col1") are different

DataFrame.iloc[:, 0] (column name="col1") values are different (66.66667 %)
[index]: [0, 1, 2]
[left]:  [nan, nan, 3.0]
[right]: [0.0, 1.0, 3.0]

Problem description

Two sequential calls to .sum on the same rolling object produce different results if .count is called between them and min_periods=None. When min_periods is None, Rolling.sum treats min_periods as equal to the window size. Rolling.count currently behaves differently and treats min_periods=None as 0. #36649 added a warning that this behavior is deprecated and also refactored the .count implementation. Right after issuing the warning, .count overwrites min_periods on the rolling object itself, so later calls to .sum and other operations on that object run with min_periods=0 and give incorrect results.

def count(self):
    if self.min_periods is None:
        warnings.warn(
            (
                "min_periods=None will default to the size of window "
                "consistent with other methods in a future version. "
                "Specify min_periods=0 instead."
            ),
            FutureWarning,
        )
        self.min_periods = 0
    return super().count()
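
To make the mechanism easier to see without pandas internals, here is a minimal standard-library sketch of the same state leak. ToyRolling, its _windows helper, and same are hypothetical names invented for this illustration and are not pandas code; the sketch only assumes the semantics described above (sum treats None as the window size, count treats None as 0 and writes that back to the object).

```python
import math
import warnings


class ToyRolling:
    """Hypothetical stand-in for a rolling window; not pandas code."""

    def __init__(self, values, window, min_periods=None):
        self.values = list(values)
        self.window = window
        self.min_periods = min_periods

    def _windows(self):
        # Yield the non-NaN observations of each trailing window.
        for end in range(1, len(self.values) + 1):
            chunk = self.values[max(0, end - self.window):end]
            yield [v for v in chunk if not math.isnan(v)]

    def sum(self):
        # None means "require a full window", as described for Rolling.sum.
        needed = self.window if self.min_periods is None else self.min_periods
        return [float(sum(obs)) if len(obs) >= needed else math.nan
                for obs in self._windows()]

    def count(self):
        if self.min_periods is None:
            warnings.warn("min_periods=None is deprecated", FutureWarning)
            self.min_periods = 0  # the bug: shared state is overwritten
        return [float(len(obs)) for obs in self._windows()]


def same(a, b):
    # NaN-aware comparison of two lists of floats.
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )


rolled = ToyRolling([math.nan, 1, 2], window=2)
res1 = rolled.sum()                      # computed with min_periods=None
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    rolled.count()                       # silently sets min_periods to 0
res2 = rolled.sum()                      # now computed with min_periods=0

print(res1)  # [nan, nan, 3.0]
print(res2)  # [0.0, 1.0, 3.0]
```

The two printed lists match the left and right sides of the assertion failure in the report, which suggests the difference comes entirely from the changed min_periods and not from the data.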

Expected Output

Rolling.count should not modify the min_periods attribute of the rolling object or, if it does, it should restore the original value of min_periods after performing the count.
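
One possible shape for the second option (restore the original value) is sketched below using only the standard library. ToyRollingFixed, _count_impl, and same are hypothetical names for illustration; _count_impl stands in for an inherited count that reads self.min_periods. This is a sketch under those assumptions and is not necessarily how #39604 resolved the issue.

```python
import math
import warnings


class ToyRollingFixed:
    """Hypothetical stand-in for a rolling window; not pandas code."""

    def __init__(self, values, window, min_periods=None):
        self.values = list(values)
        self.window = window
        self.min_periods = min_periods

    def _windows(self):
        # Yield the non-NaN observations of each trailing window.
        for end in range(1, len(self.values) + 1):
            chunk = self.values[max(0, end - self.window):end]
            yield [v for v in chunk if not math.isnan(v)]

    def sum(self):
        # None means "require a full window".
        needed = self.window if self.min_periods is None else self.min_periods
        return [float(sum(obs)) if len(obs) >= needed else math.nan
                for obs in self._windows()]

    def _count_impl(self):
        # Stand-in for the inherited count, which reads self.min_periods.
        return [float(len(obs)) if len(obs) >= self.min_periods else math.nan
                for obs in self._windows()]

    def count(self):
        if self.min_periods is not None:
            return self._count_impl()
        warnings.warn("min_periods=None is deprecated", FutureWarning)
        original = self.min_periods
        self.min_periods = 0
        try:
            return self._count_impl()
        finally:
            self.min_periods = original  # restored even if counting raises


def same(a, b):
    # NaN-aware comparison of two lists of floats.
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )


rolled = ToyRollingFixed([math.nan, 1, 2], window=2)
res1 = rolled.sum()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    counts = rolled.count()
res2 = rolled.sum()

print(same(res1, res2))   # True
print(rolled.min_periods)  # None
```

The first option (never modifying the attribute) would instead resolve the effective value into a local variable inside count, which avoids the temporary mutation altogether but requires the underlying count logic to accept that value as an argument.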

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : 9d598a5e1eee26df95b3910e3f2934890d062caa
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-50-generic
Version          : #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.1
numpy            : 1.19.0
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.3.1.post20200622
Cython           : None
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : 0.4.1
xlsxwriter       : None
lxml.etree       : 4.5.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : None
fsspec           : 0.7.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : 0.13.2
pyarrow          : 1.0.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : None
xarray           : 0.15.1
xlrd             : 1.2.0
xlwt             : None
numba            : None
@dchigarev dchigarev added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 2, 2021
@mroeschke mroeschke added the Window rolling, ewma, expanding label Feb 3, 2021
@mroeschke mroeschke self-assigned this Feb 3, 2021
@jreback jreback added this to the 1.2.2 milestone Feb 5, 2021