-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Pandas df.rolling.mean() abnormal behavior for a Series having larger numbers (in the scale of billions) #28305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Actually, similar behavior is observed with pandas 0.24 version. |
Hmm not sure I follow. Can you provide self-contained code samples to reproduce? Unfortunately screenshots are not very helpful |
Could be the same problem pointed out in this issue: #28244 |
I think I added the lines that can reproduce this behavior. Not sure if it is missing. Here it is: #Creating a dataframe with residual percent |
Thank you @dsaxton |
@bobnvk99 Thanks for the report. closing as this doesn't look actionable. We would need a minimal reproducible example to help debug the issue. see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports the code sample is not executable, raises NameError: name 'residuals' is not defined. |
Code Sample, a copy-pastable example if possible
Problem description
I am trying to implement adjusted residual percent as the rolling average of the previous three residual percents.
Due to some data issue, model forecasted the target variable in the scale of billions, consequently effecting the residual percentage calculation.
However, this is irrelevant to the current problem, the issue is when doing the rolling average using .....rolling(window=3).mean(), the average for the rows after the abnormal residual percent got affected.
If I do take those same numbers and do the normal way of averaging or use MS Excel, I got it right.
Please see the image below:

Output of
pd.show_versions()
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
Is there anything that I am missing or unaware. I tried to check in the pandas' source code and found nothing. Irrespective of the scale of the numbers, excel and the normal way of averaging works fine.
Please advise.
I greatly appreciate your help.
Thank you.
The text was updated successfully, but these errors were encountered: