
BUG: pandas EWM fails silently if data types are float32 instead of float64 #42452

Closed
PCerles opened this issue Jul 9, 2021 · 6 comments · Fixed by #42650
Labels: Bug, Dtype Conversions, Regression, Window (rolling, ewma, expanding)

Comments

@PCerles
Contributor

PCerles commented Jul 9, 2021

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

kk = pd.DataFrame(np.random.rand(20, 3))
kk = kk.astype(np.float32)
print(kk.ewm(alpha=0.5, axis=1).mean().shape)
# (20, 0): every float32 column is silently dropped

kk = kk.astype(np.float64)
print(kk.ewm(alpha=0.5, axis=1).mean().shape)
# (20, 3)

Problem description

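The result of ewm(alpha=0.5, axis=1).mean() should not depend on the column dtypes. With float32 data, every column is silently dropped and a (20, 0) frame is returned, with no error or warning.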
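On affected versions, one possible workaround that avoids the axis=1 branch is to transpose, apply the EWM column-wise (the default axis=0), and transpose back. This sketch assumes the column-wise path does not apply the same dtype filter, which is consistent with the discussion below placing the problem in the axis == 1 block. The helper names ewm_mean_rows and ewm_mean_reference are hypothetical; the second is a plain NumPy version of the default adjusted EWM mean (adjust=True), included only to check the numbers.

```python
import numpy as np
import pandas as pd


def ewm_mean_rows(df: pd.DataFrame, alpha: float) -> pd.DataFrame:
    """Hypothetical helper: row-wise EWM mean computed column-wise on the
    transpose, so the axis=1 code path is never used."""
    return df.T.ewm(alpha=alpha).mean().T


def ewm_mean_reference(row: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper: adjusted EWM mean (adjust=True, no NaNs),
    y_t = sum_i (1 - alpha)**i * x[t - i] / sum_i (1 - alpha)**i."""
    x = np.asarray(row, dtype=np.float64)
    out = np.empty_like(x)
    num = 0.0  # running weighted sum of observations
    den = 0.0  # running sum of weights
    for t, value in enumerate(x):
        num = value + (1 - alpha) * num
        den = 1.0 + (1 - alpha) * den
        out[t] = num / den
    return out


kk = pd.DataFrame(np.random.rand(20, 3))
for dtype in (np.float32, np.float64):
    result = ewm_mean_rows(kk.astype(dtype), alpha=0.5)
    print(dtype.__name__, result.shape)  # the shape is (20, 3) for both dtypes
```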

Expected Output

Both calls should print (20, 3).

Output of pd.show_versions()

commit : 7c48ff4
python : 3.7.10.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.5
numpy : 1.17.4
pytz : 2020.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.23.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 0.17.1
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.53.1

PCerles added the Bug and Needs Triage labels on Jul 9, 2021
@rhshadrach
Member

@PCerles, can you provide a reproducible example and the expected output?

@PCerles
Contributor Author

PCerles commented Jul 9, 2021

Added!

@rhshadrach
Member

Thanks. It appears to me that the issue is in the use of select_dtypes:

if self.axis == 1:
    # GH: 20649 in case of mixed dtype and axis=1 we have to convert everything
    # to float to calculate the complete row at once. We exclude all non-numeric
    # dtypes.
    obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
    obj = obj.astype("float64", copy=False)
    obj._mgr = obj._mgr.consolidate()

This drops float32 columns. Under the current implementation, if the float32 columns were allowed through, they would become float64 anyway (even if the cast were removed from this block!), which is perhaps undesirable. Further investigation and PRs to improve this are welcome.
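A minimal sketch of the dtype filter in isolation. np.dtype("float") is float64, so an include list whose "float" entry resolves that way matches only float64 columns. The snippet spells that resolution out as "float64" so the effect does not depend on how a particular pandas version interprets the string "float". Broader selectors such as "number" or np.floating keep float32 columns; they are shown here as possible alternatives, not as the change that was eventually merged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 3)).astype(np.float32)

# "float" as a dtype string means float64 in NumPy.
print(np.dtype("float"))  # float64

# Filter equivalent to the axis=1 branch when "float" resolves to float64:
# float32 columns match neither entry, so every column is dropped.
narrow = df.select_dtypes(include=["integer", "float64"], exclude=["timedelta"])
print(narrow.shape)  # (4, 0)

# Broader selectors keep the float32 columns and their dtype.
by_number = df.select_dtypes(include=["number"], exclude=["timedelta"])
by_floating = df.select_dtypes(include=[np.floating])
print(by_number.shape, by_floating.shape)  # (4, 3) (4, 3)
```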

rhshadrach added the Dtype Conversions and Window labels and removed the Needs Triage label on Jul 9, 2021
rhshadrach added this to the Contributions Welcome milestone on Jul 9, 2021
@debnathshoham
Member

take

jreback changed the milestone from Contributions Welcome to 1.4 on Jul 25, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue on Aug 3, 2021
@simonjayhawkins
Member

The code sample appears to have worked in 1.1.5, and the issue is not confined to EWM but affects rolling windows in general (test_rolling_float_dtype added in #42650).
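Since the problem is not confined to EWM, the same transpose approach can be checked for rolling windows. A minimal sketch, assuming the column-wise (axis=0) rolling path is unaffected; rolling_sum_rows is a hypothetical helper and the window of 2 is arbitrary. It compares float32 and float64 inputs:

```python
import numpy as np
import pandas as pd


def rolling_sum_rows(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Hypothetical helper: row-wise rolling sum computed column-wise on the
    transpose, avoiding the axis=1 code path."""
    return df.T.rolling(window=window).sum().T


base = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(4, 3))
results = {
    name: rolling_sum_rows(base.astype(dtype), window=2)
    for name, dtype in (("float32", np.float32), ("float64", np.float64))
}
for name, frame in results.items():
    print(name, frame.shape)  # (4, 3) for both dtypes

# The first row is [0, 1, 2]; a window of 2 across it gives NaN, 1, 3.
print(results["float32"].iloc[0].tolist())  # [nan, 1.0, 3.0]
```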

first bad commit: [00a510b] [BUG]: Rolling.sum() calculated wrong values when axis is one and dtypes are mixed (#36458)
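The commit above concerns mixed dtypes with axis=1. Computing across a row at once requires the row's values in a single array, and a single array has one dtype. A minimal sketch using only NumPy's promotion rules and DataFrame.to_numpy() suggests why float32 columns mixed with integer columns would end up as float64 even without an explicit cast, which is the concern raised earlier in the thread:

```python
import numpy as np
import pandas as pd

# NumPy promotion: float32 with float32 stays float32, but float32 with
# int64 is promoted to float64.
print(np.result_type(np.float32, np.float32))  # float32
print(np.result_type(np.float32, np.int64))    # float64

all_f32 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, dtype=np.float32)
mixed = pd.DataFrame({
    "a": np.array([1.0, 2.0], dtype=np.float32),
    "b": np.array([3, 4], dtype=np.int64),
})

# A single 2-D array holding whole rows must use one common dtype.
print(all_f32.to_numpy().dtype)  # float32
print(mixed.to_numpy().dtype)    # float64
```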

cc @phofl

simonjayhawkins added the Regression label on Aug 3, 2021
@simonjayhawkins
Member

maybe duplicate of #41779
