weird NaN in mean() of float16 series #20642
Comments
float16 is barely supported; you can have a look and help improve things.
Well, maybe this explains why my models are not training very well :)
I loaded this pickle on another machine, and the issue repeats exactly.
float32 is quite well supported.
Closing as a duplicate of #9220.
I ran into the same problem when trying to reduce the memory usage of a DataFrame by downcasting its column data types.
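If the goal is only to save memory, one hedged alternative (not something proposed in this thread) is to let pandas choose the dtype: the pandas documentation for pd.to_numeric states that downcast="float" picks the smallest float dtype with float32 as the minimum, so float16 is never chosen. A minimal sketch with a made-up column:

```python
import numpy as np
import pandas as pd

# Hypothetical float64 column standing in for real data.
s = pd.Series([0.5, 1.25, 3.0, 100.0], dtype=np.float64)

# downcast="float" shrinks to the smallest float dtype that keeps the values,
# but pandas documents float32 as the floor, so float16 is never used.
small = pd.to_numeric(s, downcast="float")

print(small.dtype)                                                    # float32
print(s.memory_usage(index=False), small.memory_usage(index=False))  # 32 16
```

This halves memory relative to float64 while keeping a range (about 3.4e38) that ordinary sums are unlikely to exceed.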
You have an overflow. Take the mean of a scaled column, (df[col] / n).mean() * n, where n is large enough. To find out how large n needs to be, compute the sum of the column after casting it to float32 and compare it with the largest float16 value (65504).
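The scaling idea above can be sketched in NumPy as follows. This is a minimal illustration under a few assumptions of ours: scaled_float16_mean is a hypothetical helper, n is chosen as a power of two from a float32 sum of absolute values with 2x headroom below the float16 maximum, and the row count is kept as a Python int rather than relying on pandas' mean:

```python
import math

import numpy as np

F16_MAX = float(np.finfo(np.float16).max)  # largest finite float16: 65504.0


def scaled_float16_mean(values: np.ndarray) -> float:
    """Hypothetical helper: mean of a float16 array via (sum of values / n) * n / count.

    n is picked from a float32 sum of absolute values, compared against the
    float16 maximum with 2x headroom (an assumption, not from the thread).
    """
    abs_sum32 = float(np.abs(values.astype(np.float32)).sum())
    limit = F16_MAX / 2  # headroom so rounding cannot push the total past the max
    # Smallest power of two n with abs_sum32 / n <= limit.
    n = 1 if abs_sum32 <= limit else 2 ** math.ceil(math.log2(abs_sum32 / limit))
    # Divide in float32 (n itself may exceed float16 range), then store as float16.
    scaled = (values.astype(np.float32) / n).astype(np.float16)
    scaled_sum = float(scaled.sum())  # float16 reduction that no longer overflows
    return scaled_sum * n / values.size  # count stays a Python int


data = np.ones(100_000, dtype=np.float16)  # hypothetical column, true mean 1.0

with np.errstate(over="ignore"):
    print(data.sum())              # inf: the plain float16 sum overflows
print(scaled_float16_mean(data))   # close to 1.0; float16 rounding remains
```

The result is finite but still carries float16 rounding error, so when precision matters, widening the dtype is usually the simpler fix.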
@jfpuget You are correct! I was using float16, and while computing the mean, the sum of all the observations was out of range for float16. I changed the type to float64 and now it works. Thanks!
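Converting the column to float64 works, as described above. If you want to keep float16 storage for memory reasons, a lighter option is to widen only the accumulator for the reduction by using the dtype argument of NumPy's sum and mean. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical column: 100,000 values of 0.75 kept in float16 to save memory.
values = np.full(100_000, 0.75, dtype=np.float16)

# Storage stays float16 (2 bytes per value); only the accumulator is float64.
total = values.sum(dtype=np.float64)   # 75000.0, beyond float16's 65504 maximum
mean = values.mean(dtype=np.float64)   # 0.75

print(values.nbytes, total, mean)      # 200000 75000.0 0.75
```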
I spent 5 hours on this! 🤦🏻♂️
I had the same problem. Probably the best approach is to use an intermediate normalization, as proposed by @jfpuget.
I have a shuffled series with a bunch of sine values in float16, like this:
There are no NaN values; every value is the sine of something:
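One way to confirm that the NaN does not come from the data itself is to count the NaNs and check that every value lies in [-1, 1]. The original data is not shown, so this minimal sketch builds a hypothetical stand-in from random angles:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in: sines of random angles, stored as float16, then shuffled.
angles = rng.uniform(0.0, np.pi, size=200_000)
s = pd.Series(np.sin(angles).astype(np.float16)).sample(frac=1.0, random_state=0)

print(s.isna().sum())              # 0: no NaN in the data
print(bool(s.abs().max() <= 1.0))  # True: every value is a valid sine
```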
But for some reason, mean() chokes somewhere in the middle, as if it were overflowing:
And it works fine when converted to float32:
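Here is a plausible reconstruction of what goes wrong, assuming the sine values are mostly positive (the data below is hypothetical). The float16 total passes 65504 and overflows to inf, and a row count above 65504 cannot be stored in float16 either. If both were kept in float16, their ratio would be inf / inf, which is NaN and would match the symptom. The thread does not show pandas' internals, so this sketch reproduces the arithmetic with plain NumPy instead of calling the float16 Series.mean():

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: 200,000 sines of angles in [0, pi], about 0.64 on average,
# so the true total (roughly 127,000) is about twice the float16 maximum.
s = pd.Series(np.sin(rng.uniform(0.0, np.pi, size=200_000)).astype(np.float16))

with np.errstate(over="ignore", invalid="ignore"):
    f16_total = s.to_numpy().sum()         # float16 result overflows to inf
    f16_count = np.float16(float(len(s)))  # 200,000 does not fit in float16 either
    ratio = f16_total / f16_count          # inf / inf
print(f16_total, f16_count, ratio)         # inf inf nan

# The same data as float32 has ample range, so the mean comes out finite.
mean32 = s.astype(np.float32).mean()
print(mean32)                              # close to 2/pi, about 0.6366
```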
Is this weird or am I missing something about float16?
This behavior persists after pickling, loading, and sorting by index, although it now chokes much earlier:
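One hedged explanation for why reordering moves the failure point: the position at which a running float16 total first exceeds the largest finite float16 value depends on the order in which values are added. The sketch below uses hypothetical data and a hypothetical helper, first_overflow_index, which tracks the running sum in float64 and reports the first prefix whose total is too large for float16:

```python
import numpy as np

F16_MAX = float(np.finfo(np.float16).max)  # 65504.0


def first_overflow_index(values: np.ndarray):
    """Hypothetical helper: first index where |running sum| > F16_MAX, else None."""
    running = np.cumsum(values, dtype=np.float64)  # running total in float64
    over = np.flatnonzero(np.abs(running) > F16_MAX)
    return int(over[0]) if over.size else None


# Hypothetical data: 100 values of +1000 and 100 values of -1000 (total 0).
interleaved = np.array([1000.0, -1000.0] * 100, dtype=np.float16)
sorted_desc = np.sort(interleaved)[::-1]  # all positive values first

print(first_overflow_index(interleaved))  # None: the running sum never exceeds 1000
print(first_overflow_index(sorted_desc))  # 65: 66 * 1000 = 66000 > 65504
```

The two arrays hold identical values with the same overall total. Only the order differs, and that order decides whether and where an intermediate float16 total overflows.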
Problem description
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-119-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None