
BUG: int64 overflow/wrap around with sum() #15453

Closed
xflr6 opened this issue Feb 18, 2017 · 11 comments
Labels
Bug good first issue Numeric Operations Arithmetic, Comparison, and Logical operations

@xflr6
Contributor

xflr6 commented Feb 18, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: print(s.dtype, s.sum())
(dtype('int64'), -2147483648)

In [4]: pd.Series([2**31 - 1, 1]).sum()
Out[4]: -2147483648

In [5]: pd.Series([2**31 - 1, 1]).astype('int32').sum()
Out[5]: 2147483648

Problem description

negative values in [3] and [4]

Expected Output

see [5]
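
The negative results in [3] and [4] are consistent with the sum being truncated to a signed 32-bit integer somewhere along the way. That is an inference from the numbers, not something the report itself establishes. A minimal sketch using only the standard library (the helper name `as_int32` is made up for illustration):

```python
import ctypes


def as_int32(value):
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    # ctypes integer types do not check for overflow; they keep the low bits.
    return ctypes.c_int32(value).value


# 2**31 is one past the largest signed 32-bit value, so it wraps around
# to the most negative one, matching the outputs in [3] and [4].
wrapped_single = as_int32(2**31)
wrapped_sum = as_int32((2**31 - 1) + 1)
print(wrapped_single, wrapped_sum)
```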

Output of pd.show_versions()

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: 0.19.0rc1
statsmodels: 0.8.0
xarray: None
IPython: 5.2.2
sphinx: 1.5.2
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.2
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None
@jreback
Contributor

jreback commented Feb 18, 2017

(pandas2.7) C:\Users\conda\Documents\pandas2.7>ipython
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [5]: import numpy as np

In [6]: bn.nansum(np.array([2**31],dtype='int64'))
Out[6]: -2147483648

So, a couple of things:

  • This is actually a bug in bottleneck itself, so please report it there. Normally when we do ops we can specify a dtype= for the accumulator (in fact that's exactly what we normally do), so bottleneck should support this as well, I think.
  • This only happens on 2.7 on Windows AFAICT (3.5 looks good).
  • You can provide a patch to pandas that modifies https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py#L124 so that we do NOT use bottleneck for nansum when we have ints with an itemsize < 8 (be very narrow in this specification or other things might break).
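
To illustrate why the accumulator width mentioned in the first point matters, here is a pure-Python sketch that simulates summing in a signed accumulator of a given bit width. The function `accumulate` is hypothetical and only models two's-complement wraparound; it is not bottleneck's or pandas' actual code.

```python
def accumulate(values, bits):
    """Sum `values` in a simulated signed accumulator that is `bits` wide."""
    modulus = 1 << bits
    half = modulus >> 1
    total = 0
    for v in values:
        # Shift into [0, modulus), wrap, then shift back to the signed range.
        total = (total + v + half) % modulus - half
    return total


values = [2**31 - 1, 1]
narrow = accumulate(values, 32)  # wraps around, like the reported result
wide = accumulate(values, 64)    # has enough room for the true sum
print(narrow, wide)
```

With a 64-bit accumulator the same inputs give the expected 2147483648, which is the effect an explicit accumulator dtype is meant to have.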

@jreback jreback added Bug Difficulty Novice Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 18, 2017
@jreback jreback added this to the 0.20.0 milestone Feb 18, 2017
@xflr6
Contributor Author

xflr6 commented Feb 18, 2017

Thanks. Also on 3.6 here, though (win7):

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [3]: import numpy as np

In [4]: bn.nansum(np.array([2**31],dtype='int64'))
Out[4]: -2147483648

In [5]: import sys

In [6]: sys.version
Out[6]: '3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]'

@jreback
Contributor

jreback commented Feb 18, 2017

Yeah, OK with simply never using bottleneck for sum on Windows then (though this IS an API change, so it needs some documentation, because nansum != sum w/o NaNs). See #9422; we should just change this I think (to use the pandas version).

@pabloazurduy

Hi everyone, I'm using pandas 0.19.2 on Windows Server and I have this overflow problem. Is there a way to work around it before the pandas update? I use the .sum() function in a lot of lines in my code.

@xflr6
Contributor Author

xflr6 commented Apr 24, 2017

As this is an issue in bottleneck, uninstalling bottleneck should in principle be a workaround.

@pabloazurduy

pabloazurduy commented Apr 24, 2017

It is a dependency of Anaconda... I will try to remove it anyway and see what happens.

@xflr6
Contributor Author

xflr6 commented Apr 24, 2017

The nanops._USE_BOTTLENECK flag shown in #9422 seems to work:

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: s.sum()
Out[3]: -2147483648

In [4]: from pandas.core import nanops

In [5]: nanops._USE_BOTTLENECK
Out[5]: True

In [6]: nanops._USE_BOTTLENECK = False

In [7]: s.sum()
Out[7]: 2147483648
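
Since `_USE_BOTTLENECK` is a private, module-level flag, one cautious way to apply this workaround is to flip it only around the affected calls and restore it afterwards. The sketch below uses a stand-in object instead of `pandas.core.nanops` so that it runs without pandas; `override_attr` and `fake_nanops` are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to `value`, restoring the old value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, previous)


# Stand-in for pandas.core.nanops (assumption: the real module exposes the
# same private flag, as shown in the session above for pandas 0.19.2).
fake_nanops = SimpleNamespace(_USE_BOTTLENECK=True)

with override_attr(fake_nanops, "_USE_BOTTLENECK", False):
    inside = fake_nanops._USE_BOTTLENECK  # calls made here would skip bottleneck
after = fake_nanops._USE_BOTTLENECK       # original setting is back

print(inside, after)
```

With pandas installed, the same helper could presumably be pointed at the real module, e.g. `override_attr(nanops, "_USE_BOTTLENECK", False)`, keeping in mind that a private flag may change between releases.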

@pabloazurduy

Thanks @xflr6, that was awesome!!!

@lakshayg

lakshayg commented Oct 20, 2017

It seems that the issue in bottleneck has been resolved @jreback. I am using bottleneck version 1.2.1. Can we just bump up the bottleneck version to >= 1.2.1 in pandas?

@xflr6
Contributor Author

xflr6 commented Oct 20, 2017

Note that this only affects Windows (see above). However, I can confirm that this is fixed in 1.2.1:

Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import bottleneck as bn
>>> bn.__version__
'1.2.1'
>>> import numpy as np
>>> bn.nansum(np.array([2**31], dtype='int64'))
2147483648L
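
Code that has to run against several environments could guard on the installed bottleneck version and platform. The following is a rough sketch: `parse_version` and `sum_may_wrap` are hypothetical helpers, and they assume that every release before 1.2.1 behaves like 1.2.0 on Windows, whereas only 1.2.0 and 1.2.1 were actually tested in this thread.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    # Only handles purely numeric components (no 'rc1' or similar suffixes).
    return tuple(int(part) for part in text.split(".")[:3])


def sum_may_wrap(bottleneck_version, platform):
    """Guess whether bn.nansum could wrap around for this setup (assumption)."""
    on_windows = platform.startswith("win")  # sys.platform is 'win32' on Windows
    return on_windows and parse_version(bottleneck_version) < (1, 2, 1)


print(sum_may_wrap("1.2.0", "win32"))
print(sum_may_wrap("1.2.1", "win32"))
print(sum_may_wrap("1.2.0", "linux"))
```

In real use the arguments would come from `bn.__version__` and `sys.platform`.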

@jreback
Contributor

jreback commented Oct 20, 2017

Actually going to close this, but for another reason: in #15507 (0.21.0RC1 is out now) we no longer use bottleneck for sum or prod, so this is not an issue.

@jreback jreback closed this as completed Oct 20, 2017
6 participants