BUG: int64 overflow/wrap around with sum() #15453

xflr6 · 2017-02-18T13:36:10Z

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: print(s.dtype, s.sum())
(dtype('int64'), -2147483648)

In [4]: pd.Series([2**31 - 1, 1]).sum()
Out[4]: -2147483648

In [5]: pd.Series([2**31 - 1, 1]).astype('int32').sum()
Out[5]: 2147483648

Problem description

The sums in [3] and [4] come out negative: the int64 result wraps around at 2**31 instead of giving 2147483648.
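The reported value matches what you get when 2**31 is reduced to a 32-bit signed integer. The sketch below reproduces that arithmetic in plain Python. `wrap_to_signed` is a hypothetical helper written for illustration, and the idea that a 32-bit accumulator is involved is an assumption drawn from the numbers, not something the report itself confirms.

```python
def wrap_to_signed(value, bits):
    """Reduce an integer to the range of a signed two's-complement type."""
    modulus = 1 << bits        # number of representable values
    half = 1 << (bits - 1)     # magnitude of the most negative value
    return (value + half) % modulus - half


# 2**31 does not fit in a signed 32-bit integer, so it wraps to the minimum.
wrapped_32 = wrap_to_signed(2**31, 32)

# The same value fits comfortably in a signed 64-bit integer.
wrapped_64 = wrap_to_signed(2**31, 64)

print(wrapped_32, wrapped_64)
```

The first number printed is the value seen in [3] and [4]; the second is the value expected from an int64 sum.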

Expected Output

see [5]

Output of `pd.show_versions()`

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: 0.19.0rc1
statsmodels: 0.8.0
xarray: None
IPython: 5.2.2
sphinx: 1.5.2
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.2
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None


jreback · 2017-02-18T14:53:20Z

(pandas2.7) C:\Users\conda\Documents\pandas2.7>ipython
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [5]: import numpy as np

In [6]: bn.nansum(np.array([2**31],dtype='int64'))
Out[6]: -2147483648

So, a couple of things:

- This is actually a bug in bottleneck itself, so please report it there. Normally when we do ops we can specify a dtype= for the accumulator (in fact that's exactly what we do normally), so bottleneck should support this as well, I think.
- This only happens on 2.7 on Windows AFAICT (3.5 looks good).
- You can provide a patch to pandas that modifies https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py#L124 so that we are forced NOT to use bottleneck for nansum when we have ints with an itemsize < 8 (be very narrow in this specification or other things might break).
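For reference, the "dtype= for the accumulator" idea can be illustrated with NumPy's own `sum`, which accepts a `dtype` argument for the accumulator and result. This is only a minimal sketch of that mechanism, not the pandas or bottleneck code path; the array contents are taken from the report above.

```python
import numpy as np

# Two int32 values whose true sum (2**31) does not fit in int32.
values = np.array([2**31 - 1, 1], dtype=np.int32)

# Ask NumPy to accumulate in int64 regardless of the input dtype or platform.
total = values.sum(dtype=np.int64)

print(total.dtype, int(total))
```

Without the explicit `dtype`, the accumulator NumPy picks for small integer inputs depends on the platform and NumPy version, which is why the example states it explicitly.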

xflr6 · 2017-02-18T15:06:32Z

Thanks. Also on 3.6 here, though (win7):

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [3]: import numpy as np

In [4]: bn.nansum(np.array([2**31],dtype='int64'))
Out[4]: -2147483648

In [5]: import sys

In [6]: sys.version
Out[6]: '3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]'

jreback · 2017-02-18T15:32:25Z

Yeah, OK with simply never using bottleneck for sum on Windows then (though this IS an API change, so it needs some documentation, because nansum != sum w/o NaNs). See #9422; I think we should just change this to use the pandas version.

pabloazurduy · 2017-04-24T18:39:22Z

Hi everyone, I'm using pandas 0.19.2 on Windows Server and I have this overflow problem. Is there a way to solve this issue before the pandas update? I use the .sum() function in a lot of lines in the code.

xflr6 · 2017-04-24T18:44:04Z

As this is an issue in bottleneck, uninstalling bottleneck should in principle be a workaround.

pabloazurduy · 2017-04-24T18:53:11Z

It is a dependency of Anaconda... I will try to remove it anyway and see what happens.

xflr6 · 2017-04-24T18:55:53Z

The nanops._USE_BOTTLENECK flag shown in #9422 seems to work:

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: s.sum()
Out[3]: -2147483648

In [4]: from pandas.core import nanops

In [5]: nanops._USE_BOTTLENECK
Out[5]: True

In [6]: nanops._USE_BOTTLENECK = False

In [7]: s.sum()
Out[7]: 2147483648
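If flipping the flag globally is undesirable, one option is to change it only around the affected calls and restore it afterwards. The sketch below shows that pattern with a generic context manager. `override_attr` is a hypothetical helper, and `fake_nanops` is a stand-in object so the example runs without pandas installed; with pandas 0.19.2 you would presumably pass the real `pandas.core.nanops` module instead, which is an untested assumption here.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value, restoring the old value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body raised an exception.
        setattr(obj, name, previous)


# Stand-in for pandas.core.nanops (assumption: it exposes _USE_BOTTLENECK,
# as shown in the session above).
fake_nanops = SimpleNamespace(_USE_BOTTLENECK=True)

with override_attr(fake_nanops, "_USE_BOTTLENECK", False):
    flag_inside = fake_nanops._USE_BOTTLENECK  # sums would run here

flag_after = fake_nanops._USE_BOTTLENECK
print(flag_inside, flag_after)
```

Note that `_USE_BOTTLENECK` is a private name, so this kind of workaround may stop working in other pandas versions.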

pabloazurduy · 2017-04-24T19:16:04Z

Thanks @xflr6, that was awesome!!!

lakshayg · 2017-10-20T05:24:01Z

It seems that the issue in bottleneck has been resolved @jreback. I am using bottleneck version 1.2.1. Can we just bump up the bottleneck version to >= 1.2.1 in pandas?

xflr6 · 2017-10-20T07:07:10Z

Note that this only affects Windows (see above). However, I can confirm that this is fixed in 1.2.1:

Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import bottleneck as bn
>>> bn.__version__
'1.2.1'
>>> import numpy as np
>>> bn.nansum(np.array([2**31], dtype='int64'))
2147483648L
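A version gate along the lines suggested above could look like the following sketch. It relies only on what this thread reports (1.2.0 shows the wrap-around on Windows, 1.2.1 does not). `parse_version` and `has_nansum_fix` are hypothetical helpers, and the parser assumes plain dotted numeric versions such as '1.2.1', so it would need extending for suffixes like release candidates.

```python
# First bottleneck release reported in this thread as not showing the bug.
FIRST_FIXED = (1, 2, 1)


def parse_version(text):
    """Turn a dotted numeric version string like '1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_nansum_fix(version_text):
    """Return True if the given bottleneck version is at least FIRST_FIXED."""
    return parse_version(version_text) >= FIRST_FIXED


for version in ("1.2.0", "1.2.1"):
    print(version, has_nansum_fix(version))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where '1.10.0' would sort before '1.2.1'.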

jreback · 2017-10-20T13:23:39Z

Actually going to close this, but for another reason: in #15507 (0.21.0RC1 is out now) we no longer use bottleneck for sum or prod, so this is not an issue.

jreback added Bug Difficulty Novice Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 18, 2017

jreback added this to the 0.20.0 milestone Feb 18, 2017

xflr6 mentioned this issue Feb 18, 2017

int64 overflow/wrap around with nansum() pydata/bottleneck#163

Closed

cpcloud added the HackIllinois 2017 label Feb 25, 2017

JaviSorribes mentioned this issue Feb 25, 2017

BUG: Fix nansum overflow on Windows with bottleneck #15507

Closed

4 tasks

jreback mentioned this issue Mar 1, 2017

API: sum of Series of all NaN should return 0 or NaN ? #9422

Closed

1 task

jreback modified the milestones: 0.20.0, 0.21.0 Mar 23, 2017

jreback mentioned this issue May 4, 2017

RLS: 0.20.0 #16049

Closed

cgohlke mentioned this issue May 5, 2017

Preparing to release bottleneck 1.2.1 pydata/bottleneck#168

Closed

jreback modified the milestones: Interesting Issues, 0.21.0 May 7, 2017

TomAugspurger added the good first issue label Oct 11, 2017

jreback closed this as completed Oct 20, 2017

jreback modified the milestones: Interesting Issues, 0.21.0 Oct 20, 2017

lumbric mentioned this issue Feb 13, 2019

Wrong result for float32 Series when using bottleneck #25307

Closed

jsturdy mentioned this issue Jul 31, 2019

Feature Request: Drop Bits [31:16] of ChipID if rawID is False cms-gem-daq-project/cmsgemos#297

Closed

2 tasks
