OverflowError: Python int too large to convert to C long #20599


Open
cscetbon opened this issue Apr 3, 2018 · 26 comments
Labels
Bug IO JSON read_json, to_json, json_normalize

Comments

@cscetbon

cscetbon commented Apr 3, 2018

Code Sample, a copy-pastable example if possible

import pandas

content = open('failing_pandas.json').readline()
pd = pandas.read_json(content, lines=True)

Problem description

This issue happens on 0.21.1+ and doesn't happen on 0.21.0, for instance. I also tried the latest master branch (0.23.0) and got the same issue:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 464, in read
    self._combine_lines(data.split('\n'))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 582, in parse
    self._try_convert_types()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in _try_convert_types
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 818, in _process_converter
    new_data, result = f(col, c)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in <lambda>
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 652, in _try_convert_data
    new_data = data.astype('int64')
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/generic.py", line 4004, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3462, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3329, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 544, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 625, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
  File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
OverflowError: Python int too large to convert to C long

Expected Output

It should not crash.

Output of pd.show_versions()

Here is the working one:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

And the failing one:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
@TomAugspurger
Contributor

Interested in trying to bisect where things broke between 0.21.0 and 0.21.1?

@TomAugspurger TomAugspurger added the IO JSON read_json, to_json, json_normalize label Apr 4, 2018
@TomAugspurger
Contributor

We'll also need a reproducible example. read_json can take a JSON string, so that should be the easiest route.

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Apr 4, 2018
@cscetbon
Author

cscetbon commented Apr 4, 2018

@TomAugspurger yes, I'm interested in bisecting it. However, I get a weird import issue when installing it in a local environment:

$ virtualenv env
New python executable in /Users/cscetbon/src/git/pandas/env/bin/python2.7
Also creating executable in /Users/cscetbon/src/git/pandas/env/bin/python
Installing setuptools, pip, wheel...done.
$ . env/bin/activate
$ python setup.py build_ext --inplace
$ python -m pip install -e .
Obtaining file:///Users/cscetbon/src/git/pandas
Collecting python-dateutil (from pandas==0.21.0)
  Using cached python_dateutil-2.7.2-py2.py3-none-any.whl
Collecting pytz>=2011k (from pandas==0.21.0)
  Using cached pytz-2018.3-py2.py3-none-any.whl
Collecting numpy>=1.9.0 (from pandas==0.21.0)
  Using cached numpy-1.14.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting six>=1.5 (from python-dateutil->pandas==0.21.0)
  Using cached six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
  Found existing installation: pandas 0.21.0
    Not uninstalling pandas at /Users/cscetbon/src/git/pandas, outside environment /Users/cscetbon/src/git/pandas/env
  Running setup.py develop for pandas
Successfully installed numpy-1.14.2 pandas python-dateutil-2.7.2 pytz-2018.3 six-1.11.0
$ pip freeze|grep -I panda
-e git+https://github.com/pandas-dev/pandas.git@81372093f1fdc0c07e4b45ba0f47b0360fabd405#egg=pandas
$ python -c 'import pandas; print pandas.__version__'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/Users/cscetbon/src/git/pandas/pandas/core/groupby/__init__.py", line 2, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/groupby/groupby.py", line 47, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/arrays/__init__.py", line 1, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/arrays/base.py", line 4, in <module>
ImportError: cannot import name AbstractMethodError

Any idea?

@TomAugspurger
Contributor

I'm not sure about these lines:

  Found existing installation: pandas 0.21.0
    Not uninstalling pandas at /Users/cscetbon/src/git/pandas, outside environment /Users/cscetbon/src/git/pandas/env
  Running setup.py develop for pandas

@cscetbon
Author

cscetbon commented Apr 4, 2018

I was able to find and solve the issue. I had to apply the following patch on v0.21.0:

pandas.txt

This issue wasn't fixed by cf9f513

@TomAugspurger
Contributor

TomAugspurger commented Apr 4, 2018

So your original issue is not fixed on master? Can you submit a PR fixing it, along with tests and a release note? Thanks.

@cscetbon
Author

cscetbon commented Apr 4, 2018

Yes, it's not fixed on the master branch. It'll have to wait a bit for me to find some time. Don't you think the OverflowError exception should be caught everywhere, though? I don't really have the answer, but it seems it could happen with other types, like float64 for instance.

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Apr 4, 2018
@gfyoung gfyoung added Needs Info Clarification about behavior needed to assess issue and removed Needs Info Clarification about behavior needed to assess issue labels Apr 10, 2018
@gfyoung
Member

gfyoung commented Apr 10, 2018

Sorry, relabeling this because I don't know yet how to reproduce.

@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@ssikdar1
Contributor

ssikdar1 commented Jun 8, 2018

As of the 0.23.0 release it still raises, but now as a ValueError:

>>> import json
>>> import pandas as pd
>>> foo = 2**100000
>>> bar = {"foo": foo}
>>> baz = json.dumps(bar)
>>> pd = pd.read_json(baz)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 422, in read_json
    result = json_reader.read()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 529, in read
    obj = self._get_object_parser(self.data)
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 546, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 638, in parse
    self._parse_no_numpy()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 853, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big

$ python -c "import pandas as pd; pd.show_versions()" | grep pandas
pandas: 0.23.0

@gfyoung
Member

gfyoung commented Jun 8, 2018

@ssikdar1 : Was this example working on a previous version?

@cscetbon
Author

cscetbon commented Jun 8, 2018

sorry guys I really didn't have time. If someone can start working from the patch I sent that'd be great.

@ssikdar1
Contributor

ssikdar1 commented Jun 8, 2018

For v0.22 I get the same error:

    self._parse_no_numpy()
  File "/Users/ssikdar/workspace27/acquire-expand/workspace3/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big
>>> 

$ python -c "import pandas as pd; pd.show_versions()" | grep pandas
pandas: 0.22.0
pandas_gbq: None
pandas_datareader: None

@gfyoung
Member

gfyoung commented Jun 8, 2018

sorry guys I really didn't have time. If someone can start working from the patch I sent that'd be great.

@cscetbon : Thanks for letting us know! We can continue on from here.

@ssikdar1 : Does your code happen to work for 0.21.0 by any chance? BTW, you're going to have to provide an index for this to work (try your example with a smaller value for foo).

@gfyoung
Member

gfyoung commented Jun 8, 2018

@cscetbon : Do you have an example that we can use to test your patch? That would be helpful actually.

@ssikdar1
Contributor

ssikdar1 commented Jun 9, 2018

@gfyoung same error for 0.21, unfortunately.

Digging deeper on 0.23:

  File "/home/shan/test23/lib/python3.6/site-packages/pandas/io/json/json.py", line 853, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
>>> import pandas._libs.json as json
>>> json.loads(json.dumps({'f':2**10000, 'b': 'sh'}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
>>> 
>>> import json
>>> json.loads(json.dumps({'f':2**10000, 'b': 'sh'}))
{'f': 19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036…63210686391511681774304792596709376, 'b': 'sh'}
(output truncated: 2**10000 is a 3,011-digit integer)
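The asymmetry above can be checked with the stdlib alone: Python's pure-Python json module round-trips arbitrary-precision ints, while the C encoder in pandas works with fixed-width integers. A hedged illustration of the bound (not the ujson code itself):

```python
import json

# Stdlib json handles arbitrary-precision ints: it is pure Python,
# so nothing is ever narrowed to a C integer type.
big = 2 ** 10000
assert json.loads(json.dumps({"f": big, "b": "sh"}))["f"] == big

# The C-based pandas._libs.json encoder, by contrast, is bounded by
# 64-bit integers; this value is far outside even uint64.
assert big > 2 ** 64 - 1
```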

@gfyoung
Member

gfyoung commented Jun 9, 2018

@ssikdar1: That definitely looks like a _libs/src/ujson investigation. That being said, your example from above still doesn't work even if I pass in a smaller value.

@cscetbon
Author

Hey @gfyoung, you can use the following content :

{"a":"7868170657351128032018"},{"a":""}

If I change it to

{"a":"7868170657351128032018"},{"a":"10"}

It works. The patch I provided restores the behavior from before the change. However, at that time I didn't know that the second content would work, which now makes me think there might be a bug somewhere.
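For reference, the failing value really does exceed the int64 range, which is what the string-to-int64 conversion inside read_json trips over. A minimal stdlib check (the inline content is a hypothetical stand-in for failing_pandas.json, written here as JSON Lines since lines=True expects one object per line):

```python
import json

# Hypothetical stand-in for failing_pandas.json, one object per line.
content = '{"a":"7868170657351128032018"}\n{"a":""}'
records = [json.loads(line) for line in content.splitlines()]

# The first value, parsed as an int, does not fit in a C int64.
assert int(records[0]["a"]) > 2 ** 63 - 1
```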

@gfyoung
Member

gfyoung commented Jun 10, 2018

@cscetbon : Thanks for this! That is indeed strange.

@jreback
Contributor

jreback commented Jun 10, 2018

You should not need to touch the ujson code at all here; it cannot work with values larger than uint64.
The error above comes from trying to convert to a proper int64: you need to catch the overflow and coerce to object dtype.
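The suggested fallback can be sketched in plain Python (the function name and return shape are hypothetical, not the actual pandas internals): narrow to int64 only when every value fits, otherwise keep the arbitrary-precision Python ints as object dtype.

```python
INT64_MIN, INT64_MAX = -2 ** 63, 2 ** 63 - 1

def coerce_int_column(values):
    """Return a (dtype_name, values) pair: int64 when safe, else object."""
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "int64", list(values)   # safe to narrow to C int64
    return "object", list(values)      # keep Python ints, no overflow

coerce_int_column([1, 2, 3])                    # ('int64', [1, 2, 3])
coerce_int_column([1, 7868170657351128032018])  # falls back to 'object'
```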

@gfyoung
Member

gfyoung commented Jun 10, 2018

@jreback: That makes sense. That was also what @cscetbon proposed above.

That being said, patching is a little tricky since the issue emerges from argument validation on the json.loads call, which is all C. Thus, instead of aliasing loads to json.loads, we could define loads to wrap json.loads as follows:

def loads(*args, **kwargs):
    try:
        return json.loads(*args, **kwargs)
    except OverflowError:
        # fall back: type coercion, etc.
        ...

@gfyoung gfyoung removed the Needs Info Clarification about behavior needed to assess issue label Jun 10, 2018
@jreback
Contributor

jreback commented Jun 10, 2018

No, patching like this will not be accepted.

there are 2 issues:

  • coercion to datetimes (can simply catch overflow error)
  • ujson parsing into an overflow - this is pretty tricky

@mondaysunrise

mondaysunrise commented Dec 15, 2019

Still got this error today:
Python int too large to convert to C ssize_t


Code to reproduce

import pandas as pd
e = 4
rng_srt = 9*10**300  # range start
rng_end = 11*10**300  # range end

p = pd.DataFrame(dtype=object)  # powers p

p['b'] = pd.Series(range(rng_srt, rng_end+1))  # base b
p['e'] = e  # exponent e (the snippet as posted used an undefined name here)
p['v'] = [value**e for value in p['b']]  # value v

p.tail()

@jreback
Contributor

jreback commented Dec 15, 2019

@mondaysunrise you are commenting on an issue about JSON parsing.

You cannot hold these large ints directly; you must use object dtype on the Series you are constructing.
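To make the limit concrete (a quick sanity check, not pandas code): the values in the snippet above are hundreds of orders of magnitude beyond what the default int64 dtype can hold.

```python
INT64_MAX = 2 ** 63 - 1   # largest value a C long / int64 column can hold
rng_srt = 9 * 10 ** 300   # range start from the snippet above

# These ints can only live in an object-dtype Series,
# e.g. pd.Series([...], dtype=object).
assert rng_srt > INT64_MAX
```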

@mondaysunrise

Okay, sorry, I was missing that. Thank you for telling me.

@arw2019
Member

arw2019 commented Jun 19, 2020

take

I'd like to fix this in the ujson implementation similarly to #34473

@deponovo
Contributor

Posted test case in #26068

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022