
pandas.io.json.dumps raises OverflowError on float32 and float16 NaNs #16686


Closed
Kiv opened this issue Jun 12, 2017 · 6 comments
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions), IO JSON (read_json, to_json, json_normalize), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@Kiv
Contributor

Kiv commented Jun 12, 2017

Code Sample, a copy-pastable example if possible

In [3]: from pandas.io.json import dumps

In [4]: import numpy as np

In [5]: dumps(np.float64('NaN'))
Out[5]: 'null'

In [6]: dumps(np.float32('NaN'))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
----> 1 dumps(np.float32('NaN'))

OverflowError: Invalid Nan value when encoding double

In [7]: dumps(np.float16('NaN'))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
----> 1 dumps(np.float16('NaN'))

OverflowError: Maximum recursion level reached

Problem description

A float64 NaN does the sensible thing and serializes to 'null'. Float32 and float16 NaNs should behave consistently, but they raise OverflowError instead (with a different message for each).

Expected Output

'null' for all 3 cases

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback added the Difficulty Intermediate, IO JSON, and Missing-data labels Jun 13, 2017
@jreback added this to the Next Major Release milestone Jun 13, 2017
@jreback
Contributor

jreback commented Jun 13, 2017

If you want to do a PR for this, that would be great. Note that this is actually tricky to hit in practice, because an array of floats (whether float32 or float64, with NaNs) is serialized by a slightly different implementation. Further, we have very limited support for float16.

@Kiv
Contributor Author

Kiv commented Jun 13, 2017

I can try. Should I be looking in pandas/_libs/src/ujson/python/objToJSON.c, or is it somewhere else?

@Kiv
Contributor Author

Kiv commented Jun 13, 2017

I saw in ultrajsonenc.c that the error comes from line 784:

    if (!(value == value)) {
        SetError(obj, enc, "Invalid Nan value when encoding double");
        return FALSE;
    }

Are you suggesting that this code be changed to emit null instead, or that a different code path be taken so that we never reach it?
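The C check `!(value == value)` works because, under IEEE 754, a NaN compares unequal to everything, including itself. The sketch below models the first option (emit null instead of raising) in Python. `is_nan` and `encode_double` are hypothetical names for illustration, not pandas or ujson APIs; the real change would live in the C encoder.

```python
import json


def is_nan(value):
    # Same test as the C check `!(value == value)`: a NaN is the only
    # float that does not compare equal to itself.
    return not (value == value)


def encode_double(value):
    # Hypothetical variant of the encoder step: write JSON null for NaN
    # instead of reporting an error.
    if is_nan(value):
        return "null"
    # allow_nan=False makes the stdlib raise ValueError for +/-inf rather
    # than emit the non-standard token Infinity.
    return json.dumps(value, allow_nan=False)


print(encode_double(float("nan")))  # null
print(encode_double(0.25))          # 0.25
```

Infinities are deliberately left as errors here, since the report only asks for NaN to map to null.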

@Kiv
Contributor Author

Kiv commented Jun 13, 2017

Let me know if this is the right approach. It fixes float32; I don't care about float16:

Kiv@1457ac6

@chris-b1
Contributor

@Kiv - that looks reasonable, although I'm not sure why float32 isn't already hitting this path, as float64 does. (I'm not especially familiar with this code.)
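One plausible explanation, offered as an assumption rather than a reading of objToJSON.c: `np.float64` subclasses Python's built-in `float`, while `np.float32` and `np.float16` do not, so a check for Python floats would catch float64 but let float32 fall through to a generic numeric branch. The Python model below uses hypothetical stand-in classes (`Float64Like`, `Float32Like`) and a hypothetical `encode` dispatcher to show how that branch order alone would produce the reported behaviour.

```python
import math


class Float64Like(float):
    """Stand-in for np.float64, which does subclass Python's float."""


class Float32Like:
    """Stand-in for np.float32, which does not subclass float."""

    def __init__(self, value):
        self.value = float(value)

    def __float__(self):
        return self.value


def encode(obj):
    # Hypothetical branch 1: anything passing a Python-float type check
    # (float64 included) gets an explicit NaN check and becomes null.
    if isinstance(obj, float):
        return "null" if math.isnan(obj) else repr(float(obj))
    # Hypothetical branch 2: other numeric scalars are cast to double and
    # passed on without that check, so the encoder's own test rejects NaN.
    value = float(obj)
    if not (value == value):
        raise OverflowError("Invalid Nan value when encoding double")
    return repr(value)


print(encode(Float64Like("nan")))  # null
try:
    encode(Float32Like("nan"))
except OverflowError as exc:
    print("float32-like:", exc)
```

In this model, a fix like the one proposed above corresponds to adding the `math.isnan` check to branch 2.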
https://github.com/Kiv/pandas/blob/1457ac60656046b1e4ff71a24a225cba8517f3e1/pandas/_libs/src/ujson/python/objToJSON.c#L545

@jreback
Contributor

jreback commented Jun 14, 2017

@chris-b1 if you look at the line before it, there is a cast to <double>. I don't actually know what this does to np.float32('nan'), but it's possible the result is then not detected by isnan.
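Under IEEE 754, widening a NaN from a narrower format to double yields a NaN, and the float32 traceback above is consistent with that: its message comes from the encoder's own NaN test, so the NaN evidently survived the cast. The stdlib sketch below checks the widening from Python by packing a NaN into the float32 (`"<f"`) and float16 (`"<e"`) formats of the `struct` module and unpacking it, which returns a Python float, i.e. a C double. `widen_nan` is a hypothetical helper name.

```python
import math
import struct


def widen_nan(fmt):
    # Pack a NaN into a narrower IEEE 754 format, then unpack it; struct
    # hands the value back as a Python float (a C double).
    packed = struct.pack(fmt, float("nan"))
    return struct.unpack(fmt, packed)[0]


for name, fmt in [("float32", "<f"), ("float16", "<e")]:
    widened = widen_nan(fmt)
    # isnan stays True and the self-comparison stays False after widening.
    print(name, math.isnan(widened), widened == widened)
```

If this holds for the C cast too, the missing piece is more likely the absence of an isnan check on the path float32 takes than the cast itself.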


5 participants