Support Infinity, -Infinity and NaN in read_json #12213

Closed
maxnoe opened this issue Feb 2, 2016 · 10 comments · Fixed by #30295
Labels
Enhancement IO JSON read_json, to_json, json_normalize

Comments

@maxnoe

maxnoe commented Feb 2, 2016

While these special values do not strictly conform to the standard, most implementations allow or use them.

For example, Google's GSON library or Python's json module in the standard library.

@jreback
Contributor

jreback commented Feb 2, 2016

NaNs are already supported.

In [1]: df = DataFrame({'A' : [np.nan,1,np.inf,-np.inf]})

In [2]: df.to_json(None)
Out[2]: '{"A":{"0":null,"1":1.0,"2":null,"3":null}}'

jreback added the Enhancement, IO JSON (read_json, to_json, json_normalize), and Difficulty Intermediate labels Feb 2, 2016
jreback added this to the Next Major Release milestone Feb 2, 2016
@maxnoe
Author

maxnoe commented Feb 2, 2016

But not NaN for reading: pd.read_json('{"a": [NaN, Infinity, -Infinity]}')

@jreback
Contributor

jreback commented Feb 2, 2016

that's not valid json

@maxnoe
Author

maxnoe commented Feb 2, 2016

I know, that's what I wrote in the first post. But it is a commonly used extension.

See the description in Google's GSON package:

Section 2.4 of JSON specification disallows special double values (NaN, Infinity, -Infinity). However, Javascript specification (see section 4.3.20, 4.3.22, 4.3.23) allows these values as valid Javascript values. Moreover, most JavaScript engines will accept these special values in JSON without problem. So, at a practical level, it makes sense to accept these values as valid JSON even though JSON specification disallows them.

Or Python's json module:

It also understands NaN, Infinity, and -Infinity as their corresponding float values, which is outside the JSON spec.
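
A minimal sketch of the behaviour quoted above, using only Python's standard-library json module (the variable names are illustrative):

```python
import json
import math

# The same text that read_json rejects is accepted by the stdlib parser.
text = '{"a": [NaN, Infinity, -Infinity]}'
values = json.loads(text)["a"]

# Each token is mapped to its corresponding float value.
print(values)  # [nan, inf, -inf]
```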

@maxnoe
Author

maxnoe commented Feb 2, 2016

Also interesting: the JSON specification explicitly allows parsers to accept extensions of the standard:
https://tools.ietf.org/html/rfc7159.html#section-9

A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
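
To illustrate the MAY clause, a parser can accept the extension by default and still offer a strict mode. The sketch below relies on the parse_constant hook of Python's json.loads, which is called with 'NaN', 'Infinity' or '-Infinity'. The helper names reject_constant and loads_strict are hypothetical and not part of any library.

```python
import json


def reject_constant(name):
    # Hypothetical helper: json.loads calls this for each non-standard
    # constant it encounters ("NaN", "Infinity" or "-Infinity").
    raise ValueError(f"non-standard JSON constant: {name}")


def loads_strict(text):
    # Hypothetical helper: same as json.loads, but refuses the extension.
    return json.loads(text, parse_constant=reject_constant)


# Spec-conforming input parses as usual.
strict_ok = loads_strict('{"a": [1.0, null]}')

# Non-standard input fails with a message naming the offending token.
try:
    loads_strict('{"a": [Infinity]}')
    strict_message = None
except ValueError as exc:
    strict_message = str(exc)
```

A strict mode that names the offending token also produces a more useful error than a generic parse failure.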

@kbruegge

kbruegge commented Feb 4, 2016

👍 This would indeed be very helpful for interoperability between a pandas backend and a JS frontend.

@zack-sampson

I took a crack at this and discovered that ujson, the JSON library underlying read_json, handles neither NaN nor infinity. In previous discussions (more), the authors of ujson resisted adding support because it's out of spec, and there's no way to specify a custom decoder or encoder with ujson.

@maciejkula

The newest master throws a "ValueError: Expected object or value" exception when trying to parse JSON containing inf or nan values. While I understand that these are not valid JSON, it would be helpful to throw more informative exceptions to help the user debug the issue.

Script to reproduce (using Hypothesis):

import json

from hypothesis import example, given
import hypothesis.strategies as st

import pandas as pd

# Inputs known to trigger "ValueError: Expected object or value".
FAILING_EXAMPLES = (
    [{u'': float('inf')}],
    [{u'': float('nan')}],
)

@given(st.lists(st.dictionaries(st.text(),
                                st.floats() | st.booleans() | st.text() | st.none())))
@example(FAILING_EXAMPLES[0])
@example(FAILING_EXAMPLES[1])
def test_load_json(test_input):
    # json.dumps serializes the input; read_json then has to parse it back.
    df = pd.read_json(json.dumps(test_input))
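
For context on why these examples fail: Python's json.dumps writes inf and nan as the bare tokens Infinity and NaN by default, so the string handed to read_json is already outside the JSON grammar. A small sketch using the standard library only (variable names are illustrative):

```python
import json

# Default behaviour: non-finite floats become non-standard tokens.
payload = json.dumps([{"": float("inf")}, {"": float("nan")}])
print(payload)  # [{"": Infinity}, {"": NaN}]

# With allow_nan=False the encoder refuses instead of emitting them.
strict_error = None
try:
    json.dumps([{"": float("inf")}], allow_nan=False)
except ValueError as exc:
    strict_error = str(exc)
```

Passing allow_nan=False on the writing side is one way to catch such values before they reach a parser that rejects them.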

@yifeim

yifeim commented Jul 31, 2018

Hi there,

Thanks for the thorough discussion. We would expect pandas to be friendly to data scientists, and we were a little surprised to find compatibility issues with the IEEE float standard. Infinity/-Infinity is particularly useful because it can be clipped to the max/min value of any valid range. This is not possible with NaN/null.

As of today, it is possible to encode Infinity using default_handler, but not to decode it, correct?

Thanks.

nzjrs added a commit to loopbio/imgstore that referenced this issue Jun 26, 2019
pandas choosing not to be pragmatic and not support them,
so I choose to use the standard library json module which
does

pandas-dev/pandas#12213
jreback modified the milestones: Contributions Welcome, 1.0 Dec 27, 2019
@maxnoe
Author

maxnoe commented Jan 2, 2020

Finally! Thanks to anyone involved in making this possible.

7 participants