Please consider switching to the standard json module #24711

Closed
zhihaoy opened this issue Jan 10, 2019 · 10 comments
Labels
IO JSON read_json, to_json, json_normalize

Comments

@zhihaoy

zhihaoy commented Jan 10, 2019

Problem description

The ujson we are currently using is not well maintained; there has been no activity and no responses for the last two years:
ultrajson/ultrajson#291
and other projects are switching away from it:
fastavro/fastavro#150
This makes it hard for us to handle new feature requests:
#12213
and it has bugs that prevent us from consuming standard JSON files:
ultrajson/ultrajson#252

@jreback
Contributor

jreback commented Jan 10, 2019

We use a vendored copy, so if you have a patch it could be taken directly into pandas. ujson is much more performant than the standard library, which is why it is used.

@jreback jreback added the IO JSON read_json, to_json, json_normalize label Jan 10, 2019
@zhihaoy
Author

zhihaoy commented Jan 10, 2019

There are benchmarks showing that on large datasets ujson doesn't have an advantage over the standard library's json: https://www.reddit.com/r/Python/comments/3mtswx/benchmark_of_pythons_alternative_json_libraries/ Moreover, we know for a fact that the json library in PyPy is the fastest (because it's optimized for the JIT). That being said, if someone really has concerns about it, we can add a parameter to the read_json family to let the user choose which json-like library they want to use.

@TomAugspurger
Contributor

It's not just performance. The stdlib JSON module doesn't serialize things like NumPy arrays or scalars.
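For illustration (not part of the original thread), the stdlib limitation can be worked around with `json.dumps`'s `default` hook; the helper name `np_default` here is hypothetical, a minimal sketch rather than what pandas or ujson actually does:

```python
import json
import numpy as np

def np_default(obj):
    """Fallback serializer for NumPy types that stdlib json rejects.

    Plain json.dumps(np.int64(1)) raises TypeError because the stdlib
    encoder only knows built-in Python types.
    """
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

payload = {"count": np.int64(3), "values": np.array([1.5, 2.5])}
encoded = json.dumps(payload, default=np_default)
```

Round-tripping `encoded` through `json.loads` yields plain Python ints, floats, and lists, which is the conversion ujson's NumPy support was doing implicitly.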

Does anyone know the status of Arrows's support for JSON (de)/serialization?

@jreback
Contributor

jreback commented Jan 10, 2019

Most folks do not use PyPy, so I'm not sure that actually matters.

If you would like to add an engine kwarg to read_json, that would be ok.
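A minimal sketch of what such an engine kwarg could look like (the dispatcher name and its signature are hypothetical; this is not pandas' actual API, which had no such parameter at the time):

```python
import json

def loads(text, engine="ujson"):
    """Hypothetical dispatcher: parse JSON text with the requested engine.

    Falls back to the stdlib json module when ujson is not installed,
    so callers get the same parsed result either way.
    """
    if engine == "json":
        return json.loads(text)
    if engine == "ujson":
        try:
            import ujson  # third-party; may not be available
        except ImportError:
            return json.loads(text)
        return ujson.loads(text)
    raise ValueError(f"unknown engine: {engine!r}")
```

Defaulting to the faster engine while falling back silently keeps existing callers working, at the cost of results depending on what is installed; raising on an unknown engine keeps typos from being misread as a fallback request.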

@zhihaoy
Author

zhihaoy commented Jan 10, 2019

It's not just performance. The stdlib JSON module doesn't serialize things like NumPy arrays or scalars.

The unreleased ujson upstream dropped this support in May 2016 (ultrajson/ultrajson@53f85b1); we still support it (buggily: ultrajson/ultrajson#221) by accident. If we want to keep this feature, we should do it properly by specializing on NumPy.

@jreback
Contributor

jreback commented Jan 10, 2019

I don't think arrow yet has support for JSON, cc @wesm @pitrou

@pitrou
Contributor

pitrou commented Jan 10, 2019

I'm not sure what you mean by support? Arrow C++ has some understanding of JSON, but it doesn't interoperate with Numpy or Pandas arrays.

@wesm
Member

wesm commented Jan 10, 2019

@pitrou I think Jeff means support for JSON sufficient to power pandas.read_json. @bkietz is working on this in https://issues.apache.org/jira/browse/ARROW-694

I expect that we'll have support for reading and writing JSON sufficient to appease most pandas users on the timeline of Arrow 0.13 or 0.14, so most likely either by end of March or end of May

@jreback
Contributor

jreback commented Jan 10, 2019

Thanks @wesm, yep that's what I mean. We have legacy C code in read_json and it would be nice to remove it without sacrificing performance.

@jreback jreback added this to the No action milestone Jan 1, 2020
@jreback
Contributor

jreback commented Jan 1, 2020

The C implementation has recently been heavily refactored; closing this as no-action.

@jreback jreback closed this as completed Jan 1, 2020