read_json `engine` argument integration #49041

abkosar · 2022-10-11T04:07:21Z

Creating the PR so we can start the discussion. It's not complete yet, and I will keep updating it; any feedback is of course appreciated.

My main question: there is an arrow_parser_wrapper.py file with an ArrowParserWrapper class, which mainly serves the engine argument of read_csv. I thought of extending that class's functionality so it can serve both read_json and read_csv, but wanted to get your opinion first.

I will also add tests, but I am still reading the testing section of the contributing guidelines.

- closes ENH: Add engine keyword to read_json to enable reading from pyarrow #48893
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

- added JSONEngine to _typing.py
- added engine to `read_json` inputs
- added engine to `read_json` docstring
- added engine logic to `JsonReader`
- added basis of the _make_engine method

mroeschke · 2022-10-11T16:59:27Z

pandas/io/json/_json.py

@@ -780,6 +792,7 @@ def __init__(
        precise_float: bool,
        date_unit,
        encoding,
+        engine,


This keyword will need to go at the end of the function to accommodate people passing positional args.
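To illustrate the point about positional arguments, here is a minimal sketch. The three functions and their trimmed parameter lists are hypothetical stand-ins, not the real `JsonReader.__init__` signature; `nrows` is only used here as an example of a parameter that follows `encoding`.

```python
def init_before(precise_float, date_unit, encoding, nrows=None):
    """Signature before the change (trimmed, hypothetical parameter list)."""
    return {"encoding": encoding, "nrows": nrows}


def init_engine_in_middle(precise_float, date_unit, encoding, engine="ujson", nrows=None):
    """`engine` inserted mid-signature: later positional arguments shift by one."""
    return {"encoding": encoding, "engine": engine, "nrows": nrows}


def init_engine_at_end(precise_float, date_unit, encoding, nrows=None, engine="ujson"):
    """`engine` appended last: existing positional calls keep their meaning."""
    return {"encoding": encoding, "engine": engine, "nrows": nrows}


# An existing caller that passes everything positionally.
args = (True, "ms", "utf-8", 10)

before = init_before(*args)
shifted = init_engine_in_middle(*args)   # 10 is silently bound to `engine`
appended = init_engine_at_end(*args)     # 10 is still bound to `nrows`
```

With `engine` in the middle, the caller's `10` lands in `engine` and `nrows` falls back to `None` without any error, which is why appending the keyword is the safer change.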

mroeschke · 2022-10-11T17:00:00Z

pandas/io/json/_json.py

@@ -607,6 +615,9 @@ def read_json(

        .. versionadded:: 1.3.0

+    engine : {{'ujson', 'pyarrow'}}


This should default to "ujson".

abkosar · 2022-10-15

I actually have a question about this. I thought the engine argument would be optional. What confused me is that read_json already has parsing logic in place, so I assumed the engine keyword would provide two additional options. Or did I misunderstand?

mroeschke · 2022-10-15

The existing parsing logic is (vendored) ujson code, so that's why the default should be "ujson".

abkosar · 2022-10-15

Yeah, after I made that comment, that's what I figured.
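A rough sketch of what "default to ujson" means for callers, under stated assumptions: `read_json_sketch` and `VALID_ENGINES` are hypothetical names, and the standard-library `json` module stands in for pandas' vendored ujson code. It is not the pandas implementation.

```python
import json

# Hypothetical constant mirroring the two values in the proposed docstring.
VALID_ENGINES = ("ujson", "pyarrow")


def read_json_sketch(text, engine="ujson"):
    """Parse `text`; callers who omit `engine` get the existing parser."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if engine == "ujson":
        # Stand-in for the existing (vendored ujson) code path.
        return json.loads(text)
    # The pyarrow path is deliberately left out of this sketch.
    raise NotImplementedError("pyarrow engine not covered in this sketch")


default_result = read_json_sketch('{"a": 1}')
explicit_result = read_json_sketch('{"a": 1}', engine="ujson")
```

Because the default names the parser that already exists, omitting `engine` and passing `engine="ujson"` behave identically, so existing calls are unaffected.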

mroeschke · 2022-10-11T17:01:52Z

> I thought of extending that class's functionality so it can serve both read_json and read_csv, but wanted to get your opinion first.

The CSV and JSON reading code is more-or-less distinct so I'd recommend against this.

I recommend just creating a new file in pandas/io/json, like arrow_parser.py, with a parsing class that can be imported and used in read_json.
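For reference, a minimal sketch of what such a standalone class might look like. The class name `ArrowJsonParser` comes from a later commit in this PR, but the constructor, the `read` method, and the body are assumptions, not the code from the PR. It relies on `pyarrow.json.read_json`, which expects newline-delimited JSON, and imports pyarrow lazily so it stays an optional dependency.

```python
import io


class ArrowJsonParser:
    """Hypothetical parser that could live in pandas/io/json/arrow_parser.py."""

    def __init__(self, data: bytes) -> None:
        # Raw newline-delimited JSON, one object per line.
        self.data = data

    def read(self):
        # Imported here so that only engine="pyarrow" users need pyarrow installed.
        import pyarrow.json as pa_json

        # Parse into a pyarrow Table, then convert to a pandas DataFrame.
        table = pa_json.read_json(io.BytesIO(self.data))
        return table.to_pandas()


parser = ArrowJsonParser(b'{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n')
```

Calling `parser.read()` requires both pyarrow and pandas to be installed. Keeping the class in its own module leaves the existing CSV wrapper untouched.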

abkosar · 2022-10-11T19:07:24Z

Yeah that makes sense. Sounds good then, will do that. Thanks for the feedback!

abkosar · 2022-10-18T00:40:06Z

I just wanted to give an update. I didn't forget about the issue; I'm working on it :) After days of staring at (and exploring) the source code, I was finally able to implement the logic over the past two days and wrote a test for it. I have been trying to get the Docker interpreter working so I can run the tests. After that I'll update the PR.

abkosar · 2022-10-22T18:18:51Z

@mroeschke I have to close this PR: I made a mistake in my git workflow, and the history got messed up while I was trying to squash commits. I have to create a new PR from the main branch. Is there a closing process, or any commands I should run?

abkosar · 2022-10-22T18:22:30Z

Opened #49249 instead of this PR.

mroeschke · 2022-10-24T17:53:14Z

Closing in favor of #49249

abkosar and others added 2 commits October 10, 2022 23:31

read_json engine argument integration

eefc8a4

- added JSONEngine to _typing.py
- added engine to `read_json` inputs
- added engine to `read_json` docstring
- added engine logic to `JsonReader`
- added basis of the _make_engine method

Merge branch 'pandas-dev:main' into main

df965f3

mroeschke added the IO JSON (read_json, to_json, json_normalize) and Arrow (pyarrow functionality) labels Oct 11, 2022

mroeschke reviewed Oct 11, 2022

Merge branch 'pandas-dev:main' into main

1045543

jbrockmendel and others added 7 commits October 22, 2022 13:40

REF: _reso->_creso (pandas-dev#49107)

39409f2

DOC: Fixed Issue: Typo of DataFrame.iat() in 10 minutes to panda (pandas-dev#49122)

41c2c12

REF: reso->creso (pandas-dev#49123)

de6a506

DOC: fix versionchanged blank line usage (pandas-dev#49117) (pandas-dev#49125)

73b85f0

BUG: _guess_datetime_format_for_array doesn't guess if first element is '' or 'NaT' (pandas-dev#49120)

9049179

BUG: guess_datetime_format doesn't guess just year (pandas-dev#49127)

28c4629

* guess %Y format
* fixup

Co-authored-by: MarcoGorelli <>

added ArrowJsonParser and tests

9f915d7

abkosar force-pushed the read-json-engine-argument branch from 6ff583c to 9f915d7 Compare October 22, 2022 17:47

Merge branch 'pandas-dev:main' into read-json-engine-argument

48601cc

mroeschke closed this Oct 24, 2022
