read_json engine argument integration #49041
- added JSONEngine to _typing.py
- added engine to `read_json` inputs
- added engine to `read_json` docstring
- added engine logic to `JsonReader`
- added basis of the _make_engine method
pandas/io/json/_json.py
Outdated
@@ -780,6 +792,7 @@ def __init__(
precise_float: bool,
date_unit,
encoding,
engine,
This keyword will need to go at the end of the function to accommodate people passing positional args.
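A minimal sketch of why the position matters, using hypothetical stand-in functions rather than the actual `JsonReader.__init__` signature. The parameter `lines` following `encoding` is an assumption made for illustration only.

```python
# Hypothetical stand-ins for a reader constructor; not the pandas source.
def reader_before(path, precise_float, date_unit, encoding, lines):
    return {"encoding": encoding, "lines": lines}


# New keyword inserted in the middle of the existing parameters.
def reader_engine_in_middle(path, precise_float, date_unit, encoding, engine, lines):
    return {"encoding": encoding, "engine": engine, "lines": lines}


# New keyword appended at the end, with a default.
def reader_engine_at_end(path, precise_float, date_unit, encoding, lines, engine="ujson"):
    return {"encoding": encoding, "engine": engine, "lines": lines}


# An existing caller that passes every argument positionally.
args = ("data.json", False, "ms", "utf-8", True)

before = reader_before(*args)
at_end = reader_engine_at_end(*args)  # same call still works; engine takes its default

# With the keyword in the middle, True is bound to `engine` and `lines` is missing.
try:
    reader_engine_in_middle(*args)
    middle_error = None
except TypeError as exc:
    middle_error = str(exc)
```

Appending the keyword leaves existing positional calls untouched, while inserting it mid-signature shifts every later argument by one slot.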
pandas/io/json/_json.py
Outdated
@@ -607,6 +615,9 @@ def read_json(

.. versionadded:: 1.3.0

engine : {{'ujson', 'pyarrow'}}
This should default to "ujson"
I actually have a question about this. I thought the engine argument would be optional. What confused me is that read_json already has parsing logic in place, and I thought the engine keyword would provide two additional options. Or did I misunderstand?
The existing parsing logic is (vendored) ujson code, so that's why the "default" should be "ujson".
Yeah after I made that comment that's what I figured.
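The point about the default can be sketched as below. This is an illustration under stated assumptions, not the code in this PR: `read_json_sketch` and `VALID_ENGINES` are hypothetical names, and the standard-library `json` module stands in for the vendored ujson parser.

```python
import json

VALID_ENGINES = ("ujson", "pyarrow")  # hypothetical constant for this sketch


def read_json_sketch(text, engine="ujson"):
    """Hypothetical dispatcher; not pandas' read_json."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if engine == "ujson":
        # Stand-in for the existing parsing path (stdlib json, not vendored ujson).
        return json.loads(text)
    # engine == "pyarrow": imported lazily so it stays an optional dependency.
    # Note that pyarrow's JSON reader expects newline-delimited JSON input.
    import io
    import pyarrow.json as pa_json

    return pa_json.read_json(io.BytesIO(text.encode("utf-8")))


# Callers that never mention `engine` keep the existing behaviour.
records = read_json_sketch('[{"a": 1}, {"a": 2}]')
```

Because the default names the existing parser, the keyword stays optional for callers and only an explicit `engine="pyarrow"` opts into the new path.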
The CSV and JSON reading code is more-or-less distinct so I'd recommend against this. I recommend just creating a new file in
Yeah that makes sense. Sounds good then, will do that. Thanks for the feedback!
I just wanted to give an update. I didn't forget about the issue, I'm working on it :) After days of staring at (and exploring) the source code, I was finally able to implement the logic over the past two days and wrote a test for it. I have been trying to get the Docker interpreter working so I can run the tests. After that I'll update the PR.
* guess %Y format * fixup Co-authored-by: MarcoGorelli <>
6ff583c to 9f915d7
@mroeschke I have to close this PR because the history got messed up in my git flow while I was trying to squash commits. I have to create a new PR from the main branch. Is there a closing process, or any commands I should run?
Opened #49249 instead of this PR.
Closing in favor of #49249
Creating the PR so we can start the discussion. It's not complete yet and I will keep updating the PR; however, any feedback is of course appreciated.
My main question is: there is an `arrow_parser_wrapper.py` file that has an `ArrowParserWrapper` class, which mainly serves the `engine` argument of the `read_csv` method. I thought of extending that class's functionality so it can serve both `read_json` and `read_csv`, but wanted to get your opinion about it.
I will also add tests, but I am still reading the tests part of the contributing guidelines.
`engine` keyword to `read_json` to enable reading from pyarrow #48893 (Replace xxxx with the Github issue number)
`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.