-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Expand reference doc for read_json #14284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I don't think the failing CI checks are a consequence of the changes I propose in this PR, which are literally only changing python comments. Is there anything I can do to get the PR a clean bill of health? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cswarth Thanks a lot! Clearer docs are always welcome.
I put some small comments
@@ -123,32 +123,39 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, | |||
file. For file URLs, a host is expected. For instance, a local file | |||
could be ``file://localhost/path/to/table.json`` | |||
|
|||
orient | |||
orient : string, indicating the expected format of the JSON input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the explanation on the next line? (but leave the type (so 'string') on this one)
orient | ||
orient : string, indicating the expected format of the JSON input. | ||
The set of allowed orients changes depending on the value | ||
of the ``typ`` parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to closely follow the numpy docstring standard, refering to other keywords would be with single backticks instead of double
strings produced by ``to_json()`` with a corresponding value | ||
of ``orient``. | ||
|
||
- ``'split'`` : dict like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra indentation compared to the previous paragraph is not needed
>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']], | ||
index=['row 1', 'row 2'], | ||
columns=['col 1', 'col 2']) | ||
>>> print df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'print' is not needed
-------- | ||
|
||
>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']], | ||
index=['row 1', 'row 2'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you align this nicer?
>>> for orient in ['split', 'records', 'index']: | ||
str = df.to_json(orient=orient) | ||
print "'{}': '{}'".format(orient, str) | ||
pd.read_json(str, orient=orient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use separate lines instead of the for loop, I personally think this is going to be clearer for the reader
I mean like
>>> df.to_json(orient='split')
.. output ..
>>> df.to_json(orient='records')
.. output ..
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you think of the following examples? We're trying to document pd.read_json()
, but df.to_json()
is along for the ride as a convenient source of well-formatted JSON strings.
The results are a little artificial in that I had to reformat the output of df.to_json(orient='split')
to avoid the flake8-imposed constraint on line length.
I also used '' to retrieve previous results, but that syntax is not available in ipython when the prompt is '>>> ', as that indicates previous result caching is turned off. I think using '' makes the examples a lot easier to understand, but they won't work if pasted into %doctest_mode
Current coverage is 85.25% (diff: 100%)@@ master #14284 diff @@
==========================================
Files 140 140
Lines 50579 50579
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 43123 43122 -1
- Misses 7456 7457 +1
Partials 0 0
|
Preview of how the documentation looks after incorporating review comments.
Examples >>> df = pd.DataFrame([['a', 'b'], ['c', 'd']], ... index=['row 1', 'row 2'], ... columns=['col 1', 'col 2']) >>> df.to_json(orient='split') '{"columns":["col 1","col 2"], "index":["row 1","row 2"], "data":[["a","b"],["c","d"]]}' >>> pd.read_json(_, orient='split') col 1 col 2 row 1 a b row 2 c d >>> df.to_json(orient='records') '[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]' >>> pd.read_json(_, orient='records') col 1 col 2 0 a b 1 c d >>> df.to_json(orient='index') '{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}' >>> pd.read_json(_, orient='index') col 1 col 2 row 1 a b row 2 c d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes looking good! (left some small further comments)
No problem in adapting the output of to_json
to satisfy flake
Maybe it would be nice to also have an example showing the use of typ
? (but can also leave for other PR)
@@ -122,33 +122,42 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, | |||
The string could be a URL. Valid URL schemes include http, ftp, s3, and | |||
file. For file URLs, a host is expected. For instance, a local file | |||
could be ``file://localhost/path/to/table.json`` | |||
meta_prefix : string, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-pasta error - removed
``'columns'``, and ``'records'``. | ||
|
||
|
||
The value of `orient` specifies the expected format of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two blank lines are not needed above this one (one blank line is OK).
But something else: would it make it more clear to first list the possibilities, and then which of those is the default/accepted value depending on the type? (just an idea)
'{"columns":["col 1","col 2"], | ||
"index":["row 1","row 2"], | ||
"data":[["a","b"],["c","d"]]}' | ||
<BLANKLINE> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is to have a blank line in the resulting code block, but to keep it as one code block? (so it's clearer they belong together).
That's a good idea I think, only a pity for the plain text docstring ..
BTW, you can also put some 'introducing' text in between the code examples when this can make it clearer what you are showing. (and that can also help delineate the different examples)
@cswarth Do you have time to update this? It's a really nice improvement of the docstring! |
I'm mystified and could use some help to figure out what's going on. I pushed a commit to my branch to address your review, but this PR is not picking up the changes. I can see the commit on the branch, but the commits link at the top of this page insists there are only two commits for this PR. I can't figure out what I've screwed up here. |
@cswarth yeah we changed the base github domain to pandas-dev and it seems can push existing PR's. So close this, one and open a new PR. |
Closing to move PR to new github domain |
git diff upstream/master | flake8 --diff
pandas.read_json()
, especially concentrating on theorient
parameter. Also added some example usage code and explicitly mentionto_json()
as a source of valid JSON strings.pandas.read_json¶
pandas.
read_json
(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False)[source]¶Convert a JSON string to pandas object
path_or_buf : a valid JSON string or file-like, default: None
orient : string, indicating the expected format of the JSON input.
typ : type of object to recover (series or frame), default ‘frame’
dtype : boolean or dict, default True
convert_axes : boolean, default True
convert_dates : boolean, default True
keep_default_dates : boolean, default True
numpy : boolean, default False
precise_float : boolean, default False
date_unit : string, default None
lines : boolean, default False
encoding : str, default is ‘utf-8’
result : Series or DataFrame, depending on the value of
typ
.Examples