DOC: Expand reference doc for read_json #14284


Closed
wants to merge 2 commits into from

Conversation

cswarth
Contributor

@cswarth cswarth commented Sep 23, 2016

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • expanded the reference documentation for pandas.read_json(), concentrating especially on the orient parameter. Also added some example usage code and explicitly mentioned to_json() as a source of valid JSON strings.

pandas.read_json

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False)[source]

Convert a JSON string to pandas object

Parameters:

path_or_buf : a valid JSON string or file-like, default: None

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.json

orient : string, indicating the expected format of the JSON input.

The set of allowed orients changes depending on the value of the typ parameter.

  • when typ == 'series',
    • allowed orients are {'split','records','index'}
    • default is 'index'
    • The Series index must be unique for orient 'index'.
  • when typ == 'frame',
    • allowed orients are {'split','records','index', 'columns','values'}
    • default is 'columns'
    • The DataFrame index must be unique for orients ‘index’ and ‘columns’.
    • The DataFrame columns must be unique for orients ‘index’, ‘columns’, and ‘records’.

The value of orient specifies the expected format of the JSON string. The expected JSON formats are compatible with the strings produced by to_json() with a corresponding value of orient.

  • 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
  • 'records' : list like [{column -> value}, ... , {column -> value}]
  • 'index' : dict like {index -> {column -> value}}
  • 'columns' : dict like {column -> {index -> value}}
  • 'values' : just the values array
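The Examples section below exercises 'split', 'records', and 'index'; as a supplementary sketch (not part of the original docstring), the remaining two orients could be tried like this, wrapping the literal strings in io.StringIO since read_json also accepts file-like objects:

```python
from io import StringIO  # read_json accepts file-like objects
import pandas as pd

# 'columns' orient: dict of {column -> {index -> value}}
df_cols = pd.read_json(
    StringIO('{"col 1":{"row 1":"a","row 2":"c"},'
             '"col 2":{"row 1":"b","row 2":"d"}}'),
    orient='columns')

# 'values' orient: just the nested values array; row and column
# labels fall back to a default RangeIndex
df_vals = pd.read_json(StringIO('[["a","b"],["c","d"]]'),
                       orient='values')
```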

typ : type of object to recover (series or frame), default ‘frame’

dtype : boolean or dict, default True

If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Applies only to the data.
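As an illustrative sketch (the column names here are invented, not from the docstring), compare the default inference with a dtype mapping:

```python
from io import StringIO
import pandas as pd

data = '{"a":{"0":"1","1":"2"},"b":{"0":"x","1":"y"}}'

# dtype=True (default): the string values "1" and "2" in column
# "a" are inferred as numbers
inferred = pd.read_json(StringIO(data))

# dict of column -> dtype: pin column "a" to stay as strings
pinned = pd.read_json(StringIO(data), dtype={'a': str})
```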

convert_axes : boolean, default True

Try to convert the axes to the proper dtypes.

convert_dates : boolean, default True

List of columns to parse for dates; if True, then try to parse datelike columns. A column label is datelike if

  • it ends with '_at',
  • it ends with '_time',
  • it begins with 'timestamp',
  • it is 'modified', or
  • it is 'date'
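A sketch of these naming rules (hypothetical column names, not from the docstring): with the defaults, 'created_at' ends with '_at' and is parsed from epoch milliseconds, while convert_dates=False disables the conversion entirely:

```python
from io import StringIO
import pandas as pd

# "created_at" matches the datelike naming rules, so with the
# default keep_default_dates=True it is parsed from epoch
# milliseconds; "score" is left untouched
data = '{"created_at":{"0":1356998400000},"score":{"0":5}}'

parsed = pd.read_json(StringIO(data))

# convert_dates=False switches date parsing off entirely
raw = pd.read_json(StringIO(data), convert_dates=False)
```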

keep_default_dates : boolean, default True

If parsing dates, then parse the default datelike columns

numpy : boolean, default False

Direct decoding to numpy arrays. Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.

precise_float : boolean, default False

Set to enable usage of the higher-precision (strtod) function when decoding strings to double values. The default (False) is to use the fast but less precise built-in functionality.

date_unit : string, default None

The timestamp unit to detect if converting dates. The default behaviour is to try and detect the correct precision, but if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force parsing only seconds, milliseconds, microseconds or nanoseconds respectively.

lines : boolean, default False

Read the file as a json object per line.

New in version 0.19.0.
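A minimal sketch of lines=True (invented data, wrapped in io.StringIO): each line of the input is one JSON object, and each object becomes one row:

```python
from io import StringIO
import pandas as pd

# one JSON object per line (the "JSON Lines" / ndjson format)
jsonl = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'

df = pd.read_json(StringIO(jsonl), lines=True)
```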

encoding : str, default is ‘utf-8’

The encoding to use to decode py3 bytes.

New in version 0.19.0.

Returns:

result : Series or DataFrame, depending on the value of typ.

Examples

>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                          index=['row 1', 'row 2'],
                          columns=['col 1', 'col 2'])
>>> print df
      col 1 col 2
row 1     a     b
row 2     c     d
>>> for orient in ['split', 'records', 'index']:
        str = df.to_json(orient=orient)
        print "'{}': '{}'".format(orient, str)
        pd.read_json(str, orient=orient)
'split':
'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],
["c","d"]]}'
'records':
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
'index':
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'

@jreback jreback added Docs IO JSON read_json, to_json, json_normalize labels Sep 23, 2016
@cswarth
Contributor Author

cswarth commented Sep 23, 2016

I don't think the failing CI checks are a consequence of the changes I propose in this PR, which are literally only changing python comments.

Is there anything I can do to get the PR a clean bill of health?

@jorisvandenbossche jorisvandenbossche changed the title DOC: Expand reference doc for panda.read_json() DOC: Expand reference doc for read_json Sep 23, 2016
Member

@jorisvandenbossche jorisvandenbossche left a comment


@cswarth Thanks a lot! Clearer docs are always welcome.
I left some small comments below.

@@ -123,32 +123,39 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
file. For file URLs, a host is expected. For instance, a local file
could be ``file://localhost/path/to/table.json``

orient
orient : string, indicating the expected format of the JSON input.
Member


Can you put the explanation on the next line? (but leave the type (so 'string') on this one)

orient
orient : string, indicating the expected format of the JSON input.
The set of allowed orients changes depending on the value
of the ``typ`` parameter.
Member


If we want to closely follow the numpy docstring standard, refering to other keywords would be with single backticks instead of double

strings produced by ``to_json()`` with a corresponding value
of ``orient``.

- ``'split'`` : dict like
Member


The extra indentation compared to the previous paragraph is not needed

>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
index=['row 1', 'row 2'],
columns=['col 1', 'col 2'])
>>> print df
Member


'print' is not needed

--------

>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
index=['row 1', 'row 2'],
Member


can you align this nicer?

>>> for orient in ['split', 'records', 'index']:
str = df.to_json(orient=orient)
print "'{}': '{}'".format(orient, str)
pd.read_json(str, orient=orient)
Member


I would just use separate lines instead of the for loop, I personally think this is going to be clearer for the reader

I mean like

>>> df.to_json(orient='split')
.. output ..

>>> df.to_json(orient='records')
.. output ..

....

Contributor Author


What would you think of the following examples? We're trying to document pd.read_json(), but df.to_json() is along for the ride as a convenient source of well-formatted JSON strings.

The results are a little artificial in that I had to reformat the output of df.to_json(orient='split') to avoid the flake8-imposed constraint on line length.

I also used `_` to retrieve previous results, but that syntax is not available in IPython when the prompt is '>>> ', as that indicates previous-result caching is turned off. I think using `_` makes the examples a lot easier to understand, but they won't work if pasted into %doctest_mode.

[screenshot of the example session omitted]

@codecov-io

codecov-io commented Sep 26, 2016

Current coverage is 85.25% (diff: 100%)

Merging #14284 into master will decrease coverage by <.01%

@@             master     #14284   diff @@
==========================================
  Files           140        140          
  Lines         50579      50579          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43123      43122     -1   
- Misses         7456       7457     +1   
  Partials          0          0          

Powered by Codecov. Last update 99b5876...4689d3a

@cswarth
Contributor Author

cswarth commented Sep 26, 2016

Preview of how the documentation looks after incorporating review comments.

Parameters:

path_or_buf : a valid JSON string or file-like, default: None

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.json

meta_prefix : string, default None

orient : string,

Indication of expected JSON input format. The set of allowed orients changes depending on the value of the typ parameter.

  • when typ == 'series',
    • allowed orients are {'split','records','index'}
    • default is 'index'
    • The Series index must be unique for orient 'index'.
  • when typ == 'frame',
    • allowed orients are {'split','records','index', 'columns','values'}
    • default is 'columns'
    • The DataFrame index must be unique for orients 'index' and 'columns'.
    • The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.

The value of orient specifies the expected format of the JSON string. The expected JSON formats are compatible with the strings produced by to_json() with a corresponding value of orient.

  • 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
  • 'records' : list like [{column -> value}, ... , {column -> value}]
  • 'index' : dict like {index -> {column -> value}}
  • 'columns' : dict like {column -> {index -> value}}
  • 'values' : just the values array

typ : type of object to recover (series or frame), default ‘frame’

dtype : boolean or dict, default True

If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Applies only to the data.

convert_axes : boolean, default True

Try to convert the axes to the proper dtypes.

convert_dates : boolean, default True

List of columns to parse for dates; if True, then try to parse datelike columns. A column label is datelike if

  • it ends with '_at',
  • it ends with '_time',
  • it begins with 'timestamp',
  • it is 'modified', or
  • it is 'date'

keep_default_dates : boolean, default True

If parsing dates, then parse the default datelike columns

numpy : boolean, default False

Direct decoding to numpy arrays. Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.

precise_float : boolean, default False

Set to enable usage of the higher-precision (strtod) function when decoding strings to double values. The default (False) is to use the fast but less precise built-in functionality.

date_unit : string, default None

The timestamp unit to detect if converting dates. The default behaviour is to try and detect the correct precision, but if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force parsing only seconds, milliseconds, microseconds or nanoseconds respectively.

lines : boolean, default False

Read the file as a json object per line.

New in version 0.19.0.

encoding : str, default is ‘utf-8’

The encoding to use to decode py3 bytes.

New in version 0.19.0.

Returns:

result : Series or DataFrame, depending on the value of typ.

Examples

>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                   index=['row 1', 'row 2'],
...                   columns=['col 1', 'col 2'])
>>> df.to_json(orient='split')
'{"columns":["col 1","col 2"],
  "index":["row 1","row 2"],
  "data":[["a","b"],["c","d"]]}'
>>> pd.read_json(_, orient='split')
      col 1 col 2
row 1     a     b
row 2     c     d
>>> df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> pd.read_json(_, orient='records')
  col 1 col 2
0     a     b
1     c     d
>>> df.to_json(orient='index')
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
>>> pd.read_json(_, orient='index')
      col 1 col 2
row 1     a     b
row 2     c     d

Member

@jorisvandenbossche jorisvandenbossche left a comment


Changes looking good! (left some small further comments)

No problem with adapting the output of to_json to satisfy flake8.

Maybe it would be nice to also have an example showing the use of typ? (but can also leave for other PR)

@@ -122,33 +122,42 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
The string could be a URL. Valid URL schemes include http, ftp, s3, and
file. For file URLs, a host is expected. For instance, a local file
could be ``file://localhost/path/to/table.json``
meta_prefix : string, default None
Member

@jorisvandenbossche jorisvandenbossche Sep 28, 2016


What is this ?

Contributor Author


copy-pasta error - removed

``'columns'``, and ``'records'``.


The value of `orient` specifies the expected format of the
Member


The two blank lines are not needed above this one (one blank line is OK).

But something else: would it make it more clear to first list the possibilities, and then which of those is the default/accepted value depending on the type? (just an idea)

'{"columns":["col 1","col 2"],
"index":["row 1","row 2"],
"data":[["a","b"],["c","d"]]}'
<BLANKLINE>
Member


I suppose this is to have a blank line in the resulting code block, but to keep it as one code block? (so it's clearer they belong together).
That's a good idea I think, only a pity for the plain text docstring ..

BTW, you can also put some 'introducing' text in between the code examples when this can make it clearer what you are showing. (and that can also help delineate the different examples)

@jorisvandenbossche
Member

@cswarth Do you have time to update this? It's a really nice improvement of the docstring!

@cswarth
Contributor Author

cswarth commented Oct 14, 2016

I'm mystified and could use some help to figure out what's going on. I pushed a commit to my branch to address your review, but this PR is not picking up the changes.

I can see the commit on the branch, but the commits link at the top of this page insists there are only two commits for this PR.

I can't figure out what I've screwed up here.

@jreback
Contributor

jreback commented Oct 14, 2016

@cswarth yeah, we changed the base GitHub org to pandas-dev, and it seems you can't push to existing PRs. So close this one and open a new PR.

@cswarth
Contributor Author

cswarth commented Oct 17, 2016

Closing to move PR to new github domain

Labels
Docs IO JSON read_json, to_json, json_normalize

4 participants