Commit ac1609e

Merge pull request #5287 from Komnomnomnom/json-docs
DOC: expand JSON docs
2 parents 2e391bb + 684cf76 commit ac1609e

3 files changed, +148 -12 lines changed

doc/source/io.rst (+143 -10)

@@ -1018,7 +1018,7 @@ which, if set to ``True``, will additionally output the length of the Series.
 JSON
 ----

-Read and write ``JSON`` format files.
+Read and write ``JSON`` format files and strings.

 .. _io.json:

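This change says that pandas reads and writes JSON as strings as well as files. The following is a minimal sketch of both paths. It assumes a recent pandas and uses only the standard library besides that. The temporary file name ``frame.json`` is arbitrary. The JSON text is wrapped in ``io.StringIO`` because newer pandas versions deprecate passing a literal JSON string to ``read_json``.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]})

# Without a path, to_json returns the JSON document as a string.
text = df.to_json()

# With a path, to_json writes the same document to that file instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.json")
    df.to_json(path)
    from_file = pd.read_json(path)

# read_json takes a path or a file-like object; StringIO wraps the string.
from_string = pd.read_json(StringIO(text))

print(text)
print(from_file.equals(from_string))
```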
@@ -1066,12 +1066,77 @@ Note ``NaN``'s, ``NaT``'s and ``None`` will be converted to ``null`` and ``datet
    json = dfj.to_json()
    json

+Orient Options
+++++++++++++++
+
+There are a number of different options for the format of the resulting JSON
+file / string. Consider the following DataFrame and Series:
+
+.. ipython:: python
+
+   dfjo = DataFrame(dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
+                    columns=list('ABC'), index=list('xyz'))
+   dfjo
+   sjo = Series(dict(x=15, y=16, z=17), name='D')
+   sjo
+
+**Column oriented** (the default for ``DataFrame``) serialises the data as
+nested JSON objects with column labels acting as the primary index:
+
+.. ipython:: python
+
+   dfjo.to_json(orient="columns")
+   # Not available for Series
+
+**Index oriented** (the default for ``Series``) is similar to column oriented,
+but the index labels are now primary:
+
+.. ipython:: python
+
+   dfjo.to_json(orient="index")
+   sjo.to_json(orient="index")
+
+**Record oriented** serialises the data to a JSON array of column -> value records;
+index labels are not included. This is useful for passing DataFrame data to plotting
+libraries, for example the JavaScript library d3.js:
+
+.. ipython:: python
+
+   dfjo.to_json(orient="records")
+   sjo.to_json(orient="records")
+
+**Value oriented** is a bare-bones option which serialises to nested JSON arrays of
+values only; column and index labels are not included:
+
+.. ipython:: python
+
+   dfjo.to_json(orient="values")
+   # Not available for Series
+
+**Split oriented** serialises to a JSON object containing separate entries for
+values, index and columns. The name is also included for a ``Series``:
+
+.. ipython:: python
+
+   dfjo.to_json(orient="split")
+   sjo.to_json(orient="split")
+
+.. note::
+
+   Any orient option that encodes to a JSON object will not preserve the ordering of
+   index and column labels during round-trip serialisation. If you wish to preserve
+   label ordering, use the ``split`` option as it uses ordered containers.
+
+Date Handling
++++++++++++++
+
 Writing in iso date format

 .. ipython:: python

    dfd = DataFrame(randn(5, 2), columns=list('AB'))
    dfd['date'] = Timestamp('20130101')
+   dfd = dfd.sort_index(axis=1, ascending=False)
    json = dfd.to_json(date_format='iso')
    json

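The following sketch makes the orient options concrete. It rebuilds ``dfjo`` with plain ``pd.DataFrame`` instead of the docs' star-imported ``DataFrame``. It then encodes the frame with each orient and decodes it with the same orient, as the ``read_json`` notes in this commit advise. It assumes a recent pandas. ``round_trip`` is a hypothetical helper written for this illustration. Because ``records`` and ``values`` drop labels, those round trips come back with default integer labels.

```python
from io import StringIO

import pandas as pd

# A small frame shaped like ``dfjo`` in the docs
dfjo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                    index=["x", "y", "z"])

ORIENTS = ["columns", "index", "records", "values", "split"]


def round_trip(df, orient):
    """Hypothetical helper: encode with ``orient``, decode with the same one."""
    text = df.to_json(orient=orient)
    # StringIO: recent pandas deprecates passing literal JSON to read_json
    return text, pd.read_json(StringIO(text), orient=orient)


for orient in ORIENTS:
    text, decoded = round_trip(dfjo, orient)
    print(f"{orient:8} {text}")
```

With ``split``, the decoded frame keeps the ``x``/``y``/``z`` index and the ``A``/``B``/``C`` columns. With ``records``, only the values and column names survive.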
@@ -1082,7 +1147,7 @@ Writing in iso date format, with microseconds
    json = dfd.to_json(date_format='iso', date_unit='us')
    json

-Actually I prefer epoch timestamps, in seconds
+Epoch timestamps, in seconds

 .. ipython:: python

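The following is a minimal sketch of an epoch round trip in whole seconds, assuming a recent pandas. On the writing side, ``date_unit`` sets the precision that ``to_json`` emits. On the reading side, passing the same ``date_unit`` to ``read_json`` forces that precision instead of letting pandas detect it. The column is named ``date`` so that ``read_json`` treats it as one of the default datelike columns. The frame itself is made up for illustration.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    "A": [1.5, 2.5],
    # "date" is a column name read_json treats as datelike by default
    "date": [pd.Timestamp("2013-01-01"), pd.Timestamp("2013-01-02")],
})

# Write datetimes as epoch offsets in whole seconds
text = df.to_json(date_format="epoch", date_unit="s")
print(text)

# Read them back, forcing seconds rather than detecting the unit
# (StringIO: recent pandas deprecates passing literal JSON to read_json)
back = pd.read_json(StringIO(text), date_unit="s")
print(back.dtypes)
```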
@@ -1101,6 +1166,9 @@ Writing to a file, with a date index and a date column
    dfj2.to_json('test.json')
    open('test.json').read()

+Fallback Behavior
++++++++++++++++++
+
 If the JSON serialiser cannot handle the container contents directly it will fall back in the following manner:

 - if a ``toDict`` method is defined by the unrecognised object then that
@@ -1182,7 +1250,7 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series``
 - ``convert_dates`` : a list of columns to parse for dates; If True, then try to parse datelike columns, default is True
 - ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default datelike columns
 - ``numpy`` : direct decoding to numpy arrays. default is False;
-  Note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
+  Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
 - ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
 - ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
   None. By default the timestamp precision will be detected, if this is not desired
@@ -1191,6 +1259,13 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series``

 The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parsable.

+If a non-default ``orient`` was used when encoding to JSON, be sure to pass the same
+option here so that decoding produces sensible results; see `Orient Options`_ for an
+overview.
+
+Data Conversion
++++++++++++++++
+
 The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True`` will try to parse the axes, and all of the data
 into appropriate types, including dates. If you need to override specific dtypes, pass a dict to ``dtype``. ``convert_axes`` should only
 be set to ``False`` if you need to preserve string-like numbers (e.g. '1', '2') in an axis.
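The following is a small sketch of the two overrides described here. It assumes a recent pandas and a made-up JSON document whose index labels are the strings ``'1'`` and ``'2'``. A ``dtype`` dict forces the types of the listed columns. ``convert_axes=False`` keeps the string-like index labels instead of turning them into integers.

```python
from io import StringIO

import pandas as pd

# Hypothetical column-oriented JSON whose index labels are numeric strings
text = '{"A":{"1":10,"2":20},"bools":{"1":true,"2":false}}'

# Defaults: axes and data are converted, so the index becomes integers
default = pd.read_json(StringIO(text))

# Force specific dtypes for selected columns
typed = pd.read_json(StringIO(text), dtype={"A": "float32", "bools": "int8"})

# Keep the index labels as the strings '1' and '2'
string_index = pd.read_json(StringIO(text), convert_axes=False)

print(default.index.tolist(), typed.dtypes.to_dict(), string_index.index.tolist())
```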
@@ -1209,31 +1284,31 @@ be set to ``False`` if you need to preserve string-like numbers (e.g. '1', '2')

 Thus there are times when you may want to specify dtypes explicitly via the ``dtype`` keyword argument.

-Reading from a JSON string
+Reading from a JSON string:

 .. ipython:: python

    pd.read_json(json)

-Reading from a file
+Reading from a file:

 .. ipython:: python

    pd.read_json('test.json')

-Don't convert any data (but still convert axes and dates)
+Don't convert any data (but still convert axes and dates):

 .. ipython:: python

    pd.read_json('test.json', dtype=object).dtypes

-Specify how I want to convert data
+Specify dtypes for conversion:

 .. ipython:: python

    pd.read_json('test.json', dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes

-I like my string indicies
+Preserve string indices:

 .. ipython:: python

@@ -1250,8 +1325,7 @@ I like my string indicies
    sij.index
    sij.columns

-My dates have been written in nanoseconds, so they need to be read back in
-nanoseconds
+Dates written in nanoseconds need to be read back in nanoseconds:

 .. ipython:: python

@@ -1269,6 +1343,65 @@ nanoseconds
    dfju = pd.read_json(json, date_unit='ns')
    dfju

+The Numpy Parameter
++++++++++++++++++++
+
+.. note::
+   This supports numeric data only. Index and column labels may be non-numeric, e.g. strings, dates etc.
+
+If ``numpy=True`` is passed to ``read_json``, an attempt will be made to sniff
+an appropriate dtype during deserialisation and to subsequently decode directly
+to numpy arrays, bypassing the need for intermediate Python objects.
+
+This can provide speedups if you are deserialising a large amount of numeric
+data:
+
+.. ipython:: python
+
+   randfloats = np.random.uniform(-100, 1000, 10000)
+   randfloats.shape = (1000, 10)
+   dffloats = DataFrame(randfloats, columns=list('ABCDEFGHIJ'))
+
+   jsonfloats = dffloats.to_json()
+
+.. ipython:: python
+
+   timeit read_json(jsonfloats)
+
+.. ipython:: python
+
+   timeit read_json(jsonfloats, numpy=True)
+
+The speedup is less noticeable for smaller datasets:
+
+.. ipython:: python
+
+   jsonfloats = dffloats.head(100).to_json()
+
+.. ipython:: python
+
+   timeit read_json(jsonfloats)
+
+.. ipython:: python
+
+   timeit read_json(jsonfloats, numpy=True)
+
+.. warning::
+
+   Direct numpy decoding makes a number of assumptions and may fail or produce
+   unexpected output if these assumptions are not satisfied:
+
+   - data is numeric.
+
+   - data is uniform. The dtype is sniffed from the first value decoded.
+     A ``ValueError`` may be raised, or incorrect output may be produced
+     if this condition is not satisfied.
+
+   - labels are ordered. Labels are only read from the first container; it is assumed
+     that each subsequent row / column has been encoded in the same order. This should be satisfied if the
+     data was encoded using ``to_json`` but may not be the case if the JSON
+     is from another source.
+
 .. ipython:: python
    :suppress:

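The warning lists assumptions that can be checked cheaply before passing ``numpy=True`` for JSON from another source. The following is a minimal sketch using only the standard ``json`` module. ``check_numpy_decode_assumptions`` is a hypothetical helper, not part of pandas. It assumes the column-oriented layout written by ``to_json(orient="columns")``. Its uniformity test is a conservative approximation of dtype sniffing: every value must share the Python type of the first value.

```python
import json
from numbers import Real


def check_numpy_decode_assumptions(text):
    """Hypothetical helper: check JSON against the numpy=True assumptions.

    Expects the column-oriented layout written by ``to_json(orient="columns")``,
    i.e. an object of objects such as ``{"A": {"x": 1}, "B": {"x": 2}}``.
    Returns a tuple ``(uniform_numeric, labels_ordered)``.
    """
    outer = json.loads(text)  # Python 3.7+ dicts keep the decoded key order
    inner_objects = list(outer.values())

    # "Labels are ordered": only the first container's labels are used, so
    # every other container must list the same labels in the same order.
    first_labels = list(inner_objects[0]) if inner_objects else []
    labels_ordered = all(list(obj) == first_labels for obj in inner_objects)

    # "Data is numeric and uniform": every value must share the first value's
    # type. bool is excluded explicitly because it is a subclass of int.
    values = [v for obj in inner_objects for v in obj.values()]
    first_type = type(values[0]) if values else None
    uniform_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) and type(v) is first_type
        for v in values
    )
    return uniform_numeric, labels_ordered


# Same labels in a different order in column "B"
print(check_numpy_decode_assumptions('{"A": {"x": 1, "y": 2}, "B": {"y": 4, "x": 3}}'))
# prints (True, False)
```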
doc/source/release.rst (+2)

@@ -102,6 +102,8 @@ Improvements to existing features
 - Significant table writing performance improvements in ``HDFStore``
 - JSON date serialisation now performed in low-level C code.
 - JSON support for encoding datetime.time
+- Expanded JSON docs, with more info about orient options and the use of the ``numpy``
+  param when decoding.
 - Add ``drop_level`` argument to xs (:issue:`4180`)
 - Can now resample a DataFrame with ohlc (:issue:`2320`)
 - ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to

pandas/io/json.py (+3 -2)

@@ -153,8 +153,9 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
     keep_default_dates : boolean, default True.
         If parsing dates, then parse the default datelike columns
     numpy : boolean, default False
-        Direct decoding to numpy arrays. Note that the JSON ordering MUST be
-        the same for each term if numpy=True.
+        Direct decoding to numpy arrays. Supports numeric data only, but
+        non-numeric column and index labels are supported. Note also that the
+        JSON ordering MUST be the same for each term if numpy=True.
     precise_float : boolean, default False.
         Set to enable usage of higher precision (strtod) function when
         decoding string to double values. Default (False) is to use fast but
