
Commit 59288eb

ENH: Added to_json_schema
Lays the groundwork for pandas-dev#14386. This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup.

DOC: More notes in prose docs Move files use isoformat updates Moved to to_json json_table no config refactor with classes Added duration tests more timedelta Change default orient Series test fixup docs JSON Table -> Table doc Change to table orient added version Handle Categorical Many more tests
1 parent 7d6afc4 commit 59288eb

5 files changed, +815 -17 lines changed

doc/source/io.rst

+46
@@ -2033,6 +2033,52 @@ using Hadoop or Spark.
    df
    df.to_json(orient='records', lines=True)

+
+Table Schema
+''''''''''''
+
+`Table Schema`_ is a spec for describing tabular datasets as a JSON
+object. The JSON includes information on the field names, types, and
+other attributes. You can use the orient ``'table'`` to build
+a JSON string with two fields, ``schema`` and ``data``.
+
+.. ipython:: python
+
+   df = pd.DataFrame(
+       {'A': [1, 2, 3],
+        'B': ['a', 'b', 'c'],
+        'C': pd.date_range('2016-01-01', freq='d', periods=3),
+       }, index=pd.Index(range(3), name='idx'))
+   df
+   df.to_json(orient='table')
+
+The ``schema`` field contains the ``fields`` key, which itself contains
+a list of column name to type pairs, including the Index or MultiIndex
+(see below for a list of types).
+The ``schema`` field also contains a ``primary_key`` field if the (Multi)index
+is unique.
+
+The second field, ``data``, contains the serialized data with the ``records``
+orient.
+The index is included, and any datetimes are ISO 8601 formatted, as required
+by the Table Schema spec.
+
+The full list of supported types is described in the Table Schema
+spec. This table shows the mapping from pandas types:
+
+=============== =================
+Pandas type     Table Schema type
+=============== =================
+int64           integer
+float64         number
+bool            boolean
+datetime64[ns]  datetime
+timedelta64[ns] duration
+object          string
+=============== =================
+
+.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
+
 HTML
 ----

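The type mapping documented in this hunk can be spot-checked without serializing any data. The sketch below is not part of the commit: it assumes a pandas release that exposes `pandas.io.json.build_table_schema` (the public schema helper in later versions), and the column names are made up for illustration. It builds one column per documented dtype and collects the Table Schema type reported for each.

```python
import pandas as pd
from pandas.io.json import build_table_schema

# One illustrative column per dtype (column names are hypothetical).
df = pd.DataFrame(
    {
        "i": [1, 2],                                         # int64
        "f": [0.5, 1.5],                                     # float64
        "b": [True, False],                                  # bool
        "t": pd.to_datetime(["2016-01-01", "2016-01-02"]),   # datetime64
        "d": pd.to_timedelta([1, 2], unit="D"),              # timedelta64
    }
)

# index=False leaves the index out, so only the columns are described.
schema = build_table_schema(df, index=False)

# Collect {column name: Table Schema type} from the 'fields' list.
types = {field["name"]: field["type"] for field in schema["fields"]}
print(types)
```

Object (string) columns are left out here on purpose, since how they are reported may depend on the pandas version and the string dtype in use.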

doc/source/whatsnew/v0.20.0.txt

+20
@@ -114,6 +114,26 @@ Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937
 - Bug in ``pd.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14915`)
 - Bug in ``pd.value_counts()`` in which unsigned 64-bit integers were being erroneously truncated in the output (:issue:`14934`)

+
+Table Schema Output
+^^^^^^^^^^^^^^^^^^^
+
+The new orient ``'table'`` for :meth:`DataFrame.to_json`
+will generate a `Table Schema`_ compatible string representation of
+the data.
+
+.. ipython:: python
+
+   df = pd.DataFrame(
+       {'A': [1, 2, 3],
+        'B': ['a', 'b', 'c'],
+        'C': pd.date_range('2016-01-01', freq='d', periods=3),
+       }, index=pd.Index(range(3), name='idx'))
+   df
+   df.to_json(orient='table')
+
+.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
+
 .. _whatsnew_0200.enhancements.other:

 Other enhancements
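As a rough illustration of the new orient described in this whatsnew entry, the sketch below (not part of the commit) serializes the same frame and parses the result with the standard-library `json` module to look at the two top-level fields. It assumes a pandas version that supports `orient='table'` (0.20.0 or later). Auxiliary schema keys and the exact ISO timestamp suffix may differ between versions, so the code relies only on the `schema` and `data` fields and on the field names.

```python
import json

import pandas as pd

# Same frame as in the docs above ('D' is the daily frequency alias).
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", "b", "c"],
        "C": pd.date_range("2016-01-01", freq="D", periods=3),
    },
    index=pd.Index(range(3), name="idx"),
)

# to_json returns a str when no path is given; parse it back into a dict.
payload = json.loads(df.to_json(orient="table"))

# The schema lists the index first, then each column, with its type.
field_types = {f["name"]: f["type"] for f in payload["schema"]["fields"]}
print(field_types)

# The data part is laid out like orient='records', with the index included.
first_row = payload["data"][0]
print(first_row)
```

Parsing the string back with `json.loads` is only for inspection; a consumer such as a frontend would typically receive the JSON string as-is.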

pandas/core/generic.py

+48-3
@@ -1075,8 +1075,9 @@ def __setstate__(self, state):
         """

     def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
-                double_precision=10, force_ascii=True, date_unit='ms',
-                default_handler=None, lines=False):
+                timedelta_format='epoch', double_precision=10,
+                force_ascii=True, date_unit='ms', default_handler=None,
+                lines=False):
         """
         Convert the object to a JSON string.

@@ -1109,10 +1110,19 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
               - index : dict like {index -> {column -> value}}
               - columns : dict like {column -> {index -> value}}
               - values : just the values array
+              - ``'table'`` : dict like
+                ``{'schema': [schema], 'data': [data]}``
+                where the schema component is a Table Schema
+                describing the data, and the data component is
+                like ``orient='records'``.
+
+                .. versionchanged:: 0.20.0

         date_format : {'epoch', 'iso'}
             Type of date conversion. `epoch` = epoch milliseconds,
-            `iso`` = ISO8601, default is epoch.
+            `iso` = ISO8601. Default is epoch, except when orient is
+            ``'table'``, in which case this parameter is ignored
+            and iso formatting is always used.
         double_precision : The number of decimal places to use when encoding
             floating point values, default 10.
         force_ascii : force encoded string to be ASCII, default True.
@@ -1136,6 +1146,41 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
         -------
         same type as input object with filtered info axis

+        See Also
+        --------
+        pd.read_json
+
+        Examples
+        --------
+
+        >>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
+        ...                   index=['row 1', 'row 2'],
+        ...                   columns=['col 1', 'col 2'])
+        >>> df.to_json(orient='split')
+        '{"columns":["col 1","col 2"],
+          "index":["row 1","row 2"],
+          "data":[["a","b"],["c","d"]]}'
+
+        Encoding/decoding a DataFrame using ``'index'`` formatted JSON:
+
+        >>> df.to_json(orient='index')
+        '{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
+
+        Encoding/decoding a DataFrame using ``'records'`` formatted JSON.
+        Note that index labels are not preserved with this encoding.
+
+        >>> df.to_json(orient='records')
+        '[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
+
+        Encoding with Table Schema
+
+        >>> df.to_json(orient='table')
+        '{"schema": {"fields": [{"name": "index", "type": "string"},
+                                {"name": "col 1", "type": "string"},
+                                {"name": "col 2", "type": "string"}],
+                     "primary_key": "index"},
+          "data": [{"index":"row 1","col 1":"a","col 2":"b"},
+                   {"index":"row 2","col 1":"c","col 2":"d"}]}'
         """

         from pandas.io import json
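The `date_format` wording in the docstring above can be checked with a small experiment. This sketch is not part of the commit. It serializes a single timestamp with `orient='records'` under both settings, assuming the default `date_unit='ms'`. Newer pandas releases may warn about the epoch format, and the suffix of the ISO string can vary by version, so only its prefix is relied on.

```python
import json

import pandas as pd

df = pd.DataFrame({"C": pd.to_datetime(["2016-01-01"])})

# Default: dates are written as epoch milliseconds (date_unit='ms').
as_epoch = json.loads(df.to_json(orient="records"))[0]["C"]

# date_format='iso' switches to ISO 8601 strings.
as_iso = json.loads(df.to_json(orient="records", date_format="iso"))[0]["C"]

print(as_epoch)  # 1451606400000, i.e. 2016-01-01T00:00:00 UTC in milliseconds
print(as_iso)
```

The example deliberately avoids combining an explicit `date_format` with `orient='table'`, since how that combination is handled may differ between pandas versions.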
