ENH: Added to_json_schema #14904
@pwalsh one data type question for you, what would a good JSON table schema type be for At the moment I don't attempt to distinguish between JSON Table Schema |
One more thing: this could also be the start for #9146, a roundtrip orient for JSON. The spec allows for additional properties at the "table" and "field" level. So we could have a {
'type': 'DataFrame',
'version': pd.__version__,
'orient': 'records',
'date_unit': 'ms'
} |
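A minimal sketch of the idea above, in plain Python: extra, non-standard properties are attached at the table level of a schema dict so a reader could later recover how the data was written. The key name `pandas`, the helper name `with_roundtrip_metadata`, and the placeholder version string are assumptions for illustration, not something this PR defines.

```python
import json


def with_roundtrip_metadata(schema, orient="records", date_unit="ms",
                            version="0.20.0"):
    """Return a copy of a Table Schema dict with extra table-level metadata.

    The "pandas" key is hypothetical; the spec only says that additional
    properties are allowed, it does not name this one.
    """
    extended = dict(schema)  # shallow copy so the input is left untouched
    extended["pandas"] = {
        "type": "DataFrame",
        "version": version,  # placeholder standing in for pd.__version__
        "orient": orient,
        "date_unit": date_unit,
    }
    return extended


schema = {"fields": [{"name": "A", "type": "integer"}]}
extended = with_roundtrip_metadata(schema)
print(json.dumps(extended, indent=2))
```

A reader that does not know about the extra key can ignore it and still consume `fields` as usual.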
isn't this just another json format orient? or maybe need an argument schema=True in to_json() |
yes and no. This is just the schema, not the values. And this returns a dict instead of a serialized string. I've put the doc section in |
this is very odd for a top level method |
This is great to see so we at least have standardized types for us to work with on the front-end. It didn't occur to me that we would need to carve out the top level format for publishing both the schema and the data. I like the The matching data format is |
Agreed. It probably belongs on
|
I don't think this should be a method at all |
@TomAugspurger I'd use duration for timedelta. I'm going to have to research it a bit more if you think there is some inconsistency here. |
Hey @holdenk - could we support the same schema + data output for Spark DataFrames? |
We probably could. In fact, the schema is already transferred between the JVM and PySpark using JSON, so we might be able to just normalize on that format for interchange inside of PySpark itself. |
I like @jreback's suggestion to add a {
"application/tableschema+json": {
"schema": schema,
"data": data
}
} |
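As a rough sketch of the bundle shape proposed above, the schema and the rows can be grouped under a single mimetype key. The helper name `make_bundle` is hypothetical, and the mimetype string is simply the one quoted in the comment, not a registered type.

```python
import json


def make_bundle(schema, rows, mimetype="application/tableschema+json"):
    """Group a schema dict and a list of row dicts under one mimetype key."""
    return {mimetype: {"schema": schema, "data": rows}}


schema = {"fields": [{"name": "A", "type": "integer"}]}
rows = [{"A": 1}, {"A": 2}, {"A": 3}]
bundle = make_bundle(schema, rows)
print(json.dumps(bundle))
```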
BTW, I have published a jupyterlab/notebook extension that will render JSON Table Schema: https://github.com/gnestor/jupyterlab_table This is more of a WIP until some standards are in place (e.g. a mimetype for JSON Table Schema, pandas compatibility, etc.). |
Related media types have been registered over the new year: |
OK, coming around on this. One problem: right now
import json
import IPython
# passing a python dict, which IPython serializes
IPython.display.display({"application/json": {"A": [1, 2, 3]}}, raw=True)
# Any way to do this?
IPython.display.display({"application/json": json.dumps({"A": [1, 2, 3]})}, raw=True)
or we could potentially include the mime-type in the already serialized data. |
@minrk ^^ usually I pass a direct dict to IPython.display.display with raw=True. I'm not sure how I'd pass something already encoded since this data would be part of an overall JSON object. |
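The concern about passing already-encoded data can be illustrated with the standard `json` module alone. This sketch does not call IPython and makes no claim about how IPython transports bundles; it only shows that a value serialized once and then serialized again as part of an enclosing object arrives as a string, not as a nested object.

```python
import json

table = {"A": [1, 2, 3]}

# Option 1: the value is a plain dict, so the whole bundle is serialized once.
bundle_dict = {"application/json": table}
wire_1 = json.dumps(bundle_dict)

# Option 2: the value is already a JSON string; serializing the enclosing
# bundle encodes it a second time, as an escaped string.
bundle_str = {"application/json": json.dumps(table)}
wire_2 = json.dumps(bundle_str)

decoded_1 = json.loads(wire_1)["application/json"]
decoded_2 = json.loads(wire_2)["application/json"]

print(type(decoded_1).__name__)
print(type(decoded_2).__name__)
```

With the second option the receiver has to call `json.loads` again on the value to get the table back.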
Pardon me typing on mobile, I see now I'm repeating things already said. |
c40309c to dc4daa4
Codecov Report
@@ Coverage Diff @@
## master #14904 +/- ##
==========================================
- Coverage 91.07% 91.01% -0.06%
==========================================
Files 136 137 +1
Lines 49167 49228 +61
==========================================
+ Hits 44777 44806 +29
- Misses 4390 4422 +32
|
pandas/io/json.py
Outdated
@@ -1060,5 +1060,5 @@ def publish_tableschema(data):
 """Temporary helper for testing w/ frontend"""
 from IPython.display import display
 mimetype = 'application/vnd.tableschema.v1+json'
-payload = data.to_json(orient='jsontable_schema')
+payload = data.to_json(orient='json_table_schema')
👍
pandas/io/json.py
Outdated
"""
components = x.components
seconds = '{}.{:0>3}{:0>3}{:0>3}'.format(components.seconds,
components.milliseconds,
yeah let's do this in a separate PR
ideally we'd also be able to parse this as well (open an issue if you don't do it in the same PR)
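For reference, formatting a timedelta as an ISO 8601 duration and parsing it back can be sketched with the standard library. This is not the code under review: it uses `datetime.timedelta` rather than pandas' `Timedelta.components`, handles only non-negative values, and the function names `format_duration` and `parse_duration` are hypothetical.

```python
import re
from datetime import timedelta


def format_duration(td):
    """Format a non-negative timedelta as e.g. 'P1DT2H3M4.5S'."""
    if td < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    # Zero-pad the microseconds, then trim trailing zeros and a bare dot.
    sec = "{}.{:06d}".format(seconds, td.microseconds).rstrip("0").rstrip(".")
    return "P{}DT{}H{}M{}S".format(td.days, hours, minutes, sec)


_DURATION = re.compile(r"^P(\d+)DT(\d+)H(\d+)M(\d+)(?:\.(\d+))?S$")


def parse_duration(text):
    """Parse the subset of ISO 8601 durations produced by format_duration."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError("unsupported duration: {!r}".format(text))
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    frac = match.group(5) or ""
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micro = int(frac.ljust(6, "0")[:6]) if frac else 0
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, microseconds=micro)


td = timedelta(days=1, hours=2, minutes=3, seconds=4, milliseconds=500)
text = format_duration(td)
print(text)
print(parse_duration(text) == td)
```

Splitting the fractional seconds as text, instead of going through `float`, keeps the round trip exact at microsecond resolution.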
@pwalsh what would the media type and structure be for the combined schema plus data: {
schema: {...jsonTableSchemaHere},
data: [...rows],
} |
Would it end up like this then, based on that spec? {
"resources": [{
"format": "json",
"data": [...],
"schema": "table-schema"
}],
"schemas": {
"table-schema": // inline here?
}
} |
No - in this case, the top-level object is an object within your resources array, with the data inlined, and the schema inlined, exactly as #14904 (comment) |
doc/source/options.rst
Outdated
@@ -392,6 +392,9 @@ display.width 80 Width of the display in characters.
IPython qtconsole, or IDLE do not run in a
terminal and hence it is not possible
to correctly detect the width.
display.html.table_schema True Whether to publish a Table Schema
Did we set it to False by default?
doc/source/options.rst
Outdated
.. versionadded:: 0.20.0

``DataFrame`` and ``Series`` will publish a Table Schema representation
by default. This can be disabled globally with the ``display.html.table_schema``
This section is also outdated (default False)
pandas/core/generic.py
Outdated
@@ -1151,14 +1190,55 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',

.. versionadded:: 0.19.0

.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
Is this link used somewhere?
doc/source/io.rst
Outdated
int64            integer
float64          number
bool             boolean
datetime64[ns]   date
date -> datetime
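The dtype table under review, with the `date -> datetime` correction applied, can be sketched as a simple lookup. This is an illustration in plain Python and not the PR's implementation: the `timedelta64[ns]` entry follows the `duration` suggestion made earlier in the thread, and the `string` fallback and the function name `as_table_schema_type` are assumptions.

```python
# dtype name -> Table Schema type, following the table in doc/source/io.rst
# with the reviewer's correction (datetime64[ns] maps to "datetime").
TABLE_SCHEMA_TYPES = {
    "int64": "integer",
    "float64": "number",
    "bool": "boolean",
    "datetime64[ns]": "datetime",
    "timedelta64[ns]": "duration",  # from the suggestion in the thread
}


def as_table_schema_type(dtype_name, default="string"):
    """Look up the Table Schema type for a dtype name.

    The fallback value is an assumption made for this sketch.
    """
    return TABLE_SCHEMA_TYPES.get(dtype_name, default)


for name in ["int64", "float64", "bool", "datetime64[ns]", "object"]:
    print(name, "->", as_table_schema_type(name))
```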
pandas/io/json/table_schema.py
Outdated
column names to designate as the primary key.
The default `None` will set `'primaryKey'` to the index
level or levels if the index is unique.
version : bool
, default True
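The documented defaults can be sketched as follows, in plain Python with no pandas dependency. The function name `build_schema`, its parameters, the `pandas_version` key, and the placeholder version string are assumptions for illustration; only the `primaryKey` fallback rule and the `version` default of `True` come from the docstring and review comment above.

```python
def build_schema(fields, index_names, index_is_unique,
                 primary_key=None, version=True):
    """Assemble a schema dict following the documented defaults.

    primary_key=None falls back to the index level(s), but only when the
    index is unique. version=True adds a version marker (hypothetical key).
    """
    schema = {"fields": list(fields)}
    if primary_key is not None:
        schema["primaryKey"] = list(primary_key)
    elif index_is_unique:
        schema["primaryKey"] = list(index_names)
    if version:
        schema["pandas_version"] = "0.20.0"  # placeholder value
    return schema


fields = [{"name": "idx", "type": "integer"}, {"name": "A", "type": "number"}]
unique = build_schema(fields, ["idx"], index_is_unique=True)
duplicated = build_schema(fields, ["idx"], index_is_unique=False, version=False)
explicit = build_schema(fields, ["idx"], index_is_unique=False, primary_key=["A"])
print(unique)
print(duplicated)
print(explicit)
```

When the index has duplicates and no explicit key is given, the sketch leaves `primaryKey` out entirely.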
Lays the groundwork for pandas-dev#14386 This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup. DOC: More notes in prose docs Move files use isoformat updates Moved to to_json json_table no config refactor with classes Added duration tests more timedelta Change default orient Series test fixup docs JSON Table -> Table doc Change to table orient added version Handle Categorical Many more tests
e344a77 to 07d90cb
Needs a rebase. I'm super excited about this, thank you so much. |
Me too! Thanks for all your effort @TomAugspurger! |
Played with this a bit locally, I'll renew my interest in making views based on this. 😄 |
hey all - I also want to say thanks for the effort here - we at Open Knowledge International are very excited to see this land. |
@TomAugspurger Thanks a lot! 🎉 |
Thanks Joris! Was just rebasing without realizing you'd already done it and was very confused about the conflicts. |
Sorry, I just fixed the conflict using github before merging |
thanks @TomAugspurger |
xref pandas-dev#14904 Author: Jeff Reback <[email protected]> Closes pandas-dev#15322 from jreback/json and squashes the following commits: 0c2da60 [Jeff Reback] DOC: whatsnew update fa3deef [Jeff Reback] CLN: reorg pandas/io/json to sub-dirs
Lays the groundwork for pandas-dev#14386 This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup. Added publish to dataframe repr
Lays the groundwork for (but doesn't close) #14386
This handles the schema part of the request there. We'll still need to
do the work to publish the data to the frontend, but that can be done
as a followup.
Usage:
I think this is useful enough on its own to be part of the public API, so I've documented as such.
I've included a placeholder, publish_tableschema, that will not be included in the final commit. It's just to make @rgbkrk's life easier for prototyping the nteract frontend. I think the proper solution for publishing the schema + data will have to wait on ipython/ipython#10090