Add Indent Support in to_json #28130

WillAyd · 2019-08-24T22:33:50Z

closes add indent support to to_json method #12004
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

After some of the recent refactor this should be relatively easy to do after vendoring a few updates

pandas/core/generic.py

simonjayhawkins · 2019-08-25T00:24:10Z

pandas/core/generic.py

@@ -2249,6 +2249,7 @@ def to_json(
        lines=False,
        compression="infer",
        index=True,
+        indent=0,


add type hints when modifying code?

WillAyd

OK type hints added where feasible, comments where I think not

WillAyd · 2019-08-25T23:38:02Z

pandas/core/generic.py

+        double_precision: int = 10,
+        force_ascii: bool_t = True,
+        date_unit: str = "ms",
+        default_handler: Optional[Callable[[Any], Union[int, str, List, Dict]]] = None,


The return for the annotation here should match the Serializable alias in the _json file. Not sure if better to put in a shared place like _typing or duplicate here; open to suggestions

WillAyd · 2019-08-25T23:39:09Z

pandas/io/json/_json.py

@@ -31,20 +32,23 @@

 TABLE_SCHEMA_VERSION = "0.20.0"

+Serializable = Union[int, str, List, Dict]


If we wanted to go crazy could subscript List and Dict recursively and with int or str, but I don't think mypy yet supports recursive types. Figure this is a good enough tradeoff

WillAyd · 2019-08-25T23:40:34Z

pandas/io/json/_json.py

@@ -31,20 +32,23 @@

 TABLE_SCHEMA_VERSION = "0.20.0"

+Serializable = Union[int, str, List, Dict]
+

 # interface to/from
 def to_json(
    path_or_buf,
    obj,


I think obj should be FrameOrSeries, but since it gets passed too pandas.core.generic I was getting errors like

pandas/core/generic.py:2401: error: Value of type variable "FrameOrSeries" of "to_json" cannot be "NDFrame"

I think FrameOrSeries in _typing should be changed to FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame") as used in #27646

Interesting idea - I think worth exploring as a separate PR for sure

Took a look at this after merging master and still doesn't work because the SeriesWriter _write method seems to re-use this variable for a dict. I'm not sure if that's needed so should be able to refactor but probably better as a follow up to not cram too much in here

WillAyd · 2019-08-25T23:41:05Z

pandas/io/json/_json.py

    ):
        self.obj = obj

        if orient is None:
-            orient = self._default_orient
+            orient = self._default_orient  # type: ignore


orient is probably better as an abstract property but I figure best to leave that refactor to a follow up

add a comment?

although adding an abstract property is straightforward, if you add if TYPE-CHECKING it won't affect runtime behaviour.

if TYPE_CHECKING: _default_orient = None # type: <...>

If it's cool I'd prefer to do this as a quick follow up. I find a lot of TYPE_CHECKING blocks to be distracting so like to avoid where possible. It won't be difficult just want to minimize diff in extension change

no problem.

I don't think we should avoid these blocks as too beneficial.

in a (hopefully not too distant) future PR, i'll be adding...

if TYPE_CHECKING: # attributes index and columns are created dynamically index = None # type: Index columns = None # type: Index

to frame.py so that i can get types for self.fmt.tr_frame.index.format and self.frame.index.nlevels etc.

in this case all tests pass without the if TYPE_CHECKING but would rather not change runtime behaviour if avoidable.

WillAyd · 2019-08-25T23:43:51Z

pandas/io/json/_json.py

@@ -31,20 +32,23 @@

 TABLE_SCHEMA_VERSION = "0.20.0"

+Serializable = Union[int, str, List, Dict]
+

 # interface to/from
 def to_json(
    path_or_buf,


I think this should be FilePathOrBuffer but was failing with

pandas/io/json/_json.py:97: error: Item "Path" of "Union[Path, IO[Any]]" has no attribute "write"

I think this is actually a problem with the return of _stringify_path since it doesn't indicate that Path objects get converted to strings. Can open an issue as a follow up but @simonjayhawkins may already be working on that

yeah, should already be fixed with overloads in #27732.

simonjayhawkins · 2019-08-27T09:12:51Z

@WillAyd can you merge master.

WillAyd · 2019-08-27T16:02:21Z

Actually just realized that the simplejson implementation prefers indent to be a string rather than an int. Let me see if I can work that in here

https://github.com/simplejson/simplejson/blob/5fccfc1d88cc70455d013fbf13f9a49a19cd7929/simplejson/__init__.py#L328

WillAyd · 2019-08-27T16:05:20Z

Nevermind ignore previous comment - mixed up my JSON libraries this morning. ujson only supports an int

WillAyd · 2019-08-27T18:26:16Z

OK I think this is ready. If it helps for review here is the reference implementation that I've vendored into our ujson copy:

https://github.com/esnme/ultrajson/pull/163/files

WillAyd · 2019-08-30T18:23:57Z

For completeness here are benchmarks showing no change:

· Running 22 total benchmarks (2 commits * 1 environments * 11 benchmarks)
[  0.00%] · For pandas commit 51db82d9 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...............................................
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  6.82%] ··· Running (io.json.ToJSON.time_to_json--).
[  9.09%] ··· Running (io.json.ToJSON.time_to_json_wide--).
[ 11.36%] ··· Running (io.json.ToJSONLines.time_delta_int_tstamp_lines--).....
[ 25.00%] · For pandas commit 638d0552 <json-indent2> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 31.82%] ··· Running (io.json.ToJSON.time_to_json--).
[ 34.09%] ··· Running (io.json.ToJSON.time_to_json_wide--).
[ 36.36%] ··· Running (io.json.ToJSONLines.time_delta_int_tstamp_lines--).....
[ 50.00%] · For pandas commit 638d0552 <json-indent2> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 52.27%] ··· io.json.ToJSON.mem_to_json                                                                                    ok
[ 52.27%] ··· ========= ==== ============= ============== =============== ==================
              --                                       frame
              --------- --------------------------------------------------------------------
                orient   df   df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ==== ============= ============== =============== ==================
                split    0         0             0               0                0
               columns   0         0             0               0                0
                index    0         0             0               0                0
                values   0         0             0               0                0
               records   0         0             0               0                0
              ========= ==== ============= ============== =============== ==================

[ 54.55%] ··· io.json.ToJSON.mem_to_json_wide                                                                               ok
[ 54.55%] ··· ========= ==== ============= ============== =============== ==================
              --                                       frame
              --------- --------------------------------------------------------------------
                orient   df   df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ==== ============= ============== =============== ==================
                split    0         0             0               0                0
               columns   0         0             0               0                0
                index    0         0             0               0                0
                values   0         0             0               0                0
               records   0         0             0               0                0
              ========= ==== ============= ============== =============== ==================

[ 56.82%] ··· io.json.ToJSON.time_to_json                                                                                   ok
[ 56.82%] ··· ========= ========== ============= ============== =============== ==================
              --                                          frame
              --------- --------------------------------------------------------------------------
                orient      df      df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ========== ============= ============== =============== ==================
                split    93.9±5ms     94.2±9ms      98.5±6ms        102±5ms          102±20ms
               columns   95.6±2ms     308±20ms      352±40ms        321±60ms         313±40ms
                index    121±3ms      319±10ms      318±20ms        318±5ms          316±3ms
                values   78.3±3ms     78.6±1ms      79.3±1ms        88.4±1ms         87.5±1ms
               records   89.1±6ms     92.0±2ms      109±2ms         117±2ms          116±3ms
              ========= ========== ============= ============== =============== ==================

[ 59.09%] ··· io.json.ToJSON.time_to_json_wide                                                                              ok
[ 59.09%] ··· ========= ========== ============= ============== =============== ==================
              --                                          frame
              --------- --------------------------------------------------------------------------
                orient      df      df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ========== ============= ============== =============== ==================
                split    110±1ms      110±2ms       168±4ms         154±6ms          181±20ms
               columns   123±1ms      139±2ms       190±2ms         176±2ms          200±3ms
                index    125±3ms      125±2ms      188±0.9ms        168±4ms          207±7ms
                values   109±9ms      112±7ms       170±10ms        149±10ms         191±20ms
               records   139±10ms     137±10ms      199±10ms        171±20ms         195±6ms
              ========= ========== ============= ============== =============== ==================

[ 61.36%] ··· io.json.ToJSONLines.time_delta_int_tstamp_lines                                                          147±3ms
[ 63.64%] ··· io.json.ToJSONLines.time_float_int_lines                                                                 163±4ms
[ 65.91%] ··· io.json.ToJSONLines.time_float_int_str_lines                                                            168±10ms
[ 68.18%] ··· io.json.ToJSONLines.time_floats_with_dt_index_lines                                                      131±7ms
[ 70.45%] ··· io.json.ToJSONLines.time_floats_with_int_idex_lines                                                      126±3ms
[ 72.73%] ··· Setting up io/json.py:200                                                                                     ok
[ 72.73%] ··· io.json.ToJSONMem.peakmem_float                                                                            62.3M
[ 75.00%] ··· io.json.ToJSONMem.peakmem_int                                                                              62.5M.
[ 75.00%] · For pandas commit 51db82d9 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 77.27%] ··· io.json.ToJSON.mem_to_json                                                                                    ok
[ 77.27%] ··· ========= ==== ============= ============== =============== ==================
              --                                       frame
              --------- --------------------------------------------------------------------
                orient   df   df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ==== ============= ============== =============== ==================
                split    0         0             0               0                0
               columns   0         0             0               0                0
                index    0         0             0               0                0
                values   0         0             0               0                0
               records   0         0             0               0                0
              ========= ==== ============= ============== =============== ==================

[ 79.55%] ··· io.json.ToJSON.mem_to_json_wide                                                                               ok
[ 79.55%] ··· ========= ==== ============= ============== =============== ==================
              --                                       frame
              --------- --------------------------------------------------------------------
                orient   df   df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= ==== ============= ============== =============== ==================
                split    0         0             0               0                0
               columns   0         0             0               0                0
                index    0         0             0               0                0
                values   0         0             0               0                0
               records   0         0             0               0                0
              ========= ==== ============= ============== =============== ==================

[ 81.82%] ··· io.json.ToJSON.time_to_json                                                                                   ok
[ 81.82%] ··· ========= =========== ============= ============== =============== ==================
              --                                           frame
              --------- ---------------------------------------------------------------------------
                orient       df      df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= =========== ============= ============== =============== ==================
                split    90.9±10ms     88.8±6ms      92.2±4ms        98.5±7ms         107±2ms
               columns    107±10ms     313±20ms      298±20ms        320±7ms          324±20ms
                index    104±0.8ms     309±10ms      308±10ms        311±5ms          343±10ms
                values    75.1±1ms     76.5±2ms      80.3±3ms        122±20ms         85.3±6ms
               records   95.0±10ms    92.1±10ms      123±10ms        125±7ms          141±30ms
              ========= =========== ============= ============== =============== ==================

[ 84.09%] ··· io.json.ToJSON.time_to_json_wide                                                                              ok
[ 84.09%] ··· ========= =========== ============= ============== =============== ==================
              --                                           frame
              --------- ---------------------------------------------------------------------------
                orient       df      df_date_idx   df_td_int_ts   df_int_floats   df_int_float_str
              ========= =========== ============= ============== =============== ==================
                split     112±6ms      111±5ms       214±30ms        184±10ms         189±30ms
               columns    126±3ms      140±2ms       205±10ms        179±4ms          211±20ms
                index     125±2ms      123±2ms       186±7ms         169±5ms          215±10ms
                values    113±10ms     107±2ms       163±6ms         146±1ms          174±3ms
               records   122±0.7ms     135±10ms      239±50ms        167±3ms          207±20ms
              ========= =========== ============= ============== =============== ==================

[ 86.36%] ··· io.json.ToJSONLines.time_delta_int_tstamp_lines                                                          147±2ms
[ 88.64%] ··· io.json.ToJSONLines.time_float_int_lines                                                                 164±4ms
[ 90.91%] ··· io.json.ToJSONLines.time_float_int_str_lines                                                             171±5ms
[ 93.18%] ··· io.json.ToJSONLines.time_floats_with_dt_index_lines                                                      123±6ms
[ 95.45%] ··· io.json.ToJSONLines.time_floats_with_int_idex_lines                                                     149±20ms
[ 97.73%] ··· Setting up io/json.py:200                                                                                     ok
[ 97.73%] ··· io.json.ToJSONMem.peakmem_float                                                                            62.3M
[100.00%] ··· io.json.ToJSONMem.peakmem_int                                                                              62.3M
.
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Anyone have thoughts on this?

WillAyd · 2019-09-06T05:02:11Z

Validation added and changed the default to indent=None, with a documentation note on index=0 behavior and how it differs from stdlib

jreback · 2019-09-11T12:23:50Z

pandas/tests/io/json/test_json_table_schema.py

@@ -234,7 +236,11 @@ def test_build_series(self):
                ),
            ]
        )
-        assert result == expected
+
+        if PY35:


can you do this here

WillAyd · 2019-09-16T02:03:58Z

Ping on this one - I think good to go

doc/source/whatsnew/v1.0.0.rst

pandas/_libs/src/ujson/lib/ultrajsonenc.c

pandas/core/generic.py

pandas/tests/io/json/test_pandas.py

WillAyd · 2019-09-16T17:34:28Z

Thanks for the feedback @gfyoung . If no other comments I plan to merge this on Wednesday

botzill · 2020-01-28T13:12:43Z

Hi.

Is indent feature still in pending?

WillAyd added 7 commits May 18, 2019 19:44

Added test

bb9c174

Vendored ujson changes

4c84106

Pass indent to ext module

b99f42b

hacked together working example

367b494

Merge remote-tracking branch 'upstream/master' into json-indent2

1bcf354

Expanded test and clean implementation

43ab17b

Cleaned up docs and whitespace

592be66

WillAyd added the IO JSON read_json, to_json, json_normalize label Aug 24, 2019

WillAyd added this to the 1.0 milestone Aug 24, 2019

gfyoung reviewed Aug 24, 2019

View reviewed changes

pandas/core/generic.py Outdated Show resolved Hide resolved

simonjayhawkins reviewed Aug 25, 2019

View reviewed changes

WillAyd added 4 commits August 25, 2019 16:05

Annotated to_json in pandas.core.generic

5c0e8a3

Simple annotations for to_json

d11e47f

More hints in _json

21672ed

blackify

79c1cbc

WillAyd commented Aug 25, 2019

View reviewed changes

WillAyd force-pushed the json-indent2 branch from 52e6781 to 79c1cbc Compare August 26, 2019 01:52

Merge remote-tracking branch 'upstream/master' into json-indent2

0007e34

Merge remote-tracking branch 'upstream/master' into json-indent2

abdd27f

WillAyd added 3 commits August 27, 2019 11:24

Reused Scalar variable

2da6fbf

Fixed vendored changes

a4f740a

Replaced tabs with spaces

cd0c9e6

WillAyd added 3 commits August 27, 2019 11:53

isort fixup

c740359

whatsnew

2b5cb50

Merge remote-tracking branch 'upstream/master' into json-indent2

638d055

Validate int input

0024f41

Merge remote-tracking branch 'upstream/master' into json-indent2

4869425

jreback requested changes Sep 11, 2019

View reviewed changes

WillAyd added 6 commits September 11, 2019 08:52

Merge remote-tracking branch 'upstream/master' into json-indent2

f03f05f

Added helper function for Py35 compat

b894b8c

Merge remote-tracking branch 'upstream/master' into json-indent2

c4dba2e

Merge remote-tracking branch 'upstream/master' into json-indent2

dc68364

Merge remote-tracking branch 'upstream/master' into json-indent2

6da8684

Reverted line removal

dab8df1