-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add Indent Support in to_json #28130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/generic.py
Outdated
@@ -2249,6 +2249,7 @@ def to_json( | |||
lines=False, | |||
compression="infer", | |||
index=True, | |||
indent=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add type hints when modifying code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK type hints added where feasible, comments where I think not
pandas/core/generic.py
Outdated
double_precision: int = 10, | ||
force_ascii: bool_t = True, | ||
date_unit: str = "ms", | ||
default_handler: Optional[Callable[[Any], Union[int, str, List, Dict]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return for the annotation here should match the Serializable
alias in the _json file. Not sure if better to put in a shared place like _typing or duplicate here; open to suggestions
pandas/io/json/_json.py
Outdated
@@ -31,20 +32,23 @@ | |||
|
|||
TABLE_SCHEMA_VERSION = "0.20.0" | |||
|
|||
Serializable = Union[int, str, List, Dict] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to go crazy could subscript List and Dict recursively and with int or str, but I don't think mypy yet supports recursive types. Figure this is a good enough tradeoff
@@ -31,20 +32,23 @@ | |||
|
|||
TABLE_SCHEMA_VERSION = "0.20.0" | |||
|
|||
Serializable = Union[int, str, List, Dict] | |||
|
|||
|
|||
# interface to/from | |||
def to_json( | |||
path_or_buf, | |||
obj, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think obj
should be FrameOrSeries, but since it gets passed too pandas.core.generic I was getting errors like
pandas/core/generic.py:2401: error: Value of type variable "FrameOrSeries" of "to_json" cannot be "NDFrame"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FrameOrSeries in _typing should be changed to FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
as used in #27646
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea - I think worth exploring as a separate PR for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look at this after merging master and still doesn't work because the SeriesWriter _write method seems to re-use this variable for a dict. I'm not sure if that's needed so should be able to refactor but probably better as a follow up to not cram too much in here
): | ||
self.obj = obj | ||
|
||
if orient is None: | ||
orient = self._default_orient | ||
orient = self._default_orient # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orient
is probably better as an abstract property but I figure best to leave that refactor to a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment?
although adding an abstract property is straightforward, if you add if TYPE-CHECKING it won't affect runtime behaviour.
if TYPE_CHECKING:
_default_orient = None # type: <...>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's cool I'd prefer to do this as a quick follow up. I find a lot of TYPE_CHECKING blocks to be distracting so like to avoid where possible. It won't be difficult just want to minimize diff in extension change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem.
I don't think we should avoid these blocks as too beneficial.
in a (hopefully not too distant) future PR, i'll be adding...
if TYPE_CHECKING: # attributes index and columns are created dynamically
index = None # type: Index
columns = None # type: Index
to frame.py so that i can get types for self.fmt.tr_frame.index.format
and self.frame.index.nlevels
etc.
in this case all tests pass without the if TYPE_CHECKING
but would rather not change runtime behaviour if avoidable.
@@ -31,20 +32,23 @@ | |||
|
|||
TABLE_SCHEMA_VERSION = "0.20.0" | |||
|
|||
Serializable = Union[int, str, List, Dict] | |||
|
|||
|
|||
# interface to/from | |||
def to_json( | |||
path_or_buf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be FilePathOrBuffer
but was failing with
pandas/io/json/_json.py:97: error: Item "Path" of "Union[Path, IO[Any]]" has no attribute "write"
I think this is actually a problem with the return of _stringify_path
since it doesn't indicate that Path objects get converted to strings. Can open an issue as a follow up but @simonjayhawkins may already be working on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, should already be fixed with overloads in #27732.
@WillAyd can you merge master. |
Actually just realized that the simplejson implementation prefers indent to be a string rather than an int. Let me see if I can work that in here |
Nevermind ignore previous comment - mixed up my JSON libraries this morning. ujson only supports an int |
OK I think this is ready. If it helps for review here is the reference implementation that I've vendored into our ujson copy: |
For completeness here are benchmarks showing no change: · Running 22 total benchmarks (2 commits * 1 environments * 11 benchmarks)
[ 0.00%] · For pandas commit 51db82d9 <master> (round 1/2):
[ 0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...............................................
[ 0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 6.82%] ··· Running (io.json.ToJSON.time_to_json--).
[ 9.09%] ··· Running (io.json.ToJSON.time_to_json_wide--).
[ 11.36%] ··· Running (io.json.ToJSONLines.time_delta_int_tstamp_lines--).....
[ 25.00%] · For pandas commit 638d0552 <json-indent2> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 31.82%] ··· Running (io.json.ToJSON.time_to_json--).
[ 34.09%] ··· Running (io.json.ToJSON.time_to_json_wide--).
[ 36.36%] ··· Running (io.json.ToJSONLines.time_delta_int_tstamp_lines--).....
[ 50.00%] · For pandas commit 638d0552 <json-indent2> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 52.27%] ··· io.json.ToJSON.mem_to_json ok
[ 52.27%] ··· ========= ==== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ==== ============= ============== =============== ==================
split 0 0 0 0 0
columns 0 0 0 0 0
index 0 0 0 0 0
values 0 0 0 0 0
records 0 0 0 0 0
========= ==== ============= ============== =============== ==================
[ 54.55%] ··· io.json.ToJSON.mem_to_json_wide ok
[ 54.55%] ··· ========= ==== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ==== ============= ============== =============== ==================
split 0 0 0 0 0
columns 0 0 0 0 0
index 0 0 0 0 0
values 0 0 0 0 0
records 0 0 0 0 0
========= ==== ============= ============== =============== ==================
[ 56.82%] ··· io.json.ToJSON.time_to_json ok
[ 56.82%] ··· ========= ========== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ========== ============= ============== =============== ==================
split 93.9±5ms 94.2±9ms 98.5±6ms 102±5ms 102±20ms
columns 95.6±2ms 308±20ms 352±40ms 321±60ms 313±40ms
index 121±3ms 319±10ms 318±20ms 318±5ms 316±3ms
values 78.3±3ms 78.6±1ms 79.3±1ms 88.4±1ms 87.5±1ms
records 89.1±6ms 92.0±2ms 109±2ms 117±2ms 116±3ms
========= ========== ============= ============== =============== ==================
[ 59.09%] ··· io.json.ToJSON.time_to_json_wide ok
[ 59.09%] ··· ========= ========== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ========== ============= ============== =============== ==================
split 110±1ms 110±2ms 168±4ms 154±6ms 181±20ms
columns 123±1ms 139±2ms 190±2ms 176±2ms 200±3ms
index 125±3ms 125±2ms 188±0.9ms 168±4ms 207±7ms
values 109±9ms 112±7ms 170±10ms 149±10ms 191±20ms
records 139±10ms 137±10ms 199±10ms 171±20ms 195±6ms
========= ========== ============= ============== =============== ==================
[ 61.36%] ··· io.json.ToJSONLines.time_delta_int_tstamp_lines 147±3ms
[ 63.64%] ··· io.json.ToJSONLines.time_float_int_lines 163±4ms
[ 65.91%] ··· io.json.ToJSONLines.time_float_int_str_lines 168±10ms
[ 68.18%] ··· io.json.ToJSONLines.time_floats_with_dt_index_lines 131±7ms
[ 70.45%] ··· io.json.ToJSONLines.time_floats_with_int_idex_lines 126±3ms
[ 72.73%] ··· Setting up io/json.py:200 ok
[ 72.73%] ··· io.json.ToJSONMem.peakmem_float 62.3M
[ 75.00%] ··· io.json.ToJSONMem.peakmem_int 62.5M.
[ 75.00%] · For pandas commit 51db82d9 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 77.27%] ··· io.json.ToJSON.mem_to_json ok
[ 77.27%] ··· ========= ==== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ==== ============= ============== =============== ==================
split 0 0 0 0 0
columns 0 0 0 0 0
index 0 0 0 0 0
values 0 0 0 0 0
records 0 0 0 0 0
========= ==== ============= ============== =============== ==================
[ 79.55%] ··· io.json.ToJSON.mem_to_json_wide ok
[ 79.55%] ··· ========= ==== ============= ============== =============== ==================
-- frame
--------- --------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= ==== ============= ============== =============== ==================
split 0 0 0 0 0
columns 0 0 0 0 0
index 0 0 0 0 0
values 0 0 0 0 0
records 0 0 0 0 0
========= ==== ============= ============== =============== ==================
[ 81.82%] ··· io.json.ToJSON.time_to_json ok
[ 81.82%] ··· ========= =========== ============= ============== =============== ==================
-- frame
--------- ---------------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= =========== ============= ============== =============== ==================
split 90.9±10ms 88.8±6ms 92.2±4ms 98.5±7ms 107±2ms
columns 107±10ms 313±20ms 298±20ms 320±7ms 324±20ms
index 104±0.8ms 309±10ms 308±10ms 311±5ms 343±10ms
values 75.1±1ms 76.5±2ms 80.3±3ms 122±20ms 85.3±6ms
records 95.0±10ms 92.1±10ms 123±10ms 125±7ms 141±30ms
========= =========== ============= ============== =============== ==================
[ 84.09%] ··· io.json.ToJSON.time_to_json_wide ok
[ 84.09%] ··· ========= =========== ============= ============== =============== ==================
-- frame
--------- ---------------------------------------------------------------------------
orient df df_date_idx df_td_int_ts df_int_floats df_int_float_str
========= =========== ============= ============== =============== ==================
split 112±6ms 111±5ms 214±30ms 184±10ms 189±30ms
columns 126±3ms 140±2ms 205±10ms 179±4ms 211±20ms
index 125±2ms 123±2ms 186±7ms 169±5ms 215±10ms
values 113±10ms 107±2ms 163±6ms 146±1ms 174±3ms
records 122±0.7ms 135±10ms 239±50ms 167±3ms 207±20ms
========= =========== ============= ============== =============== ==================
[ 86.36%] ··· io.json.ToJSONLines.time_delta_int_tstamp_lines 147±2ms
[ 88.64%] ··· io.json.ToJSONLines.time_float_int_lines 164±4ms
[ 90.91%] ··· io.json.ToJSONLines.time_float_int_str_lines 171±5ms
[ 93.18%] ··· io.json.ToJSONLines.time_floats_with_dt_index_lines 123±6ms
[ 95.45%] ··· io.json.ToJSONLines.time_floats_with_int_idex_lines 149±20ms
[ 97.73%] ··· Setting up io/json.py:200 ok
[ 97.73%] ··· io.json.ToJSONMem.peakmem_float 62.3M
[100.00%] ··· io.json.ToJSONMem.peakmem_int 62.3M
.
BENCHMARKS NOT SIGNIFICANTLY CHANGED. Anyone have thoughts on this? |
Validation added and changed the default to |
@@ -234,7 +236,11 @@ def test_build_series(self): | |||
), | |||
] | |||
) | |||
assert result == expected | |||
|
|||
if PY35: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this here
Ping on this one - I think good to go |
Thanks for the feedback @gfyoung . If no other comments I plan to merge this on Wednesday |
Hi. Is indent feature still in pending? |
to_json
method #12004black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
After some of the recent refactor this should be relatively easy to do after vendoring a few updates