Add Indent Support in to_json #28130

Merged 48 commits on Sep 18, 2019
Changes from 37 commits

Commits
bb9c174
Added test
WillAyd May 19, 2019
4c84106
Vendored ujson changes
WillAyd May 19, 2019
b99f42b
Pass indent to ext module
WillAyd May 19, 2019
367b494
hacked together working example
WillAyd May 19, 2019
1bcf354
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Aug 24, 2019
43ab17b
Expanded test and clean implementation
WillAyd Aug 24, 2019
592be66
Cleaned up docs and whitespace
WillAyd Aug 24, 2019
5c0e8a3
Annotated to_json in pandas.core.generic
WillAyd Aug 25, 2019
d11e47f
Simple annotations for to_json
WillAyd Aug 25, 2019
21672ed
More hints in _json
WillAyd Aug 25, 2019
79c1cbc
blackify
WillAyd Aug 25, 2019
0007e34
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Aug 26, 2019
abdd27f
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Aug 27, 2019
2da6fbf
Reused Scalar variable
WillAyd Aug 27, 2019
a4f740a
Fixed vendored changes
WillAyd Aug 27, 2019
cd0c9e6
Replaced tabs with spaces
WillAyd Aug 27, 2019
c740359
isort fixup
WillAyd Aug 27, 2019
2b5cb50
whatsnew
WillAyd Aug 27, 2019
638d055
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Aug 28, 2019
b870585
Added tests for all orients
WillAyd Sep 2, 2019
ba7f044
Fixed table schema
WillAyd Sep 2, 2019
df589e3
merge conflict fixup
WillAyd Sep 2, 2019
517377b
Simplified logic for building table schema
WillAyd Sep 2, 2019
4aec9d7
Fixed test, removed breakpoint
WillAyd Sep 2, 2019
1aa424d
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 2, 2019
c896b8a
Added whatsnew for fixed issue
WillAyd Sep 2, 2019
b046061
Ran clang-format on objToJSON.c
WillAyd Sep 2, 2019
ae93309
lint fixups
WillAyd Sep 3, 2019
f037d05
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 3, 2019
ccb9823
whitespace fixup
WillAyd Sep 3, 2019
7d757e4
Py35 compat
WillAyd Sep 3, 2019
95251b1
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 3, 2019
65315c3
isort and lint fixups
WillAyd Sep 3, 2019
9827a94
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 6, 2019
0b440e0
Changed default indent to None
WillAyd Sep 6, 2019
0024f41
Validate int input
WillAyd Sep 6, 2019
4869425
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 11, 2019
f03f05f
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 11, 2019
b894b8c
Added helper function for Py35 compat
WillAyd Sep 11, 2019
c4dba2e
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 12, 2019
dc68364
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 12, 2019
6da8684
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 13, 2019
dab8df1
Reverted line removal
WillAyd Sep 13, 2019
b679fee
Merge remote-tracking branch 'upstream/master' into json-indent2
WillAyd Sep 16, 2019
966fadb
Whatsnew fix
WillAyd Sep 16, 2019
c8efda6
Simplified indent code
WillAyd Sep 16, 2019
5067eb7
Added comment for func
WillAyd Sep 16, 2019
f376f12
whitespace doc fixup
WillAyd Sep 16, 2019
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -21,6 +21,7 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~
- :meth:`DataFrame.to_latex` now accepts ``caption`` and ``label`` arguments (:issue:`25436`)
- :meth:`DataFrame.to_json` now accepts an ``indent`` integer argument to enable pretty printing of JSON output (:issue:`12004`)
-

.. _whatsnew_1000.enhancements.other:
@@ -184,6 +185,7 @@ I/O
- Bug in :meth:`DataFrame.to_json` where using a Tuple as a column or index value and using ``orient="columns"`` or ``orient="index"`` would produce invalid JSON (:issue:`20500`)
- Improve infinity parsing. :meth:`read_csv` now interprets ``Infinity``, ``+Infinity``, ``-Infinity`` as floating point values (:issue:`10065`)
- Bug in :meth:`DataFrame.to_csv` where values were truncated when the length of ``na_rep`` was shorter than the text input data. (:issue:`25099`)
- Bug in :meth:`DataFrame.to_json` where a datetime column label would not be written out in iso format with ``orient="table"`` (:issue:`28130`)

Plotting
^^^^^^^^
4 changes: 4 additions & 0 deletions pandas/_libs/src/ujson/lib/ultrajson.h
@@ -244,6 +244,10 @@ typedef struct __JSONObjectEncoder {
If true, '<', '>', and '&' characters will be encoded as \u003c, \u003e, and \u0026, respectively. If false, no special encoding will be used. */
int encodeHTMLChars;

/*
Configuration for spaces of indent */
int indent;

/*
Set to an error message if error occurred */
const char *errorMsg;
26 changes: 24 additions & 2 deletions pandas/_libs/src/ujson/lib/ultrajsonenc.c
@@ -728,6 +728,20 @@ FASTCALL_ATTR INLINE_PREFIX void FASTCALL_MSVC strreverse(char *begin,
while (end > begin) aux = *end, *end-- = *begin, *begin++ = aux;
}

void Buffer_AppendIndentNewlineUnchecked(JSONObjectEncoder *enc)
{
if (enc->indent > 0) Buffer_AppendCharUnchecked(enc, '\n');
}

void Buffer_AppendIndentUnchecked(JSONObjectEncoder *enc, JSINT32 value)
{
int i;
if (enc->indent > 0)
while (value-- > 0)
for (i = 0; i < enc->indent; i++)
Buffer_AppendCharUnchecked(enc, ' ');
}

void Buffer_AppendIntUnchecked(JSONObjectEncoder *enc, JSINT32 value) {
char *wstr;
JSUINT32 uvalue = (value < 0) ? -value : value;
@@ -960,24 +974,28 @@ void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name,
enc->iterBegin(obj, &tc);

Buffer_AppendCharUnchecked(enc, '[');
Buffer_AppendIndentNewlineUnchecked (enc);

while (enc->iterNext(obj, &tc)) {
if (count > 0) {
Buffer_AppendCharUnchecked(enc, ',');
#ifndef JSON_NO_EXTRA_WHITESPACE
Buffer_AppendCharUnchecked(enc, ' ');
#endif
Buffer_AppendIndentNewlineUnchecked (enc);
}

iterObj = enc->iterGetValue(obj, &tc);

enc->level++;
Buffer_AppendIndentUnchecked (enc, enc->level);
encode(iterObj, enc, NULL, 0);
count++;
}

enc->iterEnd(obj, &tc);
Buffer_Reserve(enc, 2);
Buffer_AppendIndentNewlineUnchecked (enc);
Buffer_AppendIndentUnchecked (enc, enc->level);
Buffer_AppendCharUnchecked(enc, ']');
break;
}
@@ -987,25 +1005,29 @@ void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name,
enc->iterBegin(obj, &tc);

Buffer_AppendCharUnchecked(enc, '{');
Buffer_AppendIndentNewlineUnchecked (enc);

while (enc->iterNext(obj, &tc)) {
if (count > 0) {
Buffer_AppendCharUnchecked(enc, ',');
#ifndef JSON_NO_EXTRA_WHITESPACE
Buffer_AppendCharUnchecked(enc, ' ');
#endif
Buffer_AppendIndentNewlineUnchecked (enc);
}

iterObj = enc->iterGetValue(obj, &tc);
objName = enc->iterGetName(obj, &tc, &szlen);

enc->level++;
Buffer_AppendIndentUnchecked (enc, enc->level);
encode(iterObj, enc, objName, szlen);
count++;
}

enc->iterEnd(obj, &tc);
Buffer_Reserve(enc, 2);
Buffer_AppendIndentNewlineUnchecked (enc);
Buffer_AppendIndentUnchecked (enc, enc->level);
Buffer_AppendCharUnchecked(enc, '}');
break;
}
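
The newline and indent calls added to the array and object branches above amount to a simple layout rule. The following is a rough Python sketch of that rule, not the vendored C code: `encode_sketch` is a hypothetical name, scalars are delegated to `json.dumps`, keys and values are joined with a bare `:` (an assumption about the compact separator), and the nesting level is assumed to drop back after each nested value, which this hunk does not show.

```python
import json


def encode_sketch(obj, indent=0, level=0):
    """Hypothetical re-creation of the indent layout for dicts, lists, and scalars."""
    # Mirrors Buffer_AppendIndentNewlineUnchecked: a newline only when indent > 0.
    newline = "\n" if indent > 0 else ""

    def pad(depth):
        # Mirrors Buffer_AppendIndentUnchecked: `indent` spaces per nesting level.
        return " " * (indent * depth)

    if isinstance(obj, dict):
        open_ch, close_ch = "{", "}"
        items = [
            pad(level + 1) + json.dumps(str(key)) + ":"
            + encode_sketch(value, indent, level + 1)
            for key, value in obj.items()
        ]
    elif isinstance(obj, list):
        open_ch, close_ch = "[", "]"
        items = [
            pad(level + 1) + encode_sketch(value, indent, level + 1)
            for value in obj
        ]
    else:
        # Scalars carry no layout of their own.
        return json.dumps(obj)

    # Opening bracket, newline, comma-newline separated items, newline,
    # then the closing bracket at the container's own level.
    body = ("," + newline).join(items)
    return open_ch + newline + body + newline + pad(level) + close_ch


print(encode_sketch({"a": [1, 2]}, indent=2))
```

With `indent=0` the sketch emits no newlines at all, which is the ujson-style behavior this PR keeps rather than the stdlib's.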
24 changes: 17 additions & 7 deletions pandas/_libs/src/ujson/python/objToJSON.c
@@ -2373,10 +2373,16 @@ char *Object_iterGetName(JSOBJ obj, JSONTypeContext *tc, size_t *outLen) {
}

PyObject *objToJSON(PyObject *self, PyObject *args, PyObject *kwargs) {
static char *kwlist[] = {
"obj", "ensure_ascii", "double_precision", "encode_html_chars",
"orient", "date_unit", "iso_dates", "default_handler",
NULL};
static char *kwlist[] = {"obj",
"ensure_ascii",
"double_precision",
"encode_html_chars",
"orient",
"date_unit",
"iso_dates",
"default_handler",
"indent",
NULL};

char buffer[65536];
char *ret;
@@ -2389,6 +2395,7 @@ PyObject *objToJSON(PyObject *self, PyObject *args, PyObject *kwargs) {
char *sdateFormat = NULL;
PyObject *oisoDates = 0;
PyObject *odefHandler = 0;
int indent = 0;

PyObjectEncoder pyEncoder = {{
Object_beginTypeContext,
@@ -2410,6 +2417,7 @@ PyObject *objToJSON(PyObject *self, PyObject *args, PyObject *kwargs) {
idoublePrecision,
1, // forceAscii
0, // encodeHTMLChars
0, // indent
}};
JSONObjectEncoder *encoder = (JSONObjectEncoder *)&pyEncoder;

@@ -2434,10 +2442,10 @@ PyObject *objToJSON(PyObject *self, PyObject *args, PyObject *kwargs) {

PRINTMARK();

if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OiOssOO", kwlist, &oinput,
&oensureAscii, &idoublePrecision,
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OiOssOOi", kwlist,
&oinput, &oensureAscii, &idoublePrecision,
&oencodeHTMLChars, &sOrient, &sdateFormat,
&oisoDates, &odefHandler)) {
&oisoDates, &odefHandler, &indent)) {
return NULL;
}

@@ -2503,6 +2511,8 @@ PyObject *objToJSON(PyObject *self, PyObject *args, PyObject *kwargs) {
pyEncoder.defaultHandler = odefHandler;
}

encoder->indent = indent;

pyEncoder.originalOutputFormat = pyEncoder.outputFormat;
PRINTMARK();
ret = JSON_EncodeObject(oinput, encoder, buffer, sizeof(buffer));
2 changes: 1 addition & 1 deletion pandas/_typing.py
@@ -25,7 +25,7 @@
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
Scalar = Union[str, int, float]
Scalar = Union[str, int, float, bool]
Axis = Union[str, int]
Ordered = Optional[bool]

46 changes: 34 additions & 12 deletions pandas/core/generic.py
@@ -8,6 +8,7 @@
import re
from textwrap import dedent
from typing import (
Any,
Callable,
Dict,
FrozenSet,
@@ -61,7 +62,7 @@
from pandas.core.dtypes.missing import isna, notna

import pandas as pd
from pandas._typing import Dtype, FilePathOrBuffer
from pandas._typing import Dtype, FilePathOrBuffer, Scalar
from pandas.core import missing, nanops
import pandas.core.algorithms as algos
from pandas.core.base import PandasObject, SelectionMixin
@@ -2249,17 +2250,18 @@ def to_excel(

def to_json(
self,
path_or_buf=None,
orient=None,
date_format=None,
double_precision=10,
force_ascii=True,
date_unit="ms",
default_handler=None,
lines=False,
compression="infer",
index=True,
):
path_or_buf: Optional[FilePathOrBuffer] = None,
orient: Optional[str] = None,
date_format: Optional[str] = None,
double_precision: int = 10,
force_ascii: bool_t = True,
date_unit: str = "ms",
default_handler: Optional[Callable[[Any], Union[Scalar, List, Dict]]] = None,
lines: bool_t = False,
compression: Optional[str] = "infer",
index: bool_t = True,
indent: Optional[int] = None,
) -> Optional[str]:
"""
Convert the object to a JSON string.

@@ -2339,6 +2341,11 @@ def to_json(

.. versionadded:: 0.23.0

indent : integer, optional
Length of whitespace used to indent each record.

.. versionadded:: 1.0.0

Contributor:

Does this match the semantics of the stdlib, e.g. the default is None, 0 means insert newlines, and the value can also be a string? Is that supported? https://docs.python.org/3/library/json.html

Member, Author:

No, this only accepts an int for now, and 0 does not insert newlines. This matches ujson behavior rather than the stdlib.

Member, Author:

It should be possible to support this in the long term; I just think it will take a little effort to bridge the gap with our vendored ujson. For now it might make the most sense to change the signature to indent=None and have that map to indent=0, documenting that the indent=0 behavior is different from the stdlib.

I think that should avoid a deprecation cycle for the indent=0 behavior if we can eventually mirror the stdlib's support.
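
For reference, the stdlib semantics discussed in this thread can be checked with `json.dumps` alone. This is a minimal sketch of the stdlib side only; the `record` value is made up for illustration, and nothing here exercises pandas or the vendored ujson.

```python
import json

record = {"a": 1, "b": [1, 2]}  # illustrative data, not from the PR

compact = json.dumps(record)               # indent=None (default): one line
zero = json.dumps(record, indent=0)        # newlines, but no leading spaces
tabbed = json.dumps(record, indent="\t")   # a string is used as the indent unit

print(repr(compact))
print(repr(zero))
print(repr(tabbed))
```

The second case is the one the Notes section calls out: in the stdlib, `indent=0` still breaks lines, whereas this PR treats `indent=0` like `indent=None`.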


Returns
-------
None or str
@@ -2349,6 +2356,13 @@ def to_json(
--------
read_json

Notes
-----
The behavior of ``indent=0`` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, ``indent=0``
and the default ``indent=None`` are equivalent in pandas, though this
may change in a future release.

Examples
--------

@@ -2399,6 +2413,13 @@ def to_json(
date_format = "iso"
elif date_format is None:
date_format = "epoch"

config.is_nonnegative_int(indent)
if indent is None:
int_indent = 0
else:
int_indent = indent

return json.to_json(
path_or_buf=path_or_buf,
obj=self,
@@ -2411,6 +2432,7 @@ def to_json(
lines=lines,
compression=compression,
index=index,
indent=int_indent,
)

def to_hdf(self, path_or_buf, key, **kwargs):
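
The validation and None-to-0 mapping added to `to_json` above can be sketched in isolation. `normalize_indent` is a hypothetical stand-in, not a pandas function, and it assumes that `config.is_nonnegative_int` raises `ValueError` for anything other than `None` or a non-negative integer.

```python
from typing import Optional


def normalize_indent(indent: Optional[int]) -> int:
    """Hypothetical helper: validate ``indent`` and map ``None`` to 0."""
    if indent is None:
        # Default: compact output, treated the same as indent=0.
        return 0
    if not isinstance(indent, int) or indent < 0:
        # Assumed to match the validator's behavior for bad input.
        raise ValueError("indent must be a nonnegative integer or None")
    return indent


print(normalize_indent(None), normalize_indent(4))
```

Keeping `None` as the public default and converting to an int only at the boundary leaves room to give `indent=0` stdlib-like meaning later without changing the signature.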