DEP: Enforce numpy keyword deprecation in read_json #49083

Merged: 7 commits, Oct 20, 2022
70 changes: 0 additions & 70 deletions doc/source/user_guide/io.rst
@@ -2111,8 +2111,6 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``
* ``convert_axes`` : boolean, try to convert the axes to the proper dtypes, default is ``True``
* ``convert_dates`` : a list of columns to parse for dates; If ``True``, then try to parse date-like columns, default is ``True``.
* ``keep_default_dates`` : boolean, default ``True``. If parsing dates, then parse the default date-like columns.
* ``numpy`` : direct decoding to NumPy arrays. default is ``False``;
Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering **MUST** be the same for each term if ``numpy=True``.
* ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality.
* ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
None. By default the timestamp precision will be detected, if this is not desired
@@ -2216,74 +2214,6 @@ Dates written in nanoseconds need to be read back in nanoseconds:
dfju = pd.read_json(json, date_unit="ns")
dfju

The Numpy parameter
+++++++++++++++++++

.. note::
This param has been deprecated as of version 1.0.0 and will raise a ``FutureWarning``.

This supports numeric data only. Index and columns labels may be non-numeric, e.g. strings, dates etc.

If ``numpy=True`` is passed to ``read_json`` an attempt will be made to sniff
an appropriate dtype during deserialization and to subsequently decode directly
to NumPy arrays, bypassing the need for intermediate Python objects.

This can provide speedups if you are deserialising a large amount of numeric
data:

.. ipython:: python

randfloats = np.random.uniform(-100, 1000, 10000)
randfloats.shape = (1000, 10)
dffloats = pd.DataFrame(randfloats, columns=list("ABCDEFGHIJ"))

jsonfloats = dffloats.to_json()

.. ipython:: python

%timeit pd.read_json(jsonfloats)

.. ipython:: python
:okwarning:

%timeit pd.read_json(jsonfloats, numpy=True)

The speedup is less noticeable for smaller datasets:

.. ipython:: python

jsonfloats = dffloats.head(100).to_json()

.. ipython:: python

%timeit pd.read_json(jsonfloats)

.. ipython:: python
:okwarning:

%timeit pd.read_json(jsonfloats, numpy=True)

.. warning::

Direct NumPy decoding makes a number of assumptions and may fail or produce
unexpected output if these assumptions are not satisfied:

- data is numeric.

- data is uniform. The dtype is sniffed from the first value decoded.
A ``ValueError`` may be raised, or incorrect output may be produced
if this condition is not satisfied.

- labels are ordered. Labels are only read from the first container, it is assumed
that each subsequent row / column has been encoded in the same order. This should be satisfied if the
data was encoded using ``to_json`` but may not be the case if the JSON
is from another source.
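
With the keyword removed, the practical replacement is simply to read the JSON through the regular path and, if a plain NumPy array is needed, extract it afterwards. A minimal sketch, not part of the original documentation, assuming ordinary numeric columns and reusing the ``dffloats``/``jsonfloats`` setup shown above (``df`` and ``arr`` are just illustrative names):

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Same kind of numeric frame as in the timings above.
   randfloats = np.random.uniform(-100, 1000, 10000).reshape(1000, 10)
   dffloats = pd.DataFrame(randfloats, columns=list("ABCDEFGHIJ"))
   jsonfloats = dffloats.to_json()

   # Plain read_json; decoding now always goes through the regular
   # Python-object path, there is no ``numpy`` fast path anymore.
   df = pd.read_json(jsonfloats)

   # If a raw NumPy array is what you actually want, pull it out afterwards.
   arr = df.to_numpy()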

.. ipython:: python
:suppress:

os.remove("test.json")

.. _io.json_normalize:

Normalization
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -151,6 +151,7 @@ Removal of prior version deprecations/changes
- Removed the ``numeric_only`` keyword from :meth:`Categorical.min` and :meth:`Categorical.max` in favor of ``skipna`` (:issue:`48821`)
- Removed :func:`is_extension_type` in favor of :func:`is_extension_array_dtype` (:issue:`29457`)
- Remove :meth:`DataFrameGroupBy.pad` and :meth:`DataFrameGroupBy.backfill` (:issue:`45076`)
- Remove ``numpy`` argument from :func:`read_json` (:issue:`30636`)
- Removed the ``center`` keyword in :meth:`DataFrame.expanding` (:issue:`20647`)
- Enforced :meth:`Rolling.count` with ``min_periods=None`` to default to the size of the window (:issue:`31302`)

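For users, enforcing the deprecation means the keyword is gone from the signature entirely, so passing it should now fail up front instead of emitting a ``FutureWarning``. A small hypothetical sketch of the expected behaviour (assuming no special handling of unknown keywords is added):

.. code-block:: python

   import pandas as pd

   json_data = pd.DataFrame({"a": [1, 2, 3]}).to_json()

   # Still works as before.
   pd.read_json(json_data)

   # The removed keyword is no longer accepted; this is expected to raise
   # a TypeError (unexpected keyword argument) rather than a FutureWarning.
   try:
       pd.read_json(json_data, numpy=True)
   except TypeError as err:
       print(err)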
87 changes: 3 additions & 84 deletions pandas/_libs/src/ujson/python/JSONtoObj.c
@@ -83,12 +83,6 @@ JSOBJ Object_npyNewArrayList(void *prv, void *decoder);
JSOBJ Object_npyEndArrayList(void *prv, JSOBJ obj);
int Object_npyArrayListAddItem(void *prv, JSOBJ obj, JSOBJ value);

// labelled support, encode keys and values of JS object into separate numpy
// arrays
JSOBJ Object_npyNewObject(void *prv, void *decoder);
JSOBJ Object_npyEndObject(void *prv, JSOBJ obj);
int Object_npyObjectAddKey(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value);

// free the numpy context buffer
void Npy_releaseContext(NpyArrContext *npyarr) {
PRINTMARK();
@@ -374,68 +368,6 @@ int Object_npyArrayListAddItem(void *prv, JSOBJ obj, JSOBJ value) {
return 1;
}

JSOBJ Object_npyNewObject(void *prv, void *_decoder) {
PyObjectDecoder *decoder = (PyObjectDecoder *)_decoder;
PRINTMARK();
if (decoder->curdim > 1) {
PyErr_SetString(PyExc_ValueError,
"labels only supported up to 2 dimensions");
return NULL;
}

return ((JSONObjectDecoder *)decoder)->newArray(prv, decoder);
}

JSOBJ Object_npyEndObject(void *prv, JSOBJ obj) {
PyObject *list;
npy_intp labelidx;
NpyArrContext *npyarr = (NpyArrContext *)obj;
PRINTMARK();
if (!npyarr) {
return NULL;
}

labelidx = npyarr->dec->curdim - 1;

list = npyarr->labels[labelidx];
if (list) {
npyarr->labels[labelidx] = PyArray_FROM_O(list);
Py_DECREF(list);
}

return (PyObject *)((JSONObjectDecoder *)npyarr->dec)->endArray(prv, obj);
}

int Object_npyObjectAddKey(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value) {
PyObject *label, *labels;
npy_intp labelidx;
// add key to label array, value to values array
NpyArrContext *npyarr = (NpyArrContext *)obj;
PRINTMARK();
if (!npyarr) {
return 0;
}

label = (PyObject *)name;
labelidx = npyarr->dec->curdim - 1;

if (!npyarr->labels[labelidx]) {
npyarr->labels[labelidx] = PyList_New(0);
}
labels = npyarr->labels[labelidx];
// only fill label array once, assumes all column labels are the same
// for 2-dimensional arrays.
if (PyList_Check(labels) && PyList_GET_SIZE(labels) <= npyarr->elcount) {
PyList_Append(labels, label);
}

if (((JSONObjectDecoder *)npyarr->dec)->arrayAddItem(prv, obj, value)) {
Py_DECREF(label);
return 1;
}
return 0;
}

int Object_objectAddKey(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value) {
int ret = PyDict_SetItem(obj, name, value);
Py_DECREF((PyObject *)name);
@@ -494,7 +426,7 @@ static void Object_releaseObject(void *prv, JSOBJ obj, void *_decoder) {
}
}

static char *g_kwlist[] = {"obj", "precise_float", "numpy",
static char *g_kwlist[] = {"obj", "precise_float",
"labelled", "dtype", NULL};

PyObject *JSONToObj(PyObject *self, PyObject *args, PyObject *kwargs) {
@@ -505,7 +437,7 @@ PyObject *JSONToObj(PyObject *self, PyObject *args, PyObject *kwargs) {
JSONObjectDecoder *decoder;
PyObjectDecoder pyDecoder;
PyArray_Descr *dtype = NULL;
int numpy = 0, labelled = 0;
int labelled = 0;

JSONObjectDecoder dec = {
Object_newString, Object_objectAddKey, Object_arrayAddItem,
@@ -528,7 +460,7 @@ PyObject *JSONToObj(PyObject *self, PyObject *args, PyObject *kwargs) {
decoder = (JSONObjectDecoder *)&pyDecoder;

if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OiiO&", g_kwlist, &arg,
&opreciseFloat, &numpy, &labelled,
&opreciseFloat, &labelled,
PyArray_DescrConverter2, &dtype)) {
Npy_releaseContext(pyDecoder.npyarr);
return NULL;
@@ -554,19 +486,6 @@
decoder->errorStr = NULL;
decoder->errorOffset = NULL;

if (numpy) {
pyDecoder.dtype = dtype;
decoder->newArray = Object_npyNewArray;
decoder->endArray = Object_npyEndArray;
decoder->arrayAddItem = Object_npyArrayAddItem;

if (labelled) {
decoder->newObject = Object_npyNewObject;
decoder->endObject = Object_npyEndObject;
decoder->objectAddKey = Object_npyObjectAddKey;
}
}

ret = JSON_DecodeObject(decoder, PyBytes_AS_STRING(sarg),
PyBytes_GET_SIZE(sarg));
