Replaced void pointers with Types in JSON Datetime Conversions #30283

WillAyd · 2019-12-15T23:45:07Z

This also incorporates the clean up from #30271

Benchmarks show the following benefit, though most likely just noise:

        before           after         ratio
     [04fce81f]       [f8ed75d0]
     <stash~6^2>       <json-dates>
-         336±9ms          302±4ms     0.90  io.json.ToJSON.time_to_json('columns', 'df_int_floats')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

One follow-up to this could drastically speed up datetime index handling: right now that is done in Python space, but it could reuse these same functions.

WillAyd · 2019-12-15T23:46:07Z

pandas/_libs/src/ujson/python/objToJSON.c

 }

-static void *PandasDateTimeStructToJSON(npy_datetimestruct *dts,


The diff is a little difficult to read here, but this gets rid of the ToJSON functions and replaces them with explicit ToIso or ToEpoch functions with strong types.
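To illustrate the idea of replacing a single `void *` converter with separately typed ones, here is a minimal standalone sketch. It does not reproduce the code in objToJSON.c: `demo_datetimestruct`, `demo_to_iso`, and `demo_to_epoch_days` are hypothetical names, the struct is a trimmed stand-in for `npy_datetimestruct`, and the epoch is expressed in whole days only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, trimmed stand-in for npy_datetimestruct (date fields only). */
typedef struct {
    int64_t year;
    int32_t month;
    int32_t day;
} demo_datetimestruct;

/* Typed ISO converter: the return type says "heap-allocated string".
 * Returns NULL on failure; the caller owns and frees the result. */
static char *demo_to_iso(const demo_datetimestruct *dts, size_t *len) {
    assert(dts != NULL && len != NULL);
    char *buf = malloc(32);
    if (buf == NULL) {
        return NULL;
    }
    int n = snprintf(buf, 32, "%04lld-%02d-%02d", (long long)dts->year,
                     (int)dts->month, (int)dts->day);
    if (n < 0 || n >= 32) {
        free(buf);
        return NULL;
    }
    *len = (size_t)n;
    return buf;
}

/* Typed epoch converter: the return type says "integer", so callers
 * never have to guess what a void pointer refers to.
 * Returns days since 1970-01-01 in the proleptic Gregorian calendar. */
static int64_t demo_to_epoch_days(const demo_datetimestruct *dts) {
    assert(dts != NULL);
    int64_t y = dts->year;
    int m = (int)dts->month;
    int d = (int)dts->day;
    y -= (m <= 2);                                   /* year starts in March */
    int64_t era = (y >= 0 ? y : y - 399) / 400;      /* 400-year cycle index */
    int64_t yoe = y - era * 400;                     /* year of era, 0..399 */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}
```

With a `void *` return, both results would come back through the same pointer and the caller would have to cast based on context; with two functions the compiler checks each call site.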

jbrockmendel · 2019-12-16T18:54:19Z

pandas/_libs/src/ujson/python/objToJSON.c

+    *len = (size_t)get_datetime_iso_8601_strlen(0, base);
+    char *result = PyObject_Malloc(*len);
+
+    if (result == NULL) {


how would we get here? if len == 0?

This would happen if malloc fails

https://stackoverflow.com/a/12434865/621736

So failing to address it would mean getting something like a segfault? And the PyErr_NoMemory just below raises back in py-space?

I think it would segfault if you tried to access the memory after malloc failed to allocate it. PyErr_NoMemory just sets the global error indicator, so when the extension exits that should raise back in Python space (assuming nothing else clears it).

FWIW the error handling here is copy / paste from here:

pandas/pandas/_libs/src/ujson/python/objToJSON.c

Line 422 in 98cb432

PyErr_NoMemory();

There are similar checks elsewhere throughout the module (not always consistent), so it may be worth unifying them in a follow-up.
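The allocation check discussed in this thread follows a common pattern: test the pointer before touching it, record the error, and return NULL so the caller can unwind. Below is a minimal sketch in plain C, written so it compiles without Python headers. `demo_error_set` and `demo_no_memory` are hypothetical stand-ins for Python's error indicator and `PyErr_NoMemory`, and the injectable allocator exists only so a failure can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Python's global error indicator. */
static int demo_error_set = 0;

/* Hypothetical stand-in for PyErr_NoMemory(): record the error and
 * return NULL so the caller can write `return demo_no_memory();`. */
static void *demo_no_memory(void) {
    demo_error_set = 1;
    return NULL;
}

/* Allocator signature, so tests can inject a failing allocator. */
typedef void *(*demo_alloc_fn)(size_t);

/* Allocator that always fails, used to simulate running out of memory. */
static void *demo_failing_alloc(size_t n) {
    (void)n;
    return NULL;
}

/* Copies src into a fresh buffer and stores its length in *len.
 * On allocation failure: sets the error flag and returns NULL without
 * ever dereferencing the failed pointer. */
static char *demo_copy_string(const char *src, size_t *len,
                              demo_alloc_fn alloc) {
    assert(src != NULL && len != NULL && alloc != NULL);
    *len = strlen(src);
    char *result = alloc(*len + 1);
    if (result == NULL) {
        /* Writing through result here would be undefined behavior,
         * which is typically observed as a segfault. */
        return demo_no_memory();
    }
    memcpy(result, src, *len + 1); /* includes the trailing '\0' */
    return result;
}
```

Passing `malloc` as the allocator gives the normal path; passing `demo_failing_alloc` exercises the error path without needing to exhaust memory.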


jbrockmendel · 2019-12-18T20:49:32Z

pandas/_libs/src/ujson/python/objToJSON.c

-                             size_t *_outLen) {
-  return PyUnicode_AsUTF8AndSize(_obj, _outLen);
+static char *PyUnicodeToUTF8(JSOBJ _obj, JSONTypeContext *tc, size_t *_outLen) {
+    return (char *)PyUnicode_AsUTF8AndSize(_obj, (Py_ssize_t *)_outLen);


PyUnicode_AsUTF8AndSize came up in the thread about unicode surrogates. Do we need to worry about those here?

@WillAyd I think this is the same function where passing a surrogate unicode character can cause a segfault. Can that be ruled out here?

jbrockmendel · 2019-12-18T20:54:06Z

With usual caveat about my C-fu being weak, LGTM. cc @jreback

jbrockmendel · 2019-12-23T21:38:54Z

@jreback gentle ping, just realized this is a blocker for #28595.

jreback · 2019-12-24T17:28:03Z

thanks @WillAyd looks more clear


WillAyd added 14 commits December 13, 2019 19:07

new implementations

d4b6c72

Merge remote-tracking branch 'upstream/master' into json-dates

e7d4880

Stop point

d2c6d36

Fixed issue with npy_datetime

6b6b669

Fixed ISO str len return

ffd07a1

infinite loop fix

9f7d9ab

revert encode labels typo

3fc9b90

dt iso fix

7471721

Fixed pydatetime truncation

a5201f9

clang format

946633e

Renamed JSON to UTF8

087df67

Char qualifiers

47c6ea1

Removed unused part of signature

04f2d16

clang format

f8ed75d

WillAyd commented Dec 15, 2019


simonjayhawkins added the IO JSON read_json, to_json, json_normalize label Dec 16, 2019

jbrockmendel reviewed Dec 16, 2019


jbrockmendel reviewed Dec 16, 2019


jbrockmendel reviewed Dec 16, 2019


Warnings fixup

98cb432

WillAyd mentioned this pull request Dec 17, 2019

BUG: Timedelta not formatted correctly in to_json #28595

Closed

5 tasks

Merge remote-tracking branch 'upstream/master' into json-dates

09b00fd

jbrockmendel reviewed Dec 18, 2019


jreback added this to the 1.0 milestone Dec 24, 2019

jreback merged commit 2a6c2d7 into pandas-dev:master Dec 24, 2019

WillAyd deleted the json-dates branch December 24, 2019 17:28

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Dec 29, 2019

Replaced void pointers with Types in JSON Datetime Conversions (panda…

7e0b915

…s-dev#30283)
