Skip to content

Replaced void pointers with Types in JSON Datetime Conversions #30283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Dec 24, 2019

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Dec 15, 2019

This also incorporates the clean up from #30271

Benchmarks show the following benefit, though most likely just noise:

.       before           after         ratio
     [04fce81f]       [f8ed75d0]
     <stash~6^2>       <json-dates>
-         336±9ms          302±4ms     0.90  io.json.ToJSON.time_to_json('columns', 'df_int_floats')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

One follow up to this can drastically speed up datetime index handling - right now these are done in the Python space but can reuse these same functions

}

static void *PandasDateTimeStructToJSON(npy_datetimestruct *dts,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is a little difficult to read here, but this got rid of the ToJSON functions and instead replaces them with explicit ToIso or ToEpoch functions with strong types

@simonjayhawkins simonjayhawkins added the IO JSON read_json, to_json, json_normalize label Dec 16, 2019
*len = (size_t)get_datetime_iso_8601_strlen(0, base);
char *result = PyObject_Malloc(*len);

if (result == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would we get here? if len == 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would happen if malloc fails

https://stackoverflow.com/a/12434865/621736

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So failing to address it would mean getting something like a segfault? the PyErr_NoMemory just below raises back in py-space?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would segfault if you tried to access the memory if malloc failed to allocate it. PyErr_NoMemory would just set the global error indicator so when the extension exits that should raise back in the Python space (assuming nothing else clears it out)

FWIW the error handling here is copy / paste from here:

There are similar checks elsewhere throughout the module (not always consistent) so maybe worth unifying in a follow up

size_t *_outLen) {
return PyUnicode_AsUTF8AndSize(_obj, _outLen);
static char *PyUnicodeToUTF8(JSOBJ _obj, JSONTypeContext *tc, size_t *_outLen) {
return (char *)PyUnicode_AsUTF8AndSize(_obj, (Py_ssize_t *)_outLen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyUnicode_AsUTF8AndSize came up in the thread about unicode surrogates. do we need to worry about those here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd i think this is the same function where passing a surrogate unicode character can cause a segfault. can that be ruled out here?

@jbrockmendel
Copy link
Member

With usual caveat about my C-fu being weak, LGTM. cc @jreback

@jbrockmendel
Copy link
Member

@jreback gentle ping, just realized this is a blocker for #28595.

@jreback jreback added this to the 1.0 milestone Dec 24, 2019
@jreback jreback merged commit 2a6c2d7 into pandas-dev:master Dec 24, 2019
@jreback
Copy link
Contributor

jreback commented Dec 24, 2019

thanks @WillAyd looks more clear

@WillAyd WillAyd deleted the json-dates branch December 24, 2019 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants