-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Implement C Level Timedelta ISO Function; fix JSON usage #30903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -407,6 +406,30 @@ static char *int64ToIso(int64_t value, NPY_DATETIMEUNIT base, size_t *len) { | |||
return result; | |||
} | |||
|
|||
/* Converts the int64_t representation of a duration to ISO; mutates len */ | |||
static char *int64ToIsoDuration(int64_t value, size_t *len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after this PR going to split these conversion functions off into a separate module as the JSON one is getting rather big
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other PR is #31057 should probably be merged first
@@ -1615,6 +1649,11 @@ char **NpyArr_encodeLabels(PyArrayObject *labels, PyObjectEncoder *enc, | |||
ret[i] = PyObject_Malloc(len + 1); | |||
memcpy(ret[i], cLabel, len + 1); | |||
|
|||
if (is_datetimelike) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a memory leak in the current implementation where any datetimelike values used in the index would leak their respective character representation out; will add a whatsnew once the file is created
Current benchmarks: before after ratio
[447a3b00] [4a94f159]
<master> <json-timedelta>
+ 152±3ms 209±10ms 1.38 io.json.ToJSONISO.time_iso_format('values')
+ 168±3ms 213±10ms 1.27 io.json.ToJSONISO.time_iso_format('records')
+ 188±4ms 235±10ms 1.25 io.json.ToJSONISO.time_iso_format('split')
+ 195±4ms 238±10ms 1.22 io.json.ToJSONISO.time_iso_format('columns')
- 103±10ms 79.2±6ms 0.77 io.json.ToJSON.time_to_json('values', 'df_td_int_ts')
- 137±20ms 95.9±1ms 0.70 io.json.ToJSON.time_to_json('values', 'df_int_float_str') So the time_iso_format checks are taking a little bit longer but note that they were broken before, i.e. on master you get this: >>> pd.Series([pd.Timedelta(days=1)], index=[pd.Timedelta(days=1)]).to_json(date_format="iso")
'{"P1DT0H0M0S":"1970-01-02T00:00:00.000Z"}' Where we are writing as dates and not durations. I think just a matter of the timedelta -> iso function not being as fast (using sprintf instead of totally hand-coded) but maybe OK for now |
@@ -905,3 +905,37 @@ int make_iso_8601_datetime(npy_datetimestruct *dts, char *outstr, int outlen, | |||
outlen); | |||
return -1; | |||
} | |||
|
|||
|
|||
int make_iso_8601_timedelta(pandas_timedeltastruct *tds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we do also have a isoformat function for time deltas defined here:
pandas/pandas/_libs/tslibs/timedeltas.pyx
Line 1097 in cdffa43
def isoformat(self) -> str: |
But I think the path to get that to work in the JSON module would be way more complicated than implementing in C as we do here (and less consistent with other datetime handling).
Maybe as a follow up we want to use this C function in the Timedelta class
pls rebase |
LGTM cc @jreback |
needs rebase, other good to go i think |
completely closes #28256. ? can you check the perf again. |
Yea closes; added some more test coverage as well. Here is an updated benchmark: before after ratio
[402c5cdd] [ef08ad6c]
<json-test-cleanup~1^2> <json-timedelta>
+ 143±20ms 225±10ms 1.57 io.json.ToJSONISO.time_iso_format('values')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED. |
this is slower? |
Slower but correct |
worth the tradeoff every time. |
thanks, correct wins every time. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
similar to @cbertinato great work in #28595 but done deeper in the C extension