REGR: to_dict(orient={"records", "split", "dict"}) slowdown #42486

mzeitlin11 · 2021-07-11T19:09:07Z

- closes #42352 (BUG: df.to_dict(orient="records") significantly slower in Pandas 1.3.0)
- tests added / passed
- Ensure all linting tests pass
- whatsnew entry

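The regression was caused by every scalar value going through is_datetime_or_timedelta_dtype -> _is_dtype_type, where most values fall through nearly all of the checks before a result is returned. The fix replaces that call with a plain isinstance check against np.datetime64 / np.timedelta64.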
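A minimal sketch of the boxing order after this change may help. It is not the pandas source: `box_native_sketch` is a hypothetical name, plain `isinstance` checks stand in for pandas' own `is_float` / `is_integer` / `is_bool` helpers, and it assumes `maybe_box_datetimelike` behaves like wrapping numpy datetimelike scalars in `pd.Timestamp` / `pd.Timedelta`. The point it illustrates is that every branch is a cheap type test, so ordinary ints and floats never reach a dtype-inspection helper.

```python
import numpy as np
import pandas as pd


def box_native_sketch(value):
    """Hypothetical approximation of maybe_box_native after this PR.

    Every branch is a plain isinstance check, so ints and floats are
    converted without any dtype introspection.
    """
    if isinstance(value, (float, np.floating)):
        return float(value)
    # Check bool before int: Python's bool is a subclass of int.
    if isinstance(value, (bool, np.bool_)):
        return bool(value)
    # np.timedelta64 subclasses np.signedinteger, so keep it out of this branch.
    if isinstance(value, (int, np.integer)) and not isinstance(value, np.timedelta64):
        return int(value)
    # Only raw numpy datetimelike scalars need wrapping.
    if isinstance(value, np.datetime64):
        return pd.Timestamp(value)
    if isinstance(value, np.timedelta64):
        return pd.Timedelta(value)
    return value


values = (np.int64(1), np.float64(2.5), np.bool_(True), np.datetime64("2021-07-11"), "x")
print([type(box_native_sketch(v)).__name__ for v in values])
# ['int', 'float', 'bool', 'Timestamp', 'str']
```

The ordering mirrors the diff reviewed below, where the datetimelike check is moved after the numeric ones instead of running first for every value.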

jreback · 2021-07-11T19:11:57Z

Please run a lot of asvs.

This is a very hot path and I don't want to affect other things.

mzeitlin11 · 2021-07-11T19:12:26Z

pandas/core/dtypes/cast.py

@@ -196,6 +193,8 @@ def maybe_box_native(value: Scalar) -> Scalar:
        value = int(value)  # type: ignore[arg-type]
    elif is_bool(value):
        value = bool(value)
+    elif isinstance(value, (np.datetime64, np.timedelta64)):


AFAICT this condition should be equivalent to the previous one. There was this comment and follow-up https://github.com/pandas-dev/pandas/pull/37648/files#r576461677, but the tests seem to contradict it by mapping datetime -> datetime (behavior that isn't changed here).

jreback · 2021-07-12T00:32:57Z

Don't you need datetime, timedelta here as well (e.g. Python scalar types)? Or I suppose these are already boxed correctly, so we don't need them?

mzeitlin11

Yep exactly, these are already boxed correctly (and we seem to have some test coverage for this, for example in pandas.tests.frame.methods.test_to_dict::TestDataFrameToDict::test_to_dict_tz).
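Why the Python `datetime` / `timedelta` types can be left out is visible with a quick check: pandas' boxed scalars (`pd.Timestamp`, `pd.Timedelta`) subclass the standard-library types, not the numpy scalar types, so the `isinstance` condition in the diff only matches raw numpy values. The sketch below also checks the behavior `test_to_dict_tz` is about, that a tz-aware column comes back from `to_dict` as `Timestamp` objects; it does not reproduce that test's actual assertions, and the sample values are arbitrary.

```python
import datetime

import numpy as np
import pandas as pd

# The tuple used by the isinstance check added in maybe_box_native.
NUMPY_DATETIMELIKE = (np.datetime64, np.timedelta64)

samples = {
    "np.datetime64": np.datetime64("2021-07-11T19:09:07"),
    "np.timedelta64": np.timedelta64(5, "m"),
    "datetime.datetime": datetime.datetime(2021, 7, 11, 19, 9, 7),
    "datetime.timedelta": datetime.timedelta(minutes=5),
    "pd.Timestamp": pd.Timestamp("2021-07-11 19:09:07"),
    "pd.Timedelta": pd.Timedelta(minutes=5),
}

# Only the raw numpy scalars match; everything else passes through untouched.
matches = {name: isinstance(v, NUMPY_DATETIMELIKE) for name, v in samples.items()}
for name, hit in matches.items():
    print(f"{name:<20} {hit}")

# pandas' boxed scalars subclass the standard-library types instead.
print(issubclass(pd.Timestamp, datetime.datetime), issubclass(pd.Timedelta, datetime.timedelta))

# A tz-aware column comes back from to_dict as tz-aware Timestamp objects.
df = pd.DataFrame({"a": pd.date_range("2021-07-11", periods=2, tz="UTC")})
records = df.to_dict(orient="records")
print(records[0]["a"], type(records[0]["a"]).__name__)
```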

mzeitlin11 · 2021-07-11T19:13:25Z

> This is a very hot path and I don't want to affect other things.

EDIT: The modified function is maybe_box_native (added in #37648), and I can only find it called from to_dict code paths.
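Since maybe_box_native appears to be reached only from to_dict, its job is easiest to see from the outside: for the three orients named in the PR title, numpy integer and float values come back as plain Python `int` and `float`. This sketch assumes a pandas version that includes #37648; the frame is an arbitrary example, and it shows the resulting shapes rather than the internal call sites.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})  # int64 and float64 columns

records = df.to_dict(orient="records")
split = df.to_dict(orient="split")
by_column = df.to_dict(orient="dict")

print(records)
print(split)
print(by_column)

# Each cell should be a native Python scalar, not np.int64 / np.float64.
print(type(records[0]["a"]).__name__, type(split["data"][0][0]).__name__, type(by_column["b"][1]).__name__)
```

Every one of those cells passes through the boxing step, which is why a per-scalar slowdown shows up across all three orients.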

to_dict asvs:

       before           after         ratio
     [87d78559]       [614a926a]
     <master>         <regr_to_dict_perf>
-        484±10ms          168±2ms     0.35  frame_methods.ToDict.time_to_dict_datetimelike('dict')
-         479±3ms          164±3ms     0.34  frame_methods.ToDict.time_to_dict_datetimelike('records')
-         465±4ms          159±9ms     0.34  frame_methods.ToDict.time_to_dict_datetimelike('split')
-         149±2ms         21.6±2ms     0.15  frame_methods.ToDict.time_to_dict_ints('records')
-         141±6ms       13.7±0.4ms     0.10  frame_methods.ToDict.time_to_dict_ints('dict')
-         135±6ms       12.9±0.4ms     0.10  frame_methods.ToDict.time_to_dict_ints('split')
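In asv's compare output the ratio column is after/before, so values below 1 mean the branch is faster. Each row is one `time_*` method run once per `orient` parameter. The class below is a hedged reconstruction of what such a benchmark might look like, not the actual `frame_methods.ToDict` code: the class name, frame size, random seed, and the way the datetimelike frame is built are all assumptions. asv discovers `params`, `param_names`, `setup`, and `time_*` methods by naming convention, so the class is also plain Python and can be exercised without asv installed.

```python
import numpy as np
import pandas as pd


class ToDictSketch:
    # asv runs every time_* method once per value in params.
    params = [["dict", "records", "split"]]
    param_names = ["orient"]

    def setup(self, orient):
        # Assumed size: 10_000 rows x 4 columns of small integers.
        rng = np.random.default_rng(42)
        data = rng.integers(0, 1000, size=(10_000, 4))
        self.int_df = pd.DataFrame(data)
        # Reinterpret the integers as nanosecond timedeltas to get a
        # datetimelike frame whose cells need boxing.
        self.datetimelike_df = pd.DataFrame(data.astype("timedelta64[ns]"))

    def time_to_dict_ints(self, orient):
        self.int_df.to_dict(orient=orient)

    def time_to_dict_datetimelike(self, orient):
        self.datetimelike_df.to_dict(orient=orient)


# Outside asv, a single run can be driven by hand:
bench = ToDictSketch()
bench.setup("records")
bench.time_to_dict_ints("records")
```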

jreback · 2021-07-11T19:15:41Z

No, run all the asvs, or at the very least the casting and construction ones.

mzeitlin11 · 2021-07-11T19:16:07Z

> No, run all the asvs, or at the very least the casting and construction ones.

Sorry, please see the comment above; I just edited it.

jreback

OK, perf looks fine (I see we don't use this routine except in the paths you are testing).

jreback · 2021-07-12T00:32:57Z

pandas/core/dtypes/cast.py

@@ -196,6 +193,8 @@ def maybe_box_native(value: Scalar) -> Scalar:
        value = int(value)  # type: ignore[arg-type]
    elif is_bool(value):
        value = bool(value)
+    elif isinstance(value, (np.datetime64, np.timedelta64)):


Don't you need datetime, timedelta here as well (e.g. Python scalar types)? Or I suppose these are already boxed correctly, so we don't need them?

jreback · 2021-07-12T12:43:46Z

thanks @mzeitlin11


jreback · 2021-07-12T12:43:55Z

@meeseeksdev backport 1.3.x

lumberbot-app · 2021-07-12T12:44:02Z

Something went wrong ... Please have a look at my logs.


mzeitlin11 added 2 commits July 11, 2021 14:53

REGR: to_dict(orient=records) slowdown (3c3924f)

Add whatsnew, smaller benchmark (614a926)

mzeitlin11 added the labels Dtype Conversions (Unexpected or buggy dtype conversions), Performance (Memory or execution speed performance), and Regression (Functionality that used to work in a prior pandas version) Jul 11, 2021

mzeitlin11 added this to the 1.3.1 milestone Jul 11, 2021

mzeitlin11 commented Jul 11, 2021


jreback requested changes Jul 12, 2021


Merge branch 'master' into regr_to_dict_perf (a69a0eb)

jreback approved these changes Jul 12, 2021


jreback merged commit 5831b06 into pandas-dev:master Jul 12, 2021

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jul 12, 2021

Backport PR pandas-dev#42486: REGR: to_dict(orient={"records", "split", "dict"}) slowdown (6cab702)

meeseeksmachine mentioned this pull request Jul 12, 2021

Backport PR #42486 on branch 1.3.x (REGR: to_dict(orient={"records", "split", "dict"}) slowdown) #42502 (merged)

mzeitlin11 deleted the regr_to_dict_perf branch July 12, 2021 12:46

jreback pushed a commit that referenced this pull request Jul 12, 2021

Backport PR #42486: REGR: to_dict(orient={"records", "split", "dict"}) slowdown (#42502)

Co-authored-by: Matthew Zeitlin <[email protected]>

f4278a6
