REF: de-duplicate nested-dict handling in DataFrame construction #41785

jbrockmendel · 2021-06-02T14:33:22Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

This was the motivation for #41707. This will introduce some overhead, but will also prevent the behaviors from getting out of sync again.

jreback · 2021-06-02T18:39:59Z

oh i like this!

jorisvandenbossche · 2021-06-08T14:40:56Z

This might have caused a big slowdown in one of the benchmarks: https://pandas.pydata.org/speed/pandas/#frame_ctor.FromDicts.time_nested_dict_int64?python=3.8&Cython=0.29.21

@jbrockmendel can you verify this?
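
For anyone who wants to check this locally, here is a minimal timing sketch for DataFrame construction from a nested dict of integers. It is not the actual asv benchmark: the sizes `N_COLS` and `N_ROWS` and the helper `build` are hypothetical names chosen for illustration, and the real `frame_ctor.FromDicts.time_nested_dict_int64` setup may differ.

```python
import timeit

import pandas as pd

# Hypothetical sizes for illustration; the real asv benchmark may use others.
N_COLS, N_ROWS = 50, 1_000

# Nested dict: outer keys become columns, inner keys become the index.
nested = {col: {row: row + col for row in range(N_ROWS)} for col in range(N_COLS)}


def build():
    # The code path under discussion: DataFrame construction from a dict of dicts.
    return pd.DataFrame(nested)


df = build()

# Take the best of three repeats to reduce noise from other processes.
elapsed = min(timeit.repeat(build, number=5, repeat=3)) / 5
print(f"shape={df.shape}, best time per construction: {elapsed * 1e3:.2f} ms")
```

Running the same script on a commit before and after the change should give a rough idea of the ratio, though asv remains the better tool for a reliable comparison.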

jbrockmendel · 2021-06-08T18:52:28Z

That looks plausible; noted in the OP we'd take a perf hit in these cases

jorisvandenbossche · 2021-06-08T20:27:32Z

Yes, but "some overhead" turns out to be a 6x slowdown.
So then having the cython implementation (and the added effort of ensuring it is kept in sync) might be worth it.

jbrockmendel · 2021-06-10T17:00:18Z

do you have anything in particular in mind? could revert until after the release

jorisvandenbossche · 2021-06-10T20:24:10Z

I didn't follow the original context, but IIUC, this PR didn't "fix" anything, correct? (like a bug) But it did remove a faster cython implementation to reduce maintenance cost in the idea the added overhead would only be small?

If that's the case, and the overhead actually turns out to be not so small (6x according to the benchmark), then it's a tradeoff we need to make: is the speed worth the cost of maintaining the cython implementation?
If we say yes to that, then I suppose this means reverting the PR; but the revert would not be only until after the release (not sure what the point of that would be).

jbrockmendel · 2021-06-13T16:17:59Z

> I didn't follow the original context, but IIUC, this PR didn't "fix" anything, correct? (like a bug) But it did remove a faster cython implementation to reduce maintenance cost in the idea the added overhead would only be small?

The context is that this follows #41707, which did fix a bug in the Series constructor; that bug didn't affect DataFrame construction because the two had separate implementations.

jorisvandenbossche · 2021-06-26T07:54:46Z

I opened #42248 to keep track of this regression.

…das-dev#41785)

Co-authored-by: jbrockmendel <[email protected]>

REF: de-duplicate nested-dict handling in DataFrame construction

ac881f9

jreback added the Clean and Refactor (Internal refactoring of code) labels Jun 2, 2021

jreback added this to the 1.3 milestone Jun 2, 2021

jreback merged commit a8b313f into pandas-dev:master Jun 2, 2021

jbrockmendel deleted the ref-homogenize-3 branch June 2, 2021 19:30

jorisvandenbossche mentioned this pull request Jun 26, 2021

PERF: regression in DataFrame construction from nested dict #42248

Closed

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Jul 1, 2021

PERF/REGR: revert pandas-dev#41785

58fbfc1

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

REF: de-duplicate nested-dict handling in DataFrame construction (pan…

5bb9efb

…das-dev#41785)

jreback pushed a commit that referenced this pull request Jul 8, 2021

PERF/REGR: revert #41785 (#42338)

82eb380

meeseeksmachine mentioned this pull request Jul 8, 2021

Backport PR #42338 on branch 1.3.x (PERF/REGR: revert #41785) #42444

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jul 8, 2021

Backport PR pandas-dev#42338: PERF/REGR: revert pandas-dev#41785

d48ea59

simonjayhawkins pushed a commit that referenced this pull request Jul 8, 2021

Backport PR #42338: PERF/REGR: revert #41785 (#42444)

92c4814

Co-authored-by: jbrockmendel <[email protected]>
