WIP/REF: BlockManager.setitem_blockwise #39302

jbrockmendel · 2021-01-20T17:35:48Z

Posting largely for discussion with @phofl.

Motivations:

Reduce the number of code paths for setitem so that we can ensure consistent inplace/view/copy behavior (xref #38896, "API: setitem copy/view behavior ndarray vs Categorical vs other EA")
Simplify (at the moment this does not simplify things)
Perf: iterating over blocks instead of columns may save some copies, though the Index.intersection calls may cost more than that saves (I haven't done any profiling yet)

Non-trivial overlap with #39044.
Some of this would be simplified by #39163.
No logical overlap with #39266, but will require rebase.


pep8speaks · 2021-01-20T17:35:57Z

Hello @jbrockmendel! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file pandas/core/arrays/datetimelike.py:

Line 619:89: E501 line too long (91 > 88 characters)

In the file pandas/core/frame.py:

Line 3222:89: E501 line too long (104 > 88 characters)

phofl · 2021-01-20T21:23:09Z

Is there a specific place where I can be of help?

jbrockmendel · 2021-01-20T21:41:47Z

Mostly I just want to keep you in the loop on what I have in mind, since you're working on similar parts of the code.

jbrockmendel · 2021-01-23T21:10:37Z

@phofl this fails the test added in #39280. Thoughts on an appropriate fix?

(a branch that only has the edits to DataFrame._setitem_array has the same failure)

phofl · 2021-01-23T22:17:16Z

Not a fix per se, but I had similar difficulties in #39341.

We have to account for duplicate column names in these checks, maybe using get_indexer_for, or something similar if there is a faster alternative. Simply comparing the lengths does not work with duplicates.
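To illustrate why the plain length comparison breaks down, here is a minimal sketch. The frame and key are made up for illustration; only `Index.get_indexer_for` comes from the comment above.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label "a".
df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])
key = ["a", "b"]

# get_indexer_for returns one position per matching column, so a
# duplicated label contributes more than one entry.
positions = df.columns.get_indexer_for(key)

n_key = len(key)             # number of labels requested
n_selected = len(positions)  # number of columns those labels select
```

Here `n_key` and `n_selected` differ, so a value with one entry per selected column would be rejected by a check against `len(key)`.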

phofl · 2021-01-24T15:21:00Z

You could use

if self.columns.is_unique:
    len_key = len(key)
else:
    # get_indexer_non_unique returns (indexer, missing); count the indexer
    len_key = len(self.columns.get_indexer_non_unique(key)[0])
if len(value) != len_key:
    raise ValueError("Columns must be same length as key")

as a helper function for both len(value) != len(key) checks. The test will keep failing because the column is now cast to integer, but that is fine since it is consistent with other setitem cases. If you like, I could implement this in #39341. I am dispatching to the else block there right now, since that does the trick for the moment.
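As a standalone sketch of such a helper: the function name `check_key_length` and its signature are hypothetical, and `value` is assumed to be a flat sequence with one entry per selected column. Note that `get_indexer_non_unique` returns an `(indexer, missing)` tuple, so the sketch takes the length of the first element.

```python
import pandas as pd


def check_key_length(columns: pd.Index, key, value) -> None:
    """Hypothetical helper: raise if value does not match the columns key selects."""
    if columns.is_unique:
        len_key = len(key)
    else:
        # Duplicated labels select several columns each, so count positions.
        len_key = len(columns.get_indexer_non_unique(key)[0])
    if len(value) != len_key:
        raise ValueError("Columns must be same length as key")


dup = pd.Index(["a", "a", "b"])

# Three columns are selected and three values are given: no error.
check_key_length(dup, ["a", "b"], [10, 20, 30])

# Two values for three selected columns: the helper raises.
try:
    check_key_length(dup, ["a", "b"], [10, 20])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Labels in `key` that are absent from `columns` still produce an entry in the indexer, so this sketch counts them as well; whether that is the desired behavior is left open here.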

jbrockmendel · 2021-01-28T21:25:54Z

mothballing

jbrockmendel added 24 commits January 14, 2021 13:58

TST: split coercion tests

46acf44

REF: implement BlockManager.setitem2

5ca4ab2

checkpoint tests passing

77e09e2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

1c1be1d

…f-setitem_inplace

checkpoint passing

46126ab

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

da2dda3

…f-setitem_inplace

port test from ref-setitem-blockwise

2624c2e

cleanup

51b90a3

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

3fe6d45

…f-setitem_inplace

checkpoint passing

08d57ba

checkpoint passing

19124fe

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

35915d4

…f-setitem_inplace

rename

96f5664

REF: avoid going through iloc

c1e0b0f

port from other PRs

83f5545

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

da87965

…f-setitem_inplace

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

0513231

…f-setitem_inplace

cleanup

8718444

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

86782b3

…f-setitem_inplace

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

53e4628

…f-setitem_inplace

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

d723835

…f-setitem_inplace

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

2dd1ce4

…f-setitem_inplace

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

19a5553

…f-setitem_inplace

cleanup

604f796

jbrockmendel mentioned this pull request Jan 20, 2021

BUG: setting dt64 values into Series[int] incorrectly casting dt64->int #39266

Merged

4 tasks

jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 26, 2021

jbrockmendel mentioned this pull request Jan 27, 2021

Clean up DataFrame.setitem behavior for duplicate columns #39403

Merged

2 tasks

jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Jan 28, 2021

jbrockmendel closed this Jan 28, 2021

jbrockmendel deleted the ref-setitem_inplace branch November 8, 2021 16:36

jbrockmendel removed the Mothballed Temporarily-closed PR the author plans to return to label Nov 8, 2021
