REF: move reshaping of array for setitem from DataFrame into BlockManager internals #39722

jorisvandenbossche · 2021-02-10T13:14:51Z

Ideally, the DataFrame should not need to know how the underlying manager stores its data (as 2D, transposed or not). This PR therefore moves the logic that ensures 2D/transposed values out of the DataFrame set_item-related methods and into BlockManager.iset.

This also helps the ArrayManager, since we then don't have to undo the reshape there.
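As a rough illustration of the idea (not the actual pandas code), the sketch below uses a hypothetical `ToyBlockManager` that stores values transposed, with shape `(n_columns, n_rows)`. The class name, the shape check, and the `ValueError` are assumptions made for the example; only the `ndim == 2` transpose and the `ndim == self.ndim - 1` reshape mirror the diff in this PR. The point is that the caller can pass a 1D column or a frame-oriented 2D column without knowing the storage layout.

```python
import numpy as np


class ToyBlockManager:
    """Hypothetical stand-in for a 2D block manager (not pandas' BlockManager).

    Values are stored transposed relative to the frame: shape
    (n_columns, n_rows) instead of (n_rows, n_columns).
    """

    def __init__(self, values: np.ndarray) -> None:
        self.values = values  # shape (n_columns, n_rows)
        self.ndim = 2

    def iset(self, loc: int, value: np.ndarray) -> None:
        """Set column `loc` from a 1D array or a 2D (n_rows, 1) array."""
        if value.ndim == 2:
            # Caller passed frame-oriented data; flip to storage orientation.
            value = value.T
        if value.ndim == self.ndim - 1:
            # 1D input: add a leading axis so it looks like one stored row.
            value = value.reshape((1,) + value.shape)
        if value.shape != (1, self.values.shape[1]):
            raise ValueError(f"unexpected shape {value.shape}")
        self.values[loc] = value[0]


mgr = ToyBlockManager(np.zeros((2, 3)))
mgr.iset(0, np.array([1.0, 2.0, 3.0]))        # 1D column
mgr.iset(1, np.array([[4.0], [5.0], [6.0]]))  # 2D (n_rows, 1) column
print(mgr.values)
```

With the reshaping inside the manager, a manager that stores plain 1D columns could implement its own `iset` without first undoing a 2D reshape done by the caller.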


jbrockmendel · 2021-02-10T16:48:15Z

pandas/tests/extension/test_numpy.py

@@ -350,9 +350,9 @@ def test_fillna_fill_other(self, data_missing):


 class TestReshaping(BaseNumPyTests, base.BaseReshapingTests):
-    @skip_nested
+    @pytest.mark.skip(reason="Incorrect expected.")


why did this change?

I don't fully understand how testing with PandasArray works, but within the merge implementation we set a column in the resulting DataFrame (the column with the key values) using an Index.
Calling extract_array on an Int64Index gives a PandasDtype("int64") column when running those tests. Because this PR changes things so that the values being set are no longer converted to a 1D array before ending up in BlockManager.iset, the extension dtype is now preserved, and so the expected result from the base test is wrong.

this turns out to be a PITA in a bunch of places. could just monkeypatch extract_array to extract PandasArray too

jbrockmendel · 2021-02-10T16:49:07Z

pandas/core/frame.py

@@ -3889,7 +3885,6 @@ def insert(self, loc, column, value, allow_duplicates: bool = False) -> None:
                "'self.flags.allows_duplicate_labels' is False."
            )
        value = self._sanitize_column(value)
-        value = _maybe_atleast_2d(value)


i think this removed all the usages of _maybe_atleast_2d, so the function can be removed

Indeed, will remove that in one of my next PRs

jbrockmendel · 2021-02-10T16:49:28Z

pandas/core/internals/managers.py

@@ -1013,6 +1013,9 @@ def value_getitem(placement):
                return value

        else:
+            if value.ndim == 2:
+                value = value.T
+
            if value.ndim == self.ndim - 1:


this can become an elif

indeed, will also include in a next PR

Actually, it needs to be an if, because this case needs to end up in the final else to define the value_getitem function
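The control-flow point can be shown with a small sketch. Both functions below are hypothetical simplifications written for this example (a plain indexer stands in for `placement.indexer`, and `reshape` stands in for the internal reshaping helper); they are not the pandas implementation. The first keeps the separate `if`, so a transposed 2D value still reaches the final `else` and gets a `value_getitem`. The second uses `elif`, so the 2D case skips both definitions.

```python
import numpy as np


def make_getter_with_if(value: np.ndarray, manager_ndim: int = 2):
    """Mirror the merged control flow: transpose first, then branch."""
    if value.ndim == 2:
        value = value.T  # frame orientation -> storage orientation

    if value.ndim == manager_ndim - 1:
        value = value.reshape((1,) + value.shape)

        def value_getitem(indexer):
            return value

    else:

        def value_getitem(indexer):
            return value[indexer]

    return value_getitem


def make_getter_with_elif(value: np.ndarray, manager_ndim: int = 2):
    """The `elif` variant: a 2D value never defines value_getitem."""
    if value.ndim == 2:
        value = value.T
    elif value.ndim == manager_ndim - 1:
        value = value.reshape((1,) + value.shape)

        def value_getitem(indexer):
            return value

    else:

        def value_getitem(indexer):
            return value[indexer]

    return value_getitem  # unbound local when value.ndim == 2


two_d = np.arange(6).reshape(3, 2)  # (n_rows, n_columns)
getter = make_getter_with_if(two_d)
print(getter(slice(0, 1)))  # first stored row, i.e. first column of two_d

try:
    make_getter_with_elif(two_d)
except UnboundLocalError as err:
    print("elif variant fails for 2D input:", err)
```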

jbrockmendel · 2021-02-10T16:49:35Z

pandas/core/internals/managers.py

@@ -1135,6 +1138,9 @@ def insert(self, loc: int, item: Hashable, value, allow_duplicates: bool = False
        # insert to the axis; this could possibly raise a TypeError
        new_axis = self.items.insert(loc, item)

+        if value.ndim == 2:
+            value = value.T
+
        if value.ndim == self.ndim - 1 and not is_extension_array_dtype(value.dtype):


pandas/core/frame.py

jreback · 2021-02-10T16:58:46Z

sorry for merging so quickly @jbrockmendel

will wait in future

jorisvandenbossche · 2021-02-10T17:01:07Z

Yeah, I explicitly marked Brock for review ;)

jorisvandenbossche added 2 commits February 10, 2021 12:46

REF: move reshaping of array for setitem from DataFrame into BlockManager internals (3513414)

fix insert (c8e58ca)

jorisvandenbossche added the Refactor (internal refactoring of code), Indexing (related to indexing on series/frames, not to indexes themselves), and Internals (related to non-user accessible pandas implementation) labels Feb 10, 2021

jorisvandenbossche requested a review from jbrockmendel February 10, 2021 13:14

jorisvandenbossche added this to the 1.3 milestone Feb 10, 2021

jreback merged commit 83479e1 into pandas-dev:master Feb 10, 2021

jorisvandenbossche deleted the am-iset branch February 10, 2021 14:44

jbrockmendel reviewed Feb 10, 2021


jorisvandenbossche mentioned this pull request Feb 10, 2021

[ArrayManager] Indexing - implement iset #39734

Merged

This was referenced Feb 23, 2021

[ArrayManager] DataFrame constructors #39991

Merged

TST: find solution for extension/tests_numpy.py (PandasArray base extension tests) #40021

Open

simonjayhawkins mentioned this pull request Aug 3, 2021

BUG: 1.3.0 column assignment via single column np.matrix behaviour change #42376

Closed

