Skip to content

REF: move reshaping of array for setitem from DataFrame into BlockManager internals #39722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 10, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 1 addition & 8 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3272,16 +3272,13 @@ def _set_item_frame_value(self, key, value: DataFrame) -> None:

# now align rows
value = _reindex_for_setitem(value, self.index)
value = value.T
self._set_item_mgr(key, value)

def _iset_item_mgr(self, loc: int, value) -> None:
self._mgr.iset(loc, value)
self._clear_item_cache()

def _set_item_mgr(self, key, value):
value = _maybe_atleast_2d(value)

try:
loc = self._info_axis.get_loc(key)
except KeyError:
Expand All @@ -3298,7 +3295,6 @@ def _set_item_mgr(self, key, value):

def _iset_item(self, loc: int, value):
value = self._sanitize_column(value)
value = _maybe_atleast_2d(value)
self._iset_item_mgr(loc, value)

# check if we are modifying a copy
Expand Down Expand Up @@ -3328,7 +3324,7 @@ def _set_item(self, key, value):
if not self.columns.is_unique or isinstance(self.columns, MultiIndex):
existing_piece = self[key]
if isinstance(existing_piece, DataFrame):
value = np.tile(value, (len(existing_piece.columns), 1))
value = np.tile(value, (len(existing_piece.columns), 1)).T

self._set_item_mgr(key, value)

Expand Down Expand Up @@ -3889,7 +3885,6 @@ def insert(self, loc, column, value, allow_duplicates: bool = False) -> None:
"'self.flags.allows_duplicate_labels' is False."
)
value = self._sanitize_column(value)
value = _maybe_atleast_2d(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this removed all the usages of maybe_atleast_2d, so can remove the function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, will remove that in one of my next PRs

self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)

def assign(self, **kwargs) -> DataFrame:
Expand Down Expand Up @@ -3994,8 +3989,6 @@ def _sanitize_column(self, value):
value = maybe_convert_platform(value)
else:
value = com.asarray_tuplesafe(value)
elif value.ndim == 2:
value = value.copy().T
elif isinstance(value, Index):
value = value.copy(deep=True)
else:
Expand Down
6 changes: 6 additions & 0 deletions pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1013,6 +1013,9 @@ def value_getitem(placement):
return value

else:
if value.ndim == 2:
value = value.T

if value.ndim == self.ndim - 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can become an elif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, will also include in a next PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it needs to be an if, because this case needs to end up in the final else to define the value_getitem function

value = safe_reshape(value, (1,) + value.shape)

Expand Down Expand Up @@ -1135,6 +1138,9 @@ def insert(self, loc: int, item: Hashable, value, allow_duplicates: bool = False
# insert to the axis; this could possibly raise a TypeError
new_axis = self.items.insert(loc, item)

if value.ndim == 2:
value = value.T

if value.ndim == self.ndim - 1 and not is_extension_array_dtype(value.dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif

# TODO(EA2D): special case not needed with 2D EAs
value = safe_reshape(value, (1,) + value.shape)
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/extension/test_numpy.py
Original file line number Diff line number Diff line change
Expand Up @@ -350,9 +350,9 @@ def test_fillna_fill_other(self, data_missing):


class TestReshaping(BaseNumPyTests, base.BaseReshapingTests):
@skip_nested
@pytest.mark.skip(reason="Incorrect expected.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how testing with the PandasArrays works, but, within the merge implementation, we set a column in the resulting dataframe (the column with the key values) using an Index.
And doing an extract_array on an Int64Index gives a PandasDtype("int64") column when running those tests. And because in this PR I changed it such that the values being set are not converted to a 1D array before ending up in BlockManager.iset, the extension dtype is now preserved. And so the expected result from the base test is wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this turns out to be a PITA in a bunch of places. could just monkeypatch extract_array to extract PandasArray too

def test_merge(self, data, na_value):
# Fails creating expected
# Fails creating expected (key column becomes a PandasDtype because)
super().test_merge(data, na_value)

@skip_nested
Expand Down