Skip to content

REF/API: DataFrame.__setitem__ never operate in-place #39510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Mar 16, 2021
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
a5c1f5e
REF: DataFrame._setitem_array dont use iloc.__setitem__
jbrockmendel Jan 27, 2021
15f5265
REF: DataFrame._setitem_array dont use iloc.__setitem__
jbrockmendel Jan 31, 2021
73302c0
Merge branch 'master' of https://github.com/pandas-dev/pandas into se…
jbrockmendel Jan 31, 2021
ceeeb78
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Jan 31, 2021
6b304b8
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 2, 2021
fe9be7e
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 2, 2021
1789fca
mypy fixup
jbrockmendel Feb 2, 2021
6400be3
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 2, 2021
a1cdd19
mypy fixup
jbrockmendel Feb 2, 2021
870ae37
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 2, 2021
c32a427
32bit compat
jbrockmendel Feb 2, 2021
d9d8beb
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 2, 2021
8a283d9
troubleshoot windows builds
jbrockmendel Feb 2, 2021
9718f6e
Merge branch 'master' of https://github.com/pandas-dev/pandas into se…
jbrockmendel Feb 3, 2021
9f6d8e7
troubleshoot 32bit
jbrockmendel Feb 3, 2021
dd53abc
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 3, 2021
f6b09b4
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 7, 2021
c107e77
TST: missed raising cases
jbrockmendel Feb 7, 2021
ae3ae1b
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 18, 2021
c631079
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 23, 2021
4cfe863
port phofl tests
jbrockmendel Feb 23, 2021
03e6eb9
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 25, 2021
d280f60
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 26, 2021
9db5537
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 27, 2021
4054aea
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Feb 27, 2021
bd76479
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Mar 4, 2021
7ea792f
whatsnew
jbrockmendel Mar 4, 2021
20f6a16
clarify whatsnew
jbrockmendel Mar 4, 2021
fdbf6f3
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Mar 7, 2021
2c6da5d
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Mar 8, 2021
2d0cf9e
comment
jbrockmendel Mar 8, 2021
e1ed083
whatsnew
jbrockmendel Mar 8, 2021
6896e86
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Mar 12, 2021
1af8686
xfail only for BM
jbrockmendel Mar 13, 2021
c8b8e23
Merge branch 'master' into setitem-frame-no-defer
jbrockmendel Mar 16, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 55 additions & 6 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3257,6 +3257,7 @@ def _setitem_slice(self, key: slice, value):
def _setitem_array(self, key, value):
# also raises Exception if object array with NA values
if com.is_bool_indexer(key):
# bool indexer is indexing along rows
if len(key) != len(self.index):
raise ValueError(
f"Item wrong length {len(key)} instead of {len(self.index)}!"
Expand All @@ -3265,18 +3266,66 @@ def _setitem_array(self, key, value):
indexer = key.nonzero()[0]
self._check_setitem_copy()
self.iloc[indexer] = value

else:
if isinstance(value, DataFrame):
check_key_length(self.columns, key, value)
for k1, k2 in zip(key, value.columns):
self[k1] = value[k2]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really like to move this stuff out of here into another module, but i guess that's for another day

elif not is_list_like(value):
for col in key:
self[col] = value

elif isinstance(value, np.ndarray) and value.ndim == 2:
self._iset_not_inplace(key, value)

elif np.ndim(value) > 1:
# list of lists
value = DataFrame(value).values
return self._setitem_array(key, value)

else:
self.loc._ensure_listlike_indexer(key, axis=1, value=value)
indexer = self.loc._get_listlike_indexer(
key, axis=1, raise_missing=False
)[1]
self._check_setitem_copy()
self.iloc[:, indexer] = value
self._iset_not_inplace(key, value)

def _iset_not_inplace(self, key, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment (or informal docstring) here stating briefly which cases this covers? (eg I think only for listlike key and value?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, LMK if you find this informative

I think comments are addressed, so this is ready for another look (once the CI runs)

def igetitem(obj, i: int):
# Note: we catch DataFrame obj before getting here, but
# hypothetically would return obj.iloc[:, i]
if isinstance(obj, np.ndarray):
return obj[..., i]
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the future would be really nice to move lots of this elsewhere (e.g. indexing.py or similar)

return obj[i]

if self.columns.is_unique:
if np.shape(value)[-1] != len(key):
raise ValueError("Columns must be same length as key")

for i, col in enumerate(key):
self[col] = igetitem(value, i)

else:

ilocs = self.columns.get_indexer_non_unique(key)[0]
if (ilocs < 0).any():
# key entries not in self.columns
raise NotImplementedError

if np.shape(value)[-1] != len(ilocs):
raise ValueError("Columns must be same length as key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all of the branches covered? this is a lot of new code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the NotImplementedError on L3276; not sure what to do there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the rest is covered? what about the previous paths this was taking, can they be removed now? (e.g. they are uncovered)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with a test that hits the two raise ValueError("Columns must be same length as key") lines. everything other than the NotImplementedError is reached


assert np.ndim(value) <= 2

orig_columns = self.columns

# Using self.iloc[:, i] = ... may set values inplace, which
# by convention we do not do in __setitem__
try:
self.columns = Index(range(len(self.columns)))
for i, iloc in enumerate(ilocs):
self[iloc] = igetitem(value, i)
finally:
self.columns = orig_columns

def _setitem_frame(self, key, value):
# support boolean setting with DataFrame input, e.g.
Expand Down
19 changes: 18 additions & 1 deletion pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,10 @@ def test_setitem_complete_column_with_array(self):
"d": [1, 1, 1],
}
)
expected["c"] = expected["c"].astype(arr.dtype)
expected["d"] = expected["d"].astype(arr.dtype)
assert expected["c"].dtype == arr.dtype
assert expected["d"].dtype == arr.dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here? (I wouldn't expect the above test to change behaviour related to this PR, as it's adding new columns)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this was needed for the 32bit builds

tm.assert_frame_equal(df, expected)

@pytest.mark.parametrize("dtype", ["f8", "i8", "u8"])
Expand Down Expand Up @@ -373,11 +377,24 @@ def test_setitem_frame_duplicate_columns(self):
[np.nan, 1, 2, np.nan, 4, 5],
[np.nan, 1, 2, np.nan, 4, 5],
],
columns=cols,
dtype="object",
)
expected[2] = expected[2].astype(np.int64)
expected[5] = expected[5].astype(np.int64)
expected.columns = cols
tm.assert_frame_equal(df, expected)

def test_setitem_frame_duplicate_columns_size_mismatch(self):
# GH#39510
cols = ["A", "B", "C"] * 2
df = DataFrame(index=range(3), columns=cols)
with pytest.raises(ValueError, match="Columns must be same length as key"):
df[["A"]] = (0, 3, 5)

df2 = df.iloc[:, :3] # unique columns
with pytest.raises(ValueError, match="Columns must be same length as key"):
df2[["A"]] = (0, 3, 5)

@pytest.mark.parametrize("cols", [["a", "b", "c"], ["a", "a", "a"]])
def test_setitem_df_wrong_column_number(self, cols):
# GH#38604
Expand Down
9 changes: 8 additions & 1 deletion pandas/tests/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,14 @@ def test_setitem_ndarray_3d(self, index, frame_or_series, indexer_sli):
)
else:
err = ValueError
msg = r"Buffer has wrong number of dimensions \(expected 1, got 3\)|"
msg = "|".join(
[
r"Buffer has wrong number of dimensions \(expected 1, got 3\)",
"Cannot set values with ndim > 1",
"Index data must be 1-dimensional",
"Array conditional must be same shape as self",
]
)

with pytest.raises(err, match=msg):
idxr[nd3] = 0
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/reshape/test_pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -966,7 +966,7 @@ def test_margins_dtype(self):
# GH 17013

df = self.data.copy()
df[["D", "E", "F"]] = np.arange(len(df) * 3).reshape(len(df), 3)
df[["D", "E", "F"]] = np.arange(len(df) * 3).reshape(len(df), 3).astype("i8")

mi_val = list(product(["bar", "foo"], ["one", "two"])) + [("All", "")]
mi = MultiIndex.from_tuples(mi_val, names=("A", "B"))
Expand Down