Commit b8e14ec

PDEP6 implementation pt 1: block.setitem, block.putmask (#50626)
* pt1
* fixup test collection
* fixup warnings
* add comments
* fixup warnings
* fixup test_indexing
* fixup test_set_value
* fixup test_where
* fixup test_asof
* add one more explicit upcast
* fixup test_update
* fixup test_constructors
* fixup test_stack_unstack
* catch warnings in test_update_dtypes
* fixup all test_update
* start fixing up setitem
* finish fixing up test_setitem
* more fixups
* catch numpy-dev warning
* fixup some more
* fixup test_indexing
* fixup test_function
* fixup test_multi;
* fixup test_base
* fixup test_impl
* fixup multiindex/test_setitem
* fixup test_scalar
* fixup test_loc
* fixup test_iloc
* fixup test_at
* fixup test_groupby
* fixup some doc warnings
* post-merge fixup
* change dtype in doctest
* fixup doctest
* explicit cast in test
* fixup test for COW
* fixup COW
* catch warnings in testsetitemcastingequivalents
* wip
* fixup setitem test int key!
* getting there!
* fixup test_setitem
* getting there
* fixup remaining warnings
* fix test_update
* fixup some failing test
* one more
* simplify
* simplify and remove some false-positives
* clean up
* remove final filterwarnings
* undo unrelated change
* fixup raises_chained_assignment_error
* remove another filterwarnings
* fixup interchange test
* better parametrisation
* okwarning => codeblock
* okwarning => codeblock in v1.3.0
* one more codeblock
* avoid upcast
* post-merge fixup
* docs fixup;
* post-merge fixup
* remove more upcasts
* adapt test from EA types
* move test to series/indexing
* add tests about warnings
* fixup tests
* add dataframe tests too
* fixup tests
* simplify
* try-fix docs build
* post-merge fixup
* raise assertionerror if self.dtype equals new_dtype
* add todo for test case which should warn
* add more todos
* post-merge fixup
* move logic for determining warning into the test
* uncomment test

---------

Co-authored-by: MarcoGorelli <>
1 parent 7b48899 commit b8e14ec

21 files changed, +616 -191 lines changed

doc/source/user_guide/categorical.rst

+1
@@ -779,6 +779,7 @@ Setting values by assigning categorical data will also check that the ``categori
 Assigning a ``Categorical`` to parts of a column of other types will use the values:
 
 .. ipython:: python
+    :okwarning:
 
     df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
     df.loc[1:2, "a"] = pd.Categorical(["b", "b"], categories=["a", "b"])

doc/source/whatsnew/v0.21.0.rst

+13 -8
@@ -425,13 +425,12 @@ Note that this also changes the sum of an empty ``Series``. Previously this alwa
    In [1]: pd.Series([]).sum()
    Out[1]: 0
 
-but for consistency with the all-NaN case, this was changed to return NaN as well:
+but for consistency with the all-NaN case, this was changed to return 0 as well:
 
-.. ipython:: python
-   :okwarning:
-
-   pd.Series([]).sum()
+.. code-block:: ipython
 
+   In [2]: pd.Series([]).sum()
+   Out[2]: 0
 
 .. _whatsnew_0210.api_breaking.loc:
 
@@ -755,10 +754,16 @@ Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignmen
 
 New behavior
 
-.. ipython:: python
+.. code-block:: ipython
 
-   s[1] = True
-   s
+   In [7]: s[1] = True
+
+   In [8]: s
+   Out[8]:
+   0       1
+   1    True
+   2       3
+   Length: 3, dtype: object
 
 Previously, as assignment to a datetimelike with a non-datetimelike would coerce the
 non-datetime-like item being assigned (:issue:`14145`).

doc/source/whatsnew/v1.3.0.rst

+22 -9
@@ -501,13 +501,17 @@ Consistent casting with setting into Boolean Series
 Setting non-boolean values into a :class:`Series` with ``dtype=bool`` now consistently
 casts to ``dtype=object`` (:issue:`38709`)
 
-.. ipython:: python
+.. code-block:: ipython
+
+   In [1]: orig = pd.Series([True, False])
+
+   In [2]: ser = orig.copy()
+
+   In [3]: ser.iloc[1] = np.nan
 
-   orig = pd.Series([True, False])
-   ser = orig.copy()
-   ser.iloc[1] = np.nan
-   ser2 = orig.copy()
-   ser2.iloc[1] = 2.0
+   In [4]: ser2 = orig.copy()
+
+   In [5]: ser2.iloc[1] = 2.0
 
 *Previous behavior*:
 
@@ -527,10 +531,19 @@ casts to ``dtype=object`` (:issue:`38709`)
 
 *New behavior*:
 
-.. ipython:: python
+.. code-block:: ipython
+
+   In [1]: ser
+   Out[1]:
+   0    True
+   1     NaN
+   dtype: object
 
-   ser
-   ser2
+   In [2]: ser2
+   Out[2]:
+   0    True
+   1     2.0
+   dtype: object
 
 
 .. _whatsnew_130.notable_bug_fixes.rolling_groupby_column:
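
The deprecation message added in this commit asks callers to cast explicitly before setting an incompatible value. A minimal sketch of that pattern for the boolean Series used in the whatsnew entry, assuming only public pandas and NumPy API (the explicit `astype(object)` step is an illustration, not part of the commit):

```python
import numpy as np
import pandas as pd

orig = pd.Series([True, False])

# Cast to object up front, so the non-boolean value already fits the
# Series dtype and no implicit upcast has to happen during the assignment.
ser = orig.astype(object)
ser.iloc[1] = np.nan

ser2 = orig.astype(object)
ser2.iloc[1] = 2.0

print(ser.dtype, ser2.dtype)
```

The original Series keeps its `bool` dtype because `astype` returns a new object.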

pandas/core/internals/blocks.py

+20 -4
@@ -443,7 +443,7 @@ def split_and_operate(self, func, *args, **kwargs) -> list[Block]:
     # Up/Down-casting
 
     @final
-    def coerce_to_target_dtype(self, other) -> Block:
+    def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
         """
         coerce the current block to a dtype compat for other
         we will return a block, possibly object, and not raise
@@ -452,7 +452,21 @@ def coerce_to_target_dtype(self, other) -> Block:
         and will receive the same block
         """
         new_dtype = find_result_type(self.values.dtype, other)
-
+        if warn_on_upcast:
+            warnings.warn(
+                f"Setting an item of incompatible dtype is deprecated "
+                "and will raise in a future version of pandas. "
+                f"Value '{other}' has dtype incompatible with {self.values.dtype}, "
+                "please explicitly cast to a compatible dtype first.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
+        if self.dtype == new_dtype:
+            raise AssertionError(
+                f"Did not expect new dtype {new_dtype} to equal self.dtype "
+                f"{self.values.dtype}. Please report a bug at "
+                "https://github.com/pandas-dev/pandas/issues."
+            )
         return self.astype(new_dtype, copy=False)
 
     @final
@@ -1111,7 +1125,7 @@ def setitem(self, indexer, value, using_cow: bool = False) -> Block:
             casted = np_can_hold_element(values.dtype, value)
         except LossySetitemError:
             # current dtype cannot store value, coerce to common dtype
-            nb = self.coerce_to_target_dtype(value)
+            nb = self.coerce_to_target_dtype(value, warn_on_upcast=True)
             return nb.setitem(indexer, value)
         else:
             if self.dtype == _dtype_obj:
@@ -1177,7 +1191,9 @@ def putmask(self, mask, new, using_cow: bool = False) -> list[Block]:
 
                 if not is_list_like(new):
                     # using just new[indexer] can't save us the need to cast
-                    return self.coerce_to_target_dtype(new).putmask(mask, new)
+                    return self.coerce_to_target_dtype(
+                        new, warn_on_upcast=True
+                    ).putmask(mask, new)
                 else:
                     indexer = mask.nonzero()[0]
                     nb = self.setitem(indexer, new[indexer], using_cow=using_cow)
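
The control flow in the hunks above (try to store the value, and on failure coerce with `warn_on_upcast=True` before retrying) can be sketched with the standard library alone. This is a toy model under stated assumptions: `TypedColumn`, `can_hold` and `coerce_to_target_kind` are hypothetical names that only mimic the shape of the patched `Block` methods, and the "dtype" is a plain Python type.

```python
import warnings


class TypedColumn:
    """Hypothetical stand-in for a Block: all values share one Python type."""

    def __init__(self, values, kind):
        self.values = list(values)
        self.kind = kind

    def can_hold(self, value):
        # object columns accept anything; otherwise require an exact type
        # match, so that True is not silently accepted by an int column.
        return self.kind is object or type(value) is self.kind

    def coerce_to_target_kind(self, value, warn_on_upcast=False):
        # Same order as the patched method: warn first, then upcast.
        if warn_on_upcast:
            warnings.warn(
                "Setting an item of incompatible dtype is deprecated. "
                f"Value {value!r} is incompatible with {self.kind.__name__}.",
                FutureWarning,
                stacklevel=3,
            )
        return TypedColumn(self.values, object)

    def setitem(self, index, value):
        if not self.can_hold(value):
            # current kind cannot store value: coerce (with a warning), retry
            upcast = self.coerce_to_target_kind(value, warn_on_upcast=True)
            return upcast.setitem(index, value)
        self.values[index] = value
        return self


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    col = TypedColumn([1, 2, 3], int)
    col = col.setitem(0, 5)      # compatible: stored in place, no warning
    col = col.setitem(1, "foo")  # incompatible: warns, then upcasts

print(col.kind.__name__, col.values, len(caught))
```

Keeping the warning behind a keyword that defaults to `False` means other callers of the coercion helper stay silent unless they opt in, which matches how the diff only passes `warn_on_upcast=True` from `setitem` and `putmask`.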

pandas/tests/copy_view/test_indexing.py

+22 -4
@@ -925,11 +925,20 @@ def test_column_as_series_set_with_upcast(
             s[0] = "foo"
         expected = Series([1, 2, 3], name="a")
     elif using_copy_on_write or using_array_manager:
-        s[0] = "foo"
+        with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
+            s[0] = "foo"
         expected = Series(["foo", 2, 3], dtype=object, name="a")
     else:
         with pd.option_context("chained_assignment", "warn"):
-            with tm.assert_produces_warning(SettingWithCopyWarning):
+            msg = "|".join(
+                [
+                    "A value is trying to be set on a copy of a slice from a DataFrame",
+                    "Setting an item of incompatible dtype is deprecated",
+                ]
+            )
+            with tm.assert_produces_warning(
+                (SettingWithCopyWarning, FutureWarning), match=msg
+            ):
                 s[0] = "foo"
         expected = Series(["foo", 2, 3], dtype=object, name="a")
 
@@ -1020,7 +1029,10 @@ def test_dataframe_add_column_from_series(backend, using_copy_on_write):
     ],
 )
 def test_set_value_copy_only_necessary_column(
-    using_copy_on_write, indexer_func, indexer, val
+    using_copy_on_write,
+    indexer_func,
+    indexer,
+    val,
 ):
     # When setting inplace, only copy column that is modified instead of the whole
     # block (by splitting the block)
@@ -1029,7 +1041,13 @@ def test_set_value_copy_only_necessary_column(
     df_orig = df.copy()
     view = df[:]
 
-    indexer_func(df)[indexer] = val
+    if val == "a" and indexer[0] != slice(None):
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype is deprecated"
+        ):
+            indexer_func(df)[indexer] = val
+    else:
+        indexer_func(df)[indexer] = val
 
     if using_copy_on_write:
         assert np.shares_memory(get_array(df, "b"), get_array(view, "b"))
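
The `"|".join([...])` pattern in the first hunk builds a single regular expression that accepts either warning text. A small standard-library sketch, assuming the `match` argument is applied as a regular-expression search over the warning message (the sample texts below are illustrative; only their leading phrases come from the diff):

```python
import re

msg = "|".join(
    [
        "A value is trying to be set on a copy of a slice from a DataFrame",
        "Setting an item of incompatible dtype is deprecated",
    ]
)

# Illustrative warning texts, not captured from a real pandas run.
emitted = [
    "A value is trying to be set on a copy of a slice from a DataFrame (details elided)",
    "Setting an item of incompatible dtype is deprecated (details elided)",
    "Some unrelated warning",
]

matches = [bool(re.search(msg, text)) for text in emitted]
print(matches)
```

Neither phrase contains regex metacharacters, so the alternation is safe without `re.escape`.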

pandas/tests/copy_view/test_methods.py

+6 -3
@@ -1405,15 +1405,18 @@ def test_putmask_aligns_rhs_no_reference(using_copy_on_write, dtype):
         assert np.shares_memory(arr_a, get_array(df, "a"))
 
 
-@pytest.mark.parametrize("val, exp", [(5.5, True), (5, False)])
-def test_putmask_dont_copy_some_blocks(using_copy_on_write, val, exp):
+@pytest.mark.parametrize(
+    "val, exp, warn", [(5.5, True, FutureWarning), (5, False, None)]
+)
+def test_putmask_dont_copy_some_blocks(using_copy_on_write, val, exp, warn):
     df = DataFrame({"a": [1, 2], "b": 1, "c": 1.5})
     view = df[:]
     df_orig = df.copy()
     indexer = DataFrame(
         [[True, False, False], [True, False, False]], columns=list("abc")
     )
-    df[indexer] = val
+    with tm.assert_produces_warning(warn, match="incompatible dtype"):
+        df[indexer] = val
 
     if using_copy_on_write:
         assert not np.shares_memory(get_array(view, "a"), get_array(df, "a"))
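
The test above parametrizes the expected warning, with `None` meaning "no warning expected". A rough standard-library sketch of how such a check could work: `expect_warning` and `put_scalar` are hypothetical helpers written for this illustration, not pandas functions, and the real `tm.assert_produces_warning` may differ in detail.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match=None):
    """Hypothetical helper: assert a warning of `category` (or none at all)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    if category is None:
        assert not caught, [str(w.message) for w in caught]
        return
    hits = [w for w in caught if issubclass(w.category, category)]
    assert hits, f"expected {category.__name__}, saw none"
    if match is not None:
        assert any(re.search(match, str(w.message)) for w in hits)


def put_scalar(values, val):
    """Toy setter: warns when `val` is not an int, then fills the list."""
    if not isinstance(val, int):
        warnings.warn(
            "Setting an item of incompatible dtype is deprecated", FutureWarning
        )
    return [val for _ in values]


results = []
for val, warn in [(5.5, FutureWarning), (5, None)]:
    with expect_warning(warn, match="incompatible dtype"):
        results.append(put_scalar([1, 2], val))

print(results)
```

Passing the expectation as data keeps one test body for both the upcasting and the non-upcasting value.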

pandas/tests/frame/indexing/test_coercion.py

+24 -6
@@ -51,19 +51,31 @@ def test_37477():
     expected = DataFrame({"A": [1, 2, 3], "B": [3, 1.2, 5]})
 
     df = orig.copy()
-    df.at[1, "B"] = 1.2
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        df.at[1, "B"] = 1.2
     tm.assert_frame_equal(df, expected)
 
     df = orig.copy()
-    df.loc[1, "B"] = 1.2
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        df.loc[1, "B"] = 1.2
     tm.assert_frame_equal(df, expected)
 
     df = orig.copy()
-    df.iat[1, 1] = 1.2
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        df.iat[1, 1] = 1.2
     tm.assert_frame_equal(df, expected)
 
     df = orig.copy()
-    df.iloc[1, 1] = 1.2
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        df.iloc[1, 1] = 1.2
     tm.assert_frame_equal(df, expected)
 
 
@@ -94,11 +106,17 @@ def test_26395(indexer_al):
     expected = DataFrame({"D": [0, 0, 2]}, index=["A", "B", "C"], dtype=np.int64)
     tm.assert_frame_equal(df, expected)
 
-    indexer_al(df)["C", "D"] = 44.5
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        indexer_al(df)["C", "D"] = 44.5
     expected = DataFrame({"D": [0, 0, 44.5]}, index=["A", "B", "C"], dtype=np.float64)
     tm.assert_frame_equal(df, expected)
 
-    indexer_al(df)["C", "D"] = "hello"
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        indexer_al(df)["C", "D"] = "hello"
     expected = DataFrame({"D": [0, 0, "hello"]}, index=["A", "B", "C"], dtype=object)
     tm.assert_frame_equal(df, expected)
 
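
For user code that hits this warning, the remedy named in the message is an explicit cast before the assignment. A hedged sketch covering the four accessors exercised in `test_37477`: the starting frame `orig` is an assumption (its definition is outside the hunk), and the `via_*` helpers are illustrative names.

```python
import warnings

import pandas as pd

# Assumed starting frame; the diff only shows the expected result.
orig = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})
expected = pd.DataFrame({"A": [1, 2, 3], "B": [3, 1.2, 5]})


def via_at(df):
    df.at[1, "B"] = 1.2


def via_loc(df):
    df.loc[1, "B"] = 1.2


def via_iat(df):
    df.iat[1, 1] = 1.2


def via_iloc(df):
    df.iloc[1, 1] = 1.2


results = {}
for setter in (via_at, via_loc, via_iat, via_iloc):
    # Explicit cast first: column "B" is float64 before the float is set.
    df = orig.astype({"B": "float64"})
    with warnings.catch_warnings():
        # Escalate only the deprecation discussed here, so an implicit
        # upcast would fail loudly instead of passing unnoticed.
        warnings.filterwarnings(
            "error", message=".*incompatible dtype", category=FutureWarning
        )
        setter(df)
    results[setter.__name__] = df.equals(expected)

print(results)
```

Because the column is already `float64`, each assignment is a plain in-dtype write and no upcast is involved.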

pandas/tests/frame/indexing/test_indexing.py

+78 -8
@@ -335,12 +335,18 @@ def test_setitem(self, float_frame, using_copy_on_write):
     def test_setitem2(self):
         # dtype changing GH4204
         df = DataFrame([[0, 0]])
-        df.iloc[0] = np.nan
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.iloc[0] = np.nan
         expected = DataFrame([[np.nan, np.nan]])
         tm.assert_frame_equal(df, expected)
 
         df = DataFrame([[0, 0]])
-        df.loc[0] = np.nan
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.loc[0] = np.nan
         tm.assert_frame_equal(df, expected)
 
     def test_setitem_boolean(self, float_frame):
@@ -1332,12 +1338,22 @@ def test_loc_expand_empty_frame_keep_midx_names(self):
         )
         tm.assert_frame_equal(df, expected)
 
-    @pytest.mark.parametrize("val", ["x", 1])
-    @pytest.mark.parametrize("idxr", ["a", ["a"]])
-    def test_loc_setitem_rhs_frame(self, idxr, val):
+    @pytest.mark.parametrize(
+        "val, idxr, warn",
+        [
+            ("x", "a", None),  # TODO: this should warn as well
+            ("x", ["a"], None),  # TODO: this should warn as well
+            (1, "a", None),  # TODO: this should warn as well
+            (1, ["a"], FutureWarning),
+        ],
+    )
+    def test_loc_setitem_rhs_frame(self, idxr, val, warn):
         # GH#47578
         df = DataFrame({"a": [1, 2]})
-        with tm.assert_produces_warning(None):
+
+        with tm.assert_produces_warning(
+            warn, match="Setting an item of incompatible dtype"
+        ):
             df.loc[:, idxr] = DataFrame({"a": [val, 11]}, index=[1, 2])
         expected = DataFrame({"a": [np.nan, val]})
         tm.assert_frame_equal(df, expected)
@@ -1537,8 +1553,11 @@ def test_setitem(self, uint64_frame):
         # With NaN: because uint64 has no NaN element,
         # the column should be cast to object.
         df2 = df.copy()
-        df2.iloc[1, 1] = pd.NaT
-        df2.iloc[1, 2] = pd.NaT
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df2.iloc[1, 1] = pd.NaT
+            df2.iloc[1, 2] = pd.NaT
         result = df2["B"]
         tm.assert_series_equal(notna(result), Series([True, False, True], name="B"))
         tm.assert_series_equal(
@@ -1851,3 +1870,54 @@ def test_setitem_dict_and_set_disallowed_multiindex(self, key):
         )
         with pytest.raises(TypeError, match="as an indexer is not supported"):
             df.loc[key] = 1
+
+
+class TestSetitemValidation:
+    # This is adapted from pandas/tests/arrays/masked/test_indexing.py
+    # but checks for warnings instead of errors.
+    def _check_setitem_invalid(self, df, invalid, indexer):
+        msg = "Setting an item of incompatible dtype is deprecated"
+        msg = re.escape(msg)
+
+        orig_df = df.copy()
+
+        # iloc
+        with tm.assert_produces_warning(FutureWarning, match=msg):
+            df.iloc[indexer, 0] = invalid
+            df = orig_df.copy()
+
+        # loc
+        with tm.assert_produces_warning(FutureWarning, match=msg):
+            df.loc[indexer, "a"] = invalid
+            df = orig_df.copy()
+
+    _invalid_scalars = [
+        1 + 2j,
+        "True",
+        "1",
+        "1.0",
+        pd.NaT,
+        np.datetime64("NaT"),
+        np.timedelta64("NaT"),
+    ]
+    _indexers = [0, [0], slice(0, 1), [True, False, False]]
+
+    @pytest.mark.parametrize(
+        "invalid", _invalid_scalars + [1, 1.0, np.int64(1), np.float64(1)]
+    )
+    @pytest.mark.parametrize("indexer", _indexers)
+    def test_setitem_validation_scalar_bool(self, invalid, indexer):
+        df = DataFrame({"a": [True, False, False]}, dtype="bool")
+        self._check_setitem_invalid(df, invalid, indexer)
+
+    @pytest.mark.parametrize("invalid", _invalid_scalars + [True, 1.5, np.float64(1.5)])
+    @pytest.mark.parametrize("indexer", _indexers)
+    def test_setitem_validation_scalar_int(self, invalid, any_int_numpy_dtype, indexer):
+        df = DataFrame({"a": [1, 2, 3]}, dtype=any_int_numpy_dtype)
+        self._check_setitem_invalid(df, invalid, indexer)
+
+    @pytest.mark.parametrize("invalid", _invalid_scalars + [True])
+    @pytest.mark.parametrize("indexer", _indexers)
+    def test_setitem_validation_scalar_float(self, invalid, float_numpy_dtype, indexer):
+        df = DataFrame({"a": [1, 2, None]}, dtype=float_numpy_dtype)
+        self._check_setitem_invalid(df, invalid, indexer)