Skip to content

PDEP6 implementation pt 1: block.setitem, block.putmask #50626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 106 commits into from
Jul 27, 2023
Merged
Show file tree
Hide file tree
Changes from 104 commits
Commits
Show all changes
106 commits
Select commit Hold shift + click to select a range
ccfcb8d
pt1
Dec 28, 2022
30dde08
fixup test collection
Dec 29, 2022
d2dfa3c
fixup warnings
Dec 29, 2022
e4ea519
add comments
Dec 30, 2022
e160d7f
fixup warnings
Dec 30, 2022
f7971f0
fixup test_indexing
Dec 30, 2022
373a30e
fixup test_set_value
Dec 30, 2022
26ab49d
fixup test_where
Dec 30, 2022
cc7cc5a
fixup test_asof
Dec 30, 2022
0362481
add one more explicit upcast
Dec 30, 2022
120aee7
fixup test_update
Dec 30, 2022
7972c08
fixup test_constructors
Dec 30, 2022
a16dcb8
fixup test_stack_unstack
Dec 30, 2022
25dc26f
catch warnings in test_update_dtypes
Jan 7, 2023
d45b312
fixup all test_update
Jan 7, 2023
e4ed811
start fixing up setitem
Jan 7, 2023
ace5e05
finish fixing up test_setitem
Jan 7, 2023
34a6194
more fixups
Jan 7, 2023
0cfefa4
catch numpy-dev warning
Jan 8, 2023
150fa9a
fixup some more
Jan 8, 2023
1cc3f50
fixup test_indexing
Jan 8, 2023
3f15670
fixup test_function
Jan 13, 2023
d0de3f1
fixup test_multi;
Jan 13, 2023
dc6c60b
fixup test_base
Jan 13, 2023
e33563a
fixup test_impl
Jan 13, 2023
37e0520
fixup multiindex/test_setitem
Jan 13, 2023
2dd85ab
fixup test_scalar
Jan 13, 2023
24ca7c2
fixup test_loc
Jan 13, 2023
3aed02f
fixup test_iloc
Jan 13, 2023
25f0693
fixup test_at
Jan 13, 2023
0e5fb73
fixup test_groupby
Jan 13, 2023
9f56342
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Jan 13, 2023
9b9e975
fixup some doc warnings
Jan 13, 2023
2f980c6
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Jan 26, 2023
044227f
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Mar 17, 2023
741b37d
post-merge fixup
Mar 17, 2023
c536d8a
change dtype in doctest
Mar 17, 2023
142992e
fixup doctest
Mar 17, 2023
4551855
explicit cast in test
Mar 17, 2023
f6e34ef
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Mar 20, 2023
01b0c72
fixup test for COW
Mar 20, 2023
d9c2225
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 12, 2023
6676bad
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 14, 2023
df740e6
fixup COW
Apr 14, 2023
9a54956
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 24, 2023
8386032
catch warnings in testsetitemcastingequivalents
Apr 24, 2023
39decb2
wip
Apr 24, 2023
7549888
fixup setitem test int key!
Apr 24, 2023
527fa2d
getting there!
Apr 24, 2023
caa35c3
fixup test_setitem
Apr 24, 2023
cc1542d
getting there
Apr 24, 2023
6636a37
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 25, 2023
492d443
fixup remaining warnings
Apr 25, 2023
eb8dd0f
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 25, 2023
90a64ab
fix test_update
Apr 25, 2023
272b735
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 26, 2023
b3f6b93
fixup some failing test
Apr 26, 2023
b8532cc
one more
Apr 26, 2023
81bba3c
simplify
Apr 26, 2023
87922b5
simplify and remove some false-positives
Apr 26, 2023
f314dd1
clean up
Apr 27, 2023
6b5bc73
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
8cad201
remove final filterwarnings
Apr 27, 2023
d3b12ab
undo unrelated change
Apr 27, 2023
adc0022
fixup raises_chained_assignment_error
Apr 27, 2023
d5bdfcf
remove another filterwarnings
Apr 27, 2023
37836e5
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
1a23fe7
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
dfa8ed2
fixup interchange test
Apr 27, 2023
3efe0a5
better parametrisation
Apr 27, 2023
cc95eec
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 4, 2023
4ba259d
okwarning => codeblock
May 4, 2023
c21aa4d
okwarning => codeblock in v1.3.0
May 4, 2023
05ffc27
one more codeblock
May 4, 2023
6612690
avoid upcast
May 4, 2023
d1aba37
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 4, 2023
adaf4e4
post-merge fixup
May 4, 2023
f194434
docs fixup;
May 4, 2023
6dc7fd1
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 5, 2023
82ecca2
post-merge fixup
May 5, 2023
a9d5891
remove more upcasts
May 5, 2023
0ccb541
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 8, 2023
aec0c87
adapt test from EA types
MarcoGorelli May 8, 2023
72e5609
move test to series/indexing
MarcoGorelli May 8, 2023
1455263
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 12, 2023
d16eea6
add tests about warnings
MarcoGorelli May 12, 2023
25198d4
fixup tests
MarcoGorelli May 12, 2023
ba6daa5
add dataframe tests too
MarcoGorelli May 12, 2023
9b15bc2
fixup tests
MarcoGorelli May 12, 2023
ebd1a50
simplify
MarcoGorelli May 12, 2023
a835281
try-fix docs build
MarcoGorelli May 12, 2023
908430b
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli May 17, 2023
9802696
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 22, 2023
5ee4ebf
post-merge fixup
MarcoGorelli May 22, 2023
bb37b09
raise assertionerror if self.dtype equals new_dtype
MarcoGorelli May 22, 2023
f46f7e3
Merge branch 'pt1-deprecate-ban-upcasting' of github.com:MarcoGorelli…
MarcoGorelli May 22, 2023
7df15e9
add todo for test case which should warn
MarcoGorelli May 22, 2023
f323d2a
add more todos
MarcoGorelli May 22, 2023
5f5a6a5
post-merge fixup
MarcoGorelli May 22, 2023
0fb017b
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli May 26, 2023
e540404
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli Jun 15, 2023
dce5f36
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli Jun 23, 2023
095048c
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli Jul 3, 2023
0381c62
move logic for determining warning into the test
MarcoGorelli Jul 3, 2023
e04a8c1
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli Jul 24, 2023
a843e30
uncomment test
MarcoGorelli Jul 24, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/user_guide/categorical.rst
Original file line number Diff line number Diff line change
Expand Up @@ -779,6 +779,7 @@ Setting values by assigning categorical data will also check that the ``categori
Assigning a ``Categorical`` to parts of a column of other types will use the values:

.. ipython:: python
:okwarning:

df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
df.loc[1:2, "a"] = pd.Categorical(["b", "b"], categories=["a", "b"])
Expand Down
21 changes: 13 additions & 8 deletions doc/source/whatsnew/v0.21.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -425,13 +425,12 @@ Note that this also changes the sum of an empty ``Series``. Previously this alwa
In [1]: pd.Series([]).sum()
Out[1]: 0

but for consistency with the all-NaN case, this was changed to return NaN as well:
but for consistency with the all-NaN case, this was changed to return 0 as well:

.. ipython:: python
:okwarning:

pd.Series([]).sum()
.. code-block:: ipython

In [2]: pd.Series([]).sum()
Out[2]: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unrelated to this PR, just incorrect AFAICT. (mostly a note-to-self)


.. _whatsnew_0210.api_breaking.loc:

Expand Down Expand Up @@ -755,10 +754,16 @@ Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignmen

New behavior

.. ipython:: python
.. code-block:: ipython

s[1] = True
s
In [7]: s[1] = True

In [8]: s
Out[8]:
0 1
1 True
2 3
Length: 3, dtype: object

Previously, as assignment to a datetimelike with a non-datetimelike would coerce the
non-datetime-like item being assigned (:issue:`14145`).
Expand Down
31 changes: 22 additions & 9 deletions doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -501,13 +501,17 @@ Consistent casting with setting into Boolean Series
Setting non-boolean values into a :class:`Series` with ``dtype=bool`` now consistently
casts to ``dtype=object`` (:issue:`38709`)

.. ipython:: python
.. code-block:: ipython

In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

orig = pd.Series([True, False])
ser = orig.copy()
ser.iloc[1] = np.nan
ser2 = orig.copy()
ser2.iloc[1] = 2.0
In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0

*Previous behavior*:

Expand All @@ -527,10 +531,19 @@ casts to ``dtype=object`` (:issue:`38709`)

*New behavior*:

.. ipython:: python
.. code-block:: ipython

In [1]: ser
Out [1]:
0 True
1 NaN
dtype: object

ser
ser2
In [2]:ser2
Out [2]:
0 True
1 2.0
dtype: object


.. _whatsnew_130.notable_bug_fixes.rolling_groupby_column:
Expand Down
26 changes: 22 additions & 4 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
cast,
final,
)
import warnings

import numpy as np

Expand Down Expand Up @@ -43,6 +44,7 @@
)
from pandas.errors import AbstractMethodError
from pandas.util._decorators import cache_readonly
from pandas.util._exceptions import find_stack_level
from pandas.util._validators import validate_bool_kwarg

from pandas.core.dtypes.astype import (
Expand Down Expand Up @@ -415,7 +417,7 @@ def split_and_operate(self, func, *args, **kwargs) -> list[Block]:
# Up/Down-casting

@final
def coerce_to_target_dtype(self, other) -> Block:
def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be changed to raise_on_upcast when the deprecation is enforced

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would also cover 'shift' (xref #53802). I don't recall that being a part of PDEP6. Am I remembering wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see there's a new kwd, never mind

"""
coerce the current block to a dtype compat for other
we will return a block, possibly object, and not raise
Expand All @@ -424,7 +426,21 @@ def coerce_to_target_dtype(self, other) -> Block:
and will receive the same block
"""
new_dtype = find_result_type(self.values, other)

if warn_on_upcast:
warnings.warn(
f"Setting an item of incompatible dtype is deprecated "
"and will raise in a future error of pandas. "
f"Value '{other}' has dtype incompatible with {self.dtype}, "
"please explicitly cast to a compatible dtype first.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud. It would be difficult to suggest which dtype to cast? I guess object would be safest though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice i think it will almost always be x->object or int->float

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe - but I'd suggest doing this separately if it's OK, this has already been open since January 😄 reckon we can get this in for 2.1? It should be pretty quick to rebase #53405, can do that right after this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this would be fine as a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

FutureWarning,
stacklevel=find_stack_level(),
)
if self.dtype == new_dtype:
raise AssertionError(
f"Did not expect new dtype {new_dtype} to equal self.dtype "
f"{self.dtype}. Please report a bug at "
"https://github.com/pandas-dev/pandas/issues."
)
return self.astype(new_dtype, copy=False)

@final
Expand Down Expand Up @@ -1075,7 +1091,7 @@ def setitem(self, indexer, value, using_cow: bool = False) -> Block:
casted = np_can_hold_element(values.dtype, value)
except LossySetitemError:
# current dtype cannot store value, coerce to common dtype
nb = self.coerce_to_target_dtype(value)
nb = self.coerce_to_target_dtype(value, warn_on_upcast=True)
return nb.setitem(indexer, value)
else:
if self.dtype == _dtype_obj:
Expand Down Expand Up @@ -1148,7 +1164,9 @@ def putmask(self, mask, new, using_cow: bool = False) -> list[Block]:

if not is_list_like(new):
# using just new[indexer] can't save us the need to cast
return self.coerce_to_target_dtype(new).putmask(mask, new)
return self.coerce_to_target_dtype(
new, warn_on_upcast=True
).putmask(mask, new)
else:
indexer = mask.nonzero()[0]
nb = self.setitem(indexer, new[indexer], using_cow=using_cow)
Expand Down
26 changes: 22 additions & 4 deletions pandas/tests/copy_view/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -924,11 +924,20 @@ def test_column_as_series_set_with_upcast(
s[0] = "foo"
expected = Series([1, 2, 3], name="a")
elif using_copy_on_write or using_array_manager:
s[0] = "foo"
with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
s[0] = "foo"
expected = Series(["foo", 2, 3], dtype=object, name="a")
else:
with pd.option_context("chained_assignment", "warn"):
with tm.assert_produces_warning(SettingWithCopyWarning):
msg = "|".join(
[
"A value is trying to be set on a copy of a slice from a DataFrame",
"Setting an item of incompatible dtype is deprecated",
]
)
with tm.assert_produces_warning(
(SettingWithCopyWarning, FutureWarning), match=msg
):
s[0] = "foo"
expected = Series(["foo", 2, 3], dtype=object, name="a")

Expand Down Expand Up @@ -1019,7 +1028,10 @@ def test_dataframe_add_column_from_series(backend, using_copy_on_write):
],
)
def test_set_value_copy_only_necessary_column(
using_copy_on_write, indexer_func, indexer, val
using_copy_on_write,
indexer_func,
indexer,
val,
):
# When setting inplace, only copy column that is modified instead of the whole
# block (by splitting the block)
Expand All @@ -1028,7 +1040,13 @@ def test_set_value_copy_only_necessary_column(
df_orig = df.copy()
view = df[:]

indexer_func(df)[indexer] = val
if val == "a" and indexer[0] != slice(None):
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype is deprecated"
):
indexer_func(df)[indexer] = val
else:
indexer_func(df)[indexer] = val

if using_copy_on_write:
assert np.shares_memory(get_array(df, "b"), get_array(view, "b"))
Expand Down
9 changes: 6 additions & 3 deletions pandas/tests/copy_view/test_methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -1402,15 +1402,18 @@ def test_putmask_aligns_rhs_no_reference(using_copy_on_write, dtype):
assert np.shares_memory(arr_a, get_array(df, "a"))


@pytest.mark.parametrize("val, exp", [(5.5, True), (5, False)])
def test_putmask_dont_copy_some_blocks(using_copy_on_write, val, exp):
@pytest.mark.parametrize(
"val, exp, warn", [(5.5, True, FutureWarning), (5, False, None)]
)
def test_putmask_dont_copy_some_blocks(using_copy_on_write, val, exp, warn):
df = DataFrame({"a": [1, 2], "b": 1, "c": 1.5})
view = df[:]
df_orig = df.copy()
indexer = DataFrame(
[[True, False, False], [True, False, False]], columns=list("abc")
)
df[indexer] = val
with tm.assert_produces_warning(warn, match="incompatible dtype"):
df[indexer] = val

if using_copy_on_write:
assert not np.shares_memory(get_array(view, "a"), get_array(df, "a"))
Expand Down
30 changes: 24 additions & 6 deletions pandas/tests/frame/indexing/test_coercion.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,19 +51,31 @@ def test_37477():
expected = DataFrame({"A": [1, 2, 3], "B": [3, 1.2, 5]})

df = orig.copy()
df.at[1, "B"] = 1.2
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
df.at[1, "B"] = 1.2
tm.assert_frame_equal(df, expected)

df = orig.copy()
df.loc[1, "B"] = 1.2
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
df.loc[1, "B"] = 1.2
tm.assert_frame_equal(df, expected)

df = orig.copy()
df.iat[1, 1] = 1.2
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
df.iat[1, 1] = 1.2
tm.assert_frame_equal(df, expected)

df = orig.copy()
df.iloc[1, 1] = 1.2
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
df.iloc[1, 1] = 1.2
tm.assert_frame_equal(df, expected)


Expand Down Expand Up @@ -94,11 +106,17 @@ def test_26395(indexer_al):
expected = DataFrame({"D": [0, 0, 2]}, index=["A", "B", "C"], dtype=np.int64)
tm.assert_frame_equal(df, expected)

indexer_al(df)["C", "D"] = 44.5
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
indexer_al(df)["C", "D"] = 44.5
expected = DataFrame({"D": [0, 0, 44.5]}, index=["A", "B", "C"], dtype=np.float64)
tm.assert_frame_equal(df, expected)

indexer_al(df)["C", "D"] = "hello"
with tm.assert_produces_warning(
FutureWarning, match="Setting an item of incompatible dtype"
):
indexer_al(df)["C", "D"] = "hello"
expected = DataFrame({"D": [0, 0, "hello"]}, index=["A", "B", "C"], dtype=object)
tm.assert_frame_equal(df, expected)

Expand Down
Loading