From 3bb6c162ad51141153280ba9930e96c34d1e6994 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 8 Dec 2023 16:54:15 +0100 Subject: [PATCH 1/4] DOC: add deprecation of chained assignment to 2.2 whatsnew --- doc/source/whatsnew/v2.2.0.rst | 64 ++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/doc/source/whatsnew/v2.2.0.rst b/doc/source/whatsnew/v2.2.0.rst index c878fd2664dc4..c8f07a8a275fc 100644 --- a/doc/source/whatsnew/v2.2.0.rst +++ b/doc/source/whatsnew/v2.2.0.rst @@ -358,6 +358,70 @@ Other API changes Deprecations ~~~~~~~~~~~~ +Chained assignment +^^^^^^^^^^^^^^^^^^ + +In preparation of larger upcoming changes to the copy / view behaviour in pandas 3.0 +(:ref:`copy_on_write`, PDEP-7), we started deprecating *chained assignment*. + +Chained assignment occurs when you try to update a pandas DataFrame or Series through +two subsequent indexing operations. Depending on the type and order of those operations +this currently does or does not work, as explained in :ref:`indexing.view_versus_copy`. + +A typical example is as follows: + +.. code-block:: python + + df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) + + # first selecting rows with a mask, then assigning values to a column + # -> this has never worked and raises a SettingWithCopyWarning + df[df["bar"] > 5]["foo"] = 100 + + # first selecting the column, and then assigning to a subset of that column + # -> this currently works + df["foo"][df["bar"] > 5] = 100 + +This second example of chained assignment currently works to update the original ``df``. +This will no longer work in pandas 3.0, and therefore we started deprecating this: + +.. code-block:: python + + >>> df["foo"][df["bar"] > 5] = 100 + FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0! + You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy. + A typical example is when you are setting values in a column of a DataFrame, like: + + df["col"][row_indexer] = value + + Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`. + + See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy + +You can fix this warning and ensure your code is ready for pandas 3.0 by removing +the usage of chained assignment. Typically, this can be done by doing the assignment +in a single step using for example ``.loc``. For the example above, we can do: + +.. code-block:: python + + df.loc[df["bar"] > 5, "foo"] = 100 + +The same deprecation applies to inplace methods that are done in a chained manner, such as: + +.. code-block:: python + + >>> df["foo"].fillna(0, inplace=True) + FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. + The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. + + For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. + +When the goal is to update the column in the DataFrame ``df``, the alternative here is +to call the method on ``df`` itself, such as ``df.fillna({"foo": 0}, inplace=True)``. + +TODO See more details in migration guide . + + Deprecate aliases ``M``, ``Q``, ``Y``, etc. in favour of ``ME``, ``QE``, ``YE``, etc. for offsets ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From c5e340be0a33a95cb2f9cdabfc88367559de1a6d Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Thu, 21 Dec 2023 22:46:34 +0100 Subject: [PATCH 2/4] update link --- doc/source/user_guide/copy_on_write.rst | 2 ++ doc/source/whatsnew/v2.2.0.rst | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst index bc233f4323e2a..8219c5b52873e 100644 --- a/doc/source/user_guide/copy_on_write.rst +++ b/doc/source/user_guide/copy_on_write.rst @@ -52,6 +52,8 @@ it explicitly disallows this. With CoW enabled, ``df`` is unchanged: The following sections will explain what this means and how it impacts existing applications. +.. _copy_on_write_migration: + Migrating to Copy-on-Write -------------------------- diff --git a/doc/source/whatsnew/v2.2.0.rst b/doc/source/whatsnew/v2.2.0.rst index ebed57016ed09..af9a28553c947 100644 --- a/doc/source/whatsnew/v2.2.0.rst +++ b/doc/source/whatsnew/v2.2.0.rst @@ -427,7 +427,7 @@ The same deprecation applies to inplace methods that are done in a chained manne When the goal is to update the column in the DataFrame ``df``, the alternative here is to call the method on ``df`` itself, such as ``df.fillna({"foo": 0}, inplace=True)``. -TODO See more details in migration guide . +See more details in the :ref:`migration guide `. Deprecate aliases ``M``, ``Q``, ``Y``, etc. in favour of ``ME``, ``QE``, ``YE``, etc. for offsets From ba154cae65d0f46cf40f09059453164a0ed3f8fa Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Thu, 21 Dec 2023 22:47:23 +0100 Subject: [PATCH 3/4] remove link to old docs --- doc/source/whatsnew/v2.2.0.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/whatsnew/v2.2.0.rst b/doc/source/whatsnew/v2.2.0.rst index af9a28553c947..5902afea4f868 100644 --- a/doc/source/whatsnew/v2.2.0.rst +++ b/doc/source/whatsnew/v2.2.0.rst @@ -374,7 +374,7 @@ In preparation of larger upcoming changes to the copy / view behaviour in pandas Chained assignment occurs when you try to update a pandas DataFrame or Series through two subsequent indexing operations. Depending on the type and order of those operations -this currently does or does not work, as explained in :ref:`indexing.view_versus_copy`. +this currently does or does not work. A typical example is as follows: From 12b1d999694d5b9d0b195eb1617e4212e8d762a7 Mon Sep 17 00:00:00 2001 From: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Date: Thu, 21 Dec 2023 23:15:23 +0100 Subject: [PATCH 4/4] Update v2.2.0.rst --- doc/source/whatsnew/v2.2.0.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/whatsnew/v2.2.0.rst b/doc/source/whatsnew/v2.2.0.rst index 4742c6ac80594..991daf6e65c8e 100644 --- a/doc/source/whatsnew/v2.2.0.rst +++ b/doc/source/whatsnew/v2.2.0.rst @@ -512,7 +512,7 @@ The same deprecation applies to inplace methods that are done in a chained manne When the goal is to update the column in the DataFrame ``df``, the alternative here is to call the method on ``df`` itself, such as ``df.fillna({"foo": 0}, inplace=True)``. -See more details in the :ref:`migration guide `. +See more details in the :ref:`migration guide `. Deprecate aliases ``M``, ``Q``, ``Y``, etc. in favour of ``ME``, ``QE``, ``YE``, etc. for offsets