DOC: Start migration guide for Copy-on-Write #56298

Merged
merged 11 commits into from
Dec 14, 2023
49 changes: 39 additions & 10 deletions doc/source/user_guide/copy_on_write.rst
@@ -58,14 +58,14 @@ Migrating to Copy-on-Write
Copy-on-Write will be the default and only mode in pandas 3.0. This means that users
need to migrate their code to be compliant with CoW rules.

The default mode in pandas will raises warnings for certain cases that will actively
The default mode in pandas will raise warnings for certain cases that will actively
change behavior and thus alter the behavior that users intended.

We added another mode, which can be enabled with

```
pd.options.mode.copy_on_write = "warn"
```
.. code-block:: python

    pd.options.mode.copy_on_write = "warn"

that will warn for every operation that will change behavior with CoW. We expect this mode
to be very noisy, since many cases that we don't expect to influence users will
@@ -75,7 +75,7 @@ are the only cases that need to be addressed to make existing code work with CoW

The following few items describe the user visible changes:

**ChainedAssignment will never work**
**Chained assignment will never work**

``loc`` should be used as an alternative. Check the
:ref:`chained assignment section <copy_on_write_chained_assignment>` for more details.
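
As a minimal sketch of the ``loc`` alternative (the frame and the condition below are
illustrative assumptions, not taken from the guide), a chained pattern such as
``df["foo"][df["bar"] > 4] = 100`` can be written as one indexing call on the parent object:

```python
import pandas as pd

# Hypothetical example data, chosen only for illustration.
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Select the rows and the column in a single ``loc`` call so that the
# assignment acts on ``df`` itself rather than on an intermediate object.
df.loc[df["bar"] > 4, "foo"] = 100
print(df)
```

Because the row mask and the column label are passed together, no temporary object sits between ``df`` and the assignment.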
@@ -94,32 +94,61 @@ rules. The returned array is set to non-writeable to protect against this behavi
Creating a copy of this array allows modification. You can also make the array
writeable again if you don't care about the pandas object anymore.

See the section about :ref:`read-only NumPy arrays <copy_on_write_read_only_na>`
for more details.
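
A small sketch of the copy-based approach described above, assuming the array is obtained
through ``Series.to_numpy`` (the variable names are illustrative):

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# Whether the array returned by ``to_numpy`` is writeable depends on the
# pandas version and on whether CoW is enabled, so it is only printed here.
print(ser.to_numpy().flags.writeable)

# An explicit copy is always writeable and is detached from ``ser``.
arr = ser.to_numpy().copy()
arr[0] = 100

print(arr)
print(ser)
```

Working on a copy leaves the pandas object untouched, which is the safe choice whenever ``ser`` is still needed afterwards.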

**Only one pandas object is updated at once**
Member

The code example you use is already a practical example of it, but I wonder if we should explicitly call out "modifying a column as a Series no longer works"? As I think this will be one of the main use cases where right now the user intended the propagation of the modification (while with dataframe row slices, I think that is much less the case)

Member Author

Yep added


The following code snippet updates both ``df`` and ``view`` without CoW:
The following code snippet updates both ``df`` and ``subset`` without CoW:

.. ipython:: python

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]
    subset.iloc[0] = 100
    dfr
    df

This won't be possible anymore with CoW, since the CoW rules explicitly forbid this.
This includes updating a single column as a :class:`Series` and relying on the change
propagating back to the parent :class:`DataFrame`.
If this behavior is necessary, the update can be rewritten as a single statement with
``loc`` or ``iloc`` on the parent object. :meth:`DataFrame.where` is another suitable
alternative for this case.
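
The alternatives named above might look as follows; this is a sketch under the assumption
that the goal is to change the first value of ``foo`` (the ``where`` condition is likewise
an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Update the parent DataFrame in one statement instead of going through
# an intermediate Series such as ``subset = df["foo"]``.
df.loc[0, "foo"] = 100

# ``DataFrame.where`` keeps values where the condition is True and puts
# the replacement value elsewhere; it returns a new object.
result = df.where(df != 2, 200)

print(df)
print(result)
```

Both forms act on the :class:`DataFrame` directly, so they do not rely on a change propagating from a child object back to its parent.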

Updating a column selected from a :class:`DataFrame` with an inplace method will
also not work anymore.

.. ipython:: python
    :okwarning:

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df["foo"].replace(1, 5, inplace=True)
    df

This is another form of chained assignment. It can generally be rewritten in two
different forms:

.. ipython:: python

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df.replace({"foo": 1}, {"foo": 5}, inplace=True)
    df

A different alternative would be to not use ``inplace``:

.. ipython:: python

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df["foo"] = df["foo"].replace(1, 5)
    df

**Constructors now copy NumPy arrays by default**

The Series and DataFrame constructors will now copy a NumPy array by default when not
otherwise specified. This was changed to avoid mutating a pandas object when the
NumPy array is changed inplace outside of pandas. You can set ``copy=False`` to
avoid this copy.

See the section about :ref:`read-only NumPy arrays <copy_on_write_read_only_na>`
for more details.
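
A hedged sketch of the constructor behavior: ``copy=True`` is passed explicitly here so that
the example behaves the same regardless of the default in the installed pandas version. The
outcome of the ``copy=False`` call is printed rather than assumed.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# Request the copy explicitly; this mirrors the new default described above.
ser = pd.Series(arr, copy=True)

# Changing the NumPy array in place does not affect the Series.
arr[0] = 100
print(ser.tolist())

# ``copy=False`` opts out of the copy; check whether memory is shared.
ser_no_copy = pd.Series(arr, copy=False)
print(np.shares_memory(arr, ser_no_copy.to_numpy()))
```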

Description
-----------
