Commit f1fae79

DOC: Start migration guide for Copy-on-Write (#56298)
1 parent 1387c4f commit f1fae79

1 file changed

+100
-1
lines changed

doc/source/user_guide/copy_on_write.rst

@@ -16,7 +16,7 @@ Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 m
1616
optimizations that become possible through CoW are implemented and supported. All possible
1717
optimizations are supported starting from pandas 2.1.
1818

19-
We expect that CoW will be enabled by default in version 3.0.
19+
CoW will be enabled by default in version 3.0.
2020

2121
CoW will lead to more predictable behavior since it is not possible to update more than
2222
one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
@@ -52,6 +52,103 @@ it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
5252
The following sections will explain what this means and how it impacts existing
5353
applications.
5454

55+
Migrating to Copy-on-Write
56+
--------------------------
57+
58+
Copy-on-Write will be the default and only mode in pandas 3.0. This means that users
59+
need to migrate their code to be compliant with CoW rules.
60+
61+
In its default mode, pandas raises warnings for the cases where CoW will actively
62+
change behavior and thus alter what the user intended.
63+
64+
We added another mode, enabled via
65+
66+
.. code-block:: python
67+
68+
    pd.options.mode.copy_on_write = "warn"
69+
70+
that warns for every operation whose behavior will change with CoW. We expect this mode
71+
to be very noisy, since many cases that we don't expect to affect users will
72+
also emit a warning. We recommend enabling this mode and analyzing the warnings, but it is
73+
not necessary to address all of these warnings. The first two items of the following list
74+
are the only cases that need to be addressed to make existing code work with CoW.
75+
76+
The following items describe the user-visible changes:
77+
78+
**Chained assignment will never work**
79+
80+
``loc`` should be used as an alternative. Check the
81+
:ref:`chained assignment section <copy_on_write_chained_assignment>` for more details.
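As a rough sketch of that rewrite, the following contrasts a chained assignment with a single ``loc`` statement. The frame and the ``bar > 5`` mask are illustrative choices and are not taken from this section of the commit.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Chained assignment uses two indexing steps and does not update df under CoW:
# df["foo"][df["bar"] > 5] = 100

# Single-statement form: select the rows and the column in one loc call.
df.loc[df["bar"] > 5, "foo"] = 100
```

Only the last row satisfies the mask here, so only that row of ``foo`` is changed.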
82+
83+
**Accessing the underlying array of a pandas object will return a read-only view**
84+
85+
86+
.. ipython:: python
87+
88+
    ser = pd.Series([1, 2, 3])
89+
    ser.to_numpy()
90+
91+
This example returns a NumPy array that is a view of the Series object. Modifying this
92+
view would also modify the pandas object, which is not compliant with CoW
93+
rules. The returned array is therefore set to non-writeable to protect against this behavior.
94+
Creating a copy of this array allows modification. You can also make the array
95+
writeable again if you don't care about the pandas object anymore.
96+
97+
See the section about :ref:`read-only NumPy arrays <copy_on_write_read_only_na>`
98+
for more details.
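A minimal sketch of the copy-based approach, assuming the array only needs to be modified independently of the Series. The name ``view_is_writeable`` is introduced here for illustration; its value depends on whether CoW is active in the installed pandas version.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# Copy the array before modifying it, so the Series' data is never touched.
arr = ser.to_numpy().copy()
arr[0] = 100

# Without the copy, writeability depends on whether CoW is enabled.
view_is_writeable = ser.to_numpy().flags.writeable
```

The copy is always writeable, and ``ser`` keeps its original values regardless of the mode.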
99+
100+
**Only one pandas object is updated at once**
101+
102+
The following code snippet updates both ``df`` and ``subset`` without CoW:
103+
104+
.. ipython:: python
105+
106+
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
107+
    subset = df["foo"]
108+
    subset.iloc[0] = 100
109+
    df
110+
111+
This won't be possible anymore with CoW, since the CoW rules explicitly forbid this.
112+
This includes updating a single column as a :class:`Series` and relying on the change
113+
propagating back to the parent :class:`DataFrame`.
114+
These statements can be combined into a single statement with ``loc`` or ``iloc`` if
115+
this behavior is necessary. :meth:`DataFrame.where` is another suitable alternative
116+
for this case.
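One possible way to write the update as a single statement, sketched with the same example frame. The second update and the final ``copy()`` are illustrative additions, not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Update the parent DataFrame directly instead of going through a Series.
df.loc[0, "foo"] = 100   # label-based: row label 0, column "foo"
df.iloc[1, 0] = 200      # position-based: row 1, column 0 ("foo")

# Take an independent Series afterwards if one is needed.
subset = df["foo"].copy()
```

Both statements modify ``df`` itself, so the result does not depend on a change propagating from a derived object.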
117+
118+
Updating a column selected from a :class:`DataFrame` with an inplace method will
119+
also not work anymore.
120+
121+
.. ipython:: python
122+
    :okwarning:
123+
124+
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
125+
    df["foo"].replace(1, 5, inplace=True)
126+
    df
127+
128+
This is another form of chained assignment. It can generally be rewritten in two
129+
different ways:
130+
131+
.. ipython:: python
132+
133+
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
134+
    df.replace({"foo": {1: 5}}, inplace=True)
135+
    df
136+
137+
The other alternative is to not use ``inplace``:
138+
139+
.. ipython:: python
140+
141+
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
141+
    df["foo"] = df["foo"].replace(1, 5)
142+
    df
144+
145+
**Constructors now copy NumPy arrays by default**
146+
147+
The Series and DataFrame constructors will now copy a NumPy array by default when not
148+
otherwise specified. This was changed to avoid mutating a pandas object when the
149+
NumPy array is changed in place outside of pandas. You can set ``copy=False`` to
150+
avoid this copy.
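The effect of the ``copy`` argument can be sketched as follows. Passing ``copy=True`` explicitly is an assumption made here so that the example behaves the same regardless of the installed pandas version; ``ser_shared`` and ``shares`` are illustrative names.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# Request the copy explicitly: the Series owns its own data.
ser = pd.Series(arr, copy=True)

# Opt out of the copy: the Series may share memory with arr.
ser_shared = pd.Series(arr, copy=False)
shares = np.shares_memory(arr, ser_shared.to_numpy())

# Mutating the array outside of pandas does not affect the copied Series.
arr[0] = 100
```

After the last line, ``ser`` still holds its original values, while ``shares`` records whether the ``copy=False`` Series is backed by the same memory as ``arr``.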
151+
55152
Description
56153
-----------
57154

@@ -163,6 +260,8 @@ With copy on write this can be done by using ``loc``.
163260
164261
    df.loc[df["bar"] > 5, "foo"] = 100
165262
263+
.. _copy_on_write_read_only_na:
264+
166265
Read-only NumPy arrays
167266
----------------------
168267
