@@ -8,16 +8,12 @@ Copy-on-Write (CoW)

 .. note::

-    Copy-on-Write will become the default in pandas 3.0. We recommend
-    :ref:`turning it on now <copy_on_write_enabling>`
-    to benefit from all improvements.
+    Copy-on-Write is now the default with pandas 3.0.

 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
 optimizations that become possible through CoW are implemented and supported. All possible
 optimizations are supported starting from pandas 2.1.

-CoW will be enabled by default in version 3.0.
-
 CoW will lead to more predictable behavior since it is not possible to update more than
 one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
 delaying copies as long as possible, the average performance and memory usage will improve.
@@ -29,21 +25,25 @@ pandas indexing behavior is tricky to understand. Some operations return views w
 other return copies. Depending on the result of the operation, mutating one object
 might accidentally mutate another:

-.. ipython:: python
+.. code-block:: ipython

-    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
-    subset = df["foo"]
-    subset.iloc[0] = 100
-    df
+    In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    In [2]: subset = df["foo"]
+    In [3]: subset.iloc[0] = 100
+    In [4]: df
+    Out[4]:
+       foo  bar
+    0  100    4
+    1    2    5
+    2    3    6

-Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
+
+Mutating ``subset``, e.g. updating its values, also updated ``df``. The exact behavior was
 hard to predict. Copy-on-Write solves accidentally modifying more than one object,
-it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
+it explicitly disallows this. ``df`` is unchanged:

 .. ipython:: python

-    pd.options.mode.copy_on_write = True
-
     df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
     subset = df["foo"]
     subset.iloc[0] = 100
@@ -57,13 +57,13 @@ applications.
 Migrating to Copy-on-Write
 --------------------------

-Copy-on-Write will be the default and only mode in pandas 3.0. This means that users
+Copy-on-Write is the default and only mode in pandas 3.0. This means that users
 need to migrate their code to be compliant with CoW rules.

-The default mode in pandas will raise warnings for certain cases that will actively
+The default mode in pandas < 3.0 raises warnings for certain cases that will actively
 change behavior and thus change user intended behavior.

-We added another mode, e.g.
+pandas 2.2 has a warning mode

 .. code-block:: python

@@ -84,7 +84,6 @@ The following few items describe the user visible changes:

 **Accessing the underlying array of a pandas object will return a read-only view**

-
 .. ipython:: python

     ser = pd.Series([1, 2, 3])
@@ -101,16 +100,21 @@ for more details.

 **Only one pandas object is updated at once**

-The following code snippet updates both ``df`` and ``subset`` without CoW:
+The following code snippet updated both ``df`` and ``subset`` without CoW:

-.. ipython:: python
+.. code-block:: ipython

-    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
-    subset = df["foo"]
-    subset.iloc[0] = 100
-    df
+    In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    In [2]: subset = df["foo"]
+    In [3]: subset.iloc[0] = 100
+    In [4]: df
+    Out[4]:
+       foo  bar
+    0  100    4
+    1    2    5
+    2    3    6

-This won't be possible anymore with CoW, since the CoW rules explicitly forbid this.
+This is not possible anymore with CoW, since the CoW rules explicitly forbid this.
 This includes updating a single column as a :class:`Series` and relying on the change
 propagating back to the parent :class:`DataFrame`.
 This statement can be rewritten into a single statement with ``loc`` or ``iloc`` if
@@ -146,7 +150,7 @@ A different alternative would be to not use ``inplace``:

 **Constructors now copy NumPy arrays by default**

-The Series and DataFrame constructors will now copy NumPy array by default when not
+The Series and DataFrame constructors now copy a NumPy array by default when not
 otherwise specified. This was changed to avoid mutating a pandas object when the
 NumPy array is changed inplace outside of pandas. You can set ``copy=False`` to
 avoid this copy.
@@ -162,7 +166,7 @@ that shares data with another DataFrame or Series object inplace.
 This avoids side-effects when modifying values and hence, most methods can avoid
 actually copying the data and only trigger a copy when necessary.

-The following example will operate inplace with CoW:
+The following example will operate inplace:

 .. ipython:: python

@@ -207,15 +211,17 @@ listed in :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.

 Previously, when operating on views, the view and the parent object was modified:

-.. ipython:: python
-
-    with pd.option_context("mode.copy_on_write", False):
-        df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
-        view = df[:]
-        df.iloc[0, 0] = 100
+.. code-block:: ipython

-        df
-        view
+    In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    In [2]: subset = df["foo"]
+    In [3]: subset.iloc[0] = 100
+    In [4]: df
+    Out[4]:
+       foo  bar
+    0  100    4
+    1    2    5
+    2    3    6

 CoW triggers a copy when ``df`` is changed to avoid mutating ``view`` as well:

@@ -236,16 +242,19 @@ Chained Assignment
 Chained assignment references a technique where an object is updated through
 two subsequent indexing operations, e.g.

-.. ipython:: python
-    :okwarning:
+.. code-block:: ipython

-    with pd.option_context("mode.copy_on_write", False):
-        df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
-        df["foo"][df["bar"] > 5] = 100
-        df
+    In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    In [2]: df["foo"][df["bar"] > 5] = 100
+    In [3]: df
+    Out[3]:
+       foo  bar
+    0    1    4
+    1    2    5
+    2  100    6

-The column ``foo`` is updated where the column ``bar`` is greater than 5.
-This violates the CoW principles though, because it would have to modify the
+The column ``foo`` was updated where the column ``bar`` is greater than 5.
+This violated the CoW principles though, because it would have to modify the
 view ``df["foo"]`` and ``df`` in one step. Hence, chained assignment will
 consistently never work and raise a ``ChainedAssignmentError`` warning
 with CoW enabled:
@@ -272,7 +281,6 @@ shares data with the initial DataFrame:

 The array is a copy if the initial DataFrame consists of more than one array:

-
 .. ipython:: python

     df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
@@ -347,22 +355,3 @@ and :meth:`DataFrame.rename`.

 These methods return views when Copy-on-Write is enabled, which provides a significant
 performance improvement compared to the regular execution.
-
-.. _copy_on_write_enabling:
-
-How to enable CoW
------------------
-
-Copy-on-Write can be enabled through the configuration option ``copy_on_write``. The option can
-be turned on __globally__ through either of the following:
-
-.. ipython:: python
-
-    pd.set_option("mode.copy_on_write", True)
-
-    pd.options.mode.copy_on_write = True
-
-.. ipython:: python
-    :suppress:
-
-    pd.options.mode.copy_on_write = False
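
For anyone reviewing the rewritten examples, here is a minimal sketch of the behavior the updated text describes. It is not part of the patch. It assumes pandas 2.0 or later: on pandas < 3.0 it turns CoW on through the ``mode.copy_on_write`` option that this diff removes from the docs, and on 3.0 it assumes CoW is already active, so the guarded ``set_option`` call may warn or be a no-op there. The variable names ``parent_first``, ``subset_first`` and ``foo_after_loc`` are only illustrative.

```python
import warnings

import pandas as pd

# Assumption: pandas >= 2.0. Before 3.0 CoW is opt-in, so enable it here.
# On 3.0+ CoW is the only mode and the option may be deprecated or gone,
# which is why the call is guarded and warnings are silenced.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        pd.set_option("mode.copy_on_write", True)
    except Exception:
        pass

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Under CoW, writing to the extracted column triggers a copy,
# so only ``subset`` changes and ``df`` keeps its original value.
subset = df["foo"]
subset.iloc[0] = 100
parent_first = int(df.loc[0, "foo"])
subset_first = int(subset.iloc[0])

# Single-statement alternative to the chained assignment
# ``df["foo"][df["bar"] > 5] = 100``: only the row with bar == 6 matches.
df.loc[df["bar"] > 5, "foo"] = 100
foo_after_loc = df["foo"].tolist()

print(parent_first, subset_first, foo_after_loc)
```

The ``loc`` form updates ``df`` directly in one step, which is why it keeps working under CoW while the chained form does not.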