@@ -7,8 +7,8 @@ Copy-on-Write (CoW)
 *******************

 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
-optimizations that become possible through CoW are implemented and supported. A complete list
-can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
+optimizations that become possible through CoW are implemented and supported. All possible
+optimizations are supported starting from pandas 2.1.

 We expect that CoW will be enabled by default in version 3.0.

@@ -154,66 +154,86 @@ With copy on write this can be done by using ``loc``.

     df.loc[df["bar"] > 5, "foo"] = 100

+Read-only NumPy arrays
+----------------------
+
+Accessing the underlying NumPy array of a DataFrame will return a read-only array if the array
+shares data with the initial DataFrame.
+
+The array is a copy if the initial DataFrame consists of more than one array:
+
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
+    df.to_numpy()
+
+The array shares data with the DataFrame if the DataFrame consists of only one NumPy array:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
+    df.to_numpy()
+
+This array is read-only, which means that it can't be modified inplace:
+
+.. ipython:: python
+    :okexcept:
+
+    arr = df.to_numpy()
+    arr[0, 0] = 100
+
+The same holds true for a Series, since a Series always consists of a single array.
+
+There are two potential solutions to this:
+
+- Trigger a copy manually if you want to avoid updating DataFrames that share memory with your array.
+- Make the array writeable. This is a more performant solution but circumvents Copy-on-Write rules, so
+  it should be used with caution.
+
+.. ipython:: python
+
+    arr = df.to_numpy()
+    arr.flags.writeable = True
+    arr[0, 0] = 100
+    arr
+
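The first option in the list above, triggering a copy manually, is not illustrated in the text. The following is a minimal sketch of it, not part of the change itself. It assumes only the public `DataFrame.to_numpy` and `ndarray.copy` calls and should behave the same whether or not Copy-on-Write is enabled:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Copy explicitly so the array owns its data and is writeable,
# regardless of whether to_numpy() handed back a read-only view.
arr = df.to_numpy().copy()
arr[0, 0] = 100

# The DataFrame keeps its original values because the array
# no longer shares memory with it.
print(df)
print(arr)
```

This costs one extra allocation, which is the trade-off against the faster but riskier `writeable` flag approach.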
+Patterns to avoid
+-----------------
+
+No defensive copy will be performed if two objects share the same data while
+you are modifying one object inplace.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df2 = df.reset_index()
+    df2.iloc[0, 0] = 100
+
+This creates two objects that share data and thus the setitem operation will trigger a
+copy. This is not necessary if the initial object ``df`` isn't needed anymore.
+Simply reassigning to the same variable will invalidate the reference that is
+held by the object.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df = df.reset_index()
+    df.iloc[0, 0] = 100
+
+No copy is necessary in this example.
+Creating multiple references keeps unnecessary references alive
+and thus will hurt performance with Copy-on-Write.
+
 .. _copy_on_write.optimizations:

 Copy-on-Write optimizations
 ---------------------------

 A new lazy copy mechanism that defers the copy until the object in question is modified
 and only if this object shares data with another object. This mechanism was added to
-following methods:
-
--  :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
--  :meth:`DataFrame.set_index`
--  :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
--  :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
--  :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
--  :meth:`DataFrame.reindex` / :meth:`Series.reindex`
--  :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
--  :meth:`DataFrame.assign`
--  :meth:`DataFrame.drop`
--  :meth:`DataFrame.dropna` / :meth:`Series.dropna`
--  :meth:`DataFrame.select_dtypes`
--  :meth:`DataFrame.align` / :meth:`Series.align`
--  :meth:`Series.to_frame`
--  :meth:`DataFrame.rename` / :meth:`Series.rename`
--  :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
--  :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
--  :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
--  :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
--  :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
--  :meth:`DataFrame.between_time` / :meth:`Series.between_time`
--  :meth:`DataFrame.filter` / :meth:`Series.filter`
--  :meth:`DataFrame.head` / :meth:`Series.head`
--  :meth:`DataFrame.tail` / :meth:`Series.tail`
--  :meth:`DataFrame.isetitem`
--  :meth:`DataFrame.pipe` / :meth:`Series.pipe`
--  :meth:`DataFrame.pop` / :meth:`Series.pop`
--  :meth:`DataFrame.replace` / :meth:`Series.replace`
--  :meth:`DataFrame.shift` / :meth:`Series.shift`
--  :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
--  :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
--  :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
--  :meth:`DataFrame.swapaxes`
--  :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
--  :meth:`DataFrame.take` / :meth:`Series.take`
--  :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
--  :meth:`DataFrame.to_period` / :meth:`Series.to_period`
--  :meth:`DataFrame.truncate`
--  :meth:`DataFrame.iterrows`
--  :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
--  :meth:`DataFrame.fillna` / :meth:`Series.fillna`
--  :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
--  :meth:`DataFrame.ffill` / :meth:`Series.ffill`
--  :meth:`DataFrame.bfill` / :meth:`Series.bfill`
--  :meth:`DataFrame.where` / :meth:`Series.where`
--  :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
--  :meth:`DataFrame.astype` / :meth:`Series.astype`
--  :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
--  :meth:`DataFrame.join`
--  :meth:`DataFrame.eval`
--  :func:`concat`
--  :func:`merge`
+methods that don't require a copy of the underlying data. Popular examples are :meth:`DataFrame.drop` for ``axis=1``
+and :meth:`DataFrame.rename`.

 These methods return views when Copy-on-Write is enabled, which provides a significant
 performance improvement compared to the regular execution.
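
To make the lazy copy mechanism described above more concrete, here is a toy model in plain Python. It is only a sketch: the `LazyCopy` class and its methods are hypothetical names invented for illustration and are not pandas' actual implementation. The sketch assumes that objects sharing data are tracked through weak references, and the last part relies on CPython's immediate reference counting.

```python
import weakref


class LazyCopy:
    """Toy container whose storage may be shared until someone writes to it."""

    def __init__(self, values, refs=None):
        self._values = values
        # Every object that shares `_values` is registered in the same WeakSet.
        self._refs = refs if refs is not None else weakref.WeakSet()
        self._refs.add(self)

    def view(self):
        # Cheap: share the underlying list instead of copying it.
        return LazyCopy(self._values, self._refs)

    def shares_data(self):
        return len(self._refs) > 1

    def __setitem__(self, index, value):
        if self.shares_data():
            # Copy only now, and only because another live object uses the data.
            self._refs.discard(self)
            self._values = list(self._values)
            self._refs = weakref.WeakSet([self])
        self._values[index] = value

    def to_list(self):
        return list(self._values)


original = LazyCopy([1, 2, 3])
lazy = original.view()  # no copy yet, both objects share one list
shared_before = lazy._values is original._values
lazy[0] = 100  # the first write triggers the deferred copy
shared_after = lazy._values is original._values

# Reassigning to the same name drops the old object, so nothing shares
# the data anymore and the write happens without a copy (on CPython).
solo = LazyCopy([4, 5, 6])
storage = solo._values
solo = solo.view()
solo[0] = 100
no_copy_needed = solo._values is storage
```

The second half mirrors the reassignment pattern recommended under "Patterns to avoid": once the old object is gone, the write no longer needs a defensive copy.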