Commit 7b8c6f6

DOC: Adjust user guide for CoW docs (#55337)
1 parent 68e3c4b commit 7b8c6f6

1 file changed (+75, -55 lines)

doc/source/user_guide/copy_on_write.rst

@@ -7,8 +7,8 @@ Copy-on-Write (CoW)
 *******************

 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
-optimizations that become possible through CoW are implemented and supported. A complete list
-can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
+optimizations that become possible through CoW are implemented and supported. All possible
+optimizations are supported starting from pandas 2.1.

 We expect that CoW will be enabled by default in version 3.0.

@@ -154,66 +154,86 @@ With copy on write this can be done by using ``loc``.

     df.loc[df["bar"] > 5, "foo"] = 100

+Read-only NumPy arrays
+----------------------
+
+Accessing the underlying NumPy array of a DataFrame returns a read-only array if the array
+shares data with the initial DataFrame.
+
+The array is a copy if the initial DataFrame consists of more than one array:
+
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
+    df.to_numpy()
+
+The array shares data with the DataFrame if the DataFrame consists of only one NumPy array:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
+    df.to_numpy()
+
+This array is read-only, which means that it can't be modified in place:
+
+.. ipython:: python
+    :okexcept:
+
+    arr = df.to_numpy()
+    arr[0, 0] = 100
+
+The same holds true for a Series, since a Series always consists of a single array.
+
+There are two potential solutions to this:
+
+- Trigger a copy manually if you want to avoid updating DataFrames that share memory with your array.
+- Make the array writeable. This is a more performant solution but circumvents Copy-on-Write rules, so
+  it should be used with caution.
+
+.. ipython:: python
+
+    arr = df.to_numpy()
+    arr.flags.writeable = True
+    arr[0, 0] = 100
+    arr
+
+Patterns to avoid
+-----------------
+
+A defensive copy has to be performed if two objects share the same data while
+you are modifying one of them in place.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df2 = df.reset_index(drop=True)
+    df2.iloc[0, 0] = 100
+
+This creates two objects that share data and thus the setitem operation will trigger a
+copy. This is not necessary if the initial object ``df`` isn't needed anymore.
+Simply reassigning the result to the same variable releases the original object
+and, with it, the reference it holds to the shared data.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df = df.reset_index(drop=True)
+    df.iloc[0, 0] = 100
+
+No copy is necessary in this example.
+Creating multiple references keeps unnecessary references alive, which forces copies
+like the one above and thus hurts performance with Copy-on-Write.
+
 .. _copy_on_write.optimizations:

 Copy-on-Write optimizations
 ---------------------------

 A new lazy copy mechanism defers the copy until the object in question is modified,
 and only performs it if this object shares data with another object. This mechanism was added to
-following methods:
-
-- :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
-- :meth:`DataFrame.set_index`
-- :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
-- :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
-- :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
-- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
-- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
-- :meth:`DataFrame.assign`
-- :meth:`DataFrame.drop`
-- :meth:`DataFrame.dropna` / :meth:`Series.dropna`
-- :meth:`DataFrame.select_dtypes`
-- :meth:`DataFrame.align` / :meth:`Series.align`
-- :meth:`Series.to_frame`
-- :meth:`DataFrame.rename` / :meth:`Series.rename`
-- :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
-- :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
-- :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
-- :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
-- :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
-- :meth:`DataFrame.between_time` / :meth:`Series.between_time`
-- :meth:`DataFrame.filter` / :meth:`Series.filter`
-- :meth:`DataFrame.head` / :meth:`Series.head`
-- :meth:`DataFrame.tail` / :meth:`Series.tail`
-- :meth:`DataFrame.isetitem`
-- :meth:`DataFrame.pipe` / :meth:`Series.pipe`
-- :meth:`DataFrame.pop` / :meth:`Series.pop`
-- :meth:`DataFrame.replace` / :meth:`Series.replace`
-- :meth:`DataFrame.shift` / :meth:`Series.shift`
-- :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
-- :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
-- :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
-- :meth:`DataFrame.swapaxes`
-- :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
-- :meth:`DataFrame.take` / :meth:`Series.take`
-- :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
-- :meth:`DataFrame.to_period` / :meth:`Series.to_period`
-- :meth:`DataFrame.truncate`
-- :meth:`DataFrame.iterrows`
-- :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
-- :meth:`DataFrame.fillna` / :meth:`Series.fillna`
-- :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
-- :meth:`DataFrame.ffill` / :meth:`Series.ffill`
-- :meth:`DataFrame.bfill` / :meth:`Series.bfill`
-- :meth:`DataFrame.where` / :meth:`Series.where`
-- :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
-- :meth:`DataFrame.astype` / :meth:`Series.astype`
-- :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
-- :meth:`DataFrame.join`
-- :meth:`DataFrame.eval`
-- :func:`concat`
-- :func:`merge`
+methods that don't require a copy of the underlying data. Popular examples are :meth:`DataFrame.drop` for ``axis=1``
+and :meth:`DataFrame.rename`.

 These methods return views when Copy-on-Write is enabled, which provides a significant
 performance improvement compared to the regular execution.
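
The three cases in the new "Read-only NumPy arrays" section (a read-only array that shares data, a manual copy, and a forced-writeable array) can be approximated with the standard library alone, which makes them easy to try without pandas or NumPy. This is only an analogy under stated assumptions: an ``array.array`` stands in for the single buffer behind a one-dtype DataFrame, and a read-only ``memoryview`` stands in for the array returned by ``to_numpy()``. No pandas code is involved.

```python
from array import array

# Analogy only (not pandas): the single buffer behind a one-dtype DataFrame,
# here four 64-bit integers stored contiguously.
data = array("q", [1, 2, 3, 4])

# A read-only view shares memory with ``data`` but rejects writes, similar in
# spirit to the array that ``to_numpy()`` returns when it shares data under CoW.
view = memoryview(data).toreadonly()

try:
    view[0] = 100
except TypeError:  # memoryview raises TypeError when writing to read-only memory
    write_blocked = True
else:
    write_blocked = False

# Solution 1: take an explicit copy. Writing to it leaves ``data`` untouched.
copied = array("q", view.tolist())
copied[0] = 100
data_after_copy_write = data.tolist()

# Solution 2, the analogue of ``arr.flags.writeable = True``: a writable view
# on the same memory. Writes now reach ``data`` and every other view of it.
writable = memoryview(data)
writable[0] = 100

print(write_blocked)           # True
print(data_after_copy_write)   # [1, 2, 3, 4]
print(data.tolist(), view[0])  # [100, 2, 3, 4] 100
```

The trade-off mirrors the two solutions listed in the section: the copy costs extra memory but keeps the original intact, while the writable view is cheap but lets a single write reach every object that shares the buffer.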

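The reference tracking behind "Patterns to avoid" and the lazy copy mechanism under "Copy-on-Write optimizations" can be illustrated with a deliberately small toy model. Everything in it is hypothetical: ``SharedData``, ``ToyFrame``, ``is_shared``, and ``setitem`` are names invented for this sketch, and using ``weakref`` to notice when a sharing object disappears is an assumption of the sketch, not a description of pandas internals. The second scenario also relies on CPython freeing an object as soon as its last name is rebound.

```python
import weakref


class SharedData:
    """Hypothetical data buffer that tracks which frames currently use it."""

    def __init__(self, values):
        self.values = values
        self._users = []  # weak references, so tracking never keeps a frame alive

    def register(self, frame):
        self._users.append(weakref.ref(frame))

    def is_shared(self):
        # Keep only frames that still exist and still point at this buffer.
        frames = [ref() for ref in self._users]
        alive = [f for f in frames if f is not None and f._data is self]
        self._users = [weakref.ref(f) for f in alive]
        return len(alive) > 1


class ToyFrame:
    """Hypothetical one-column stand-in for a DataFrame."""

    copies_made = 0  # counts defensive copies, for the demo only

    def __init__(self, data):
        self._data = data
        data.register(self)

    def reset_index(self):
        # Lazy copy: the result shares the buffer instead of copying it.
        return ToyFrame(self._data)

    def setitem(self, pos, value):
        if self._data.is_shared():
            # Another live frame still uses this buffer: copy before writing.
            ToyFrame.copies_made += 1
            self._data = SharedData(list(self._data.values))
            self._data.register(self)
        self._data.values[pos] = value


# Pattern to avoid: ``df`` is still alive, so writing to ``df2`` forces a copy.
df = ToyFrame(SharedData([1, 2, 3]))
df2 = df.reset_index()
df2.setitem(0, 100)
copies_when_both_alive = ToyFrame.copies_made
untouched = list(df._data.values)

# Preferred: rebind the same name. On CPython the old frame is freed at once,
# its weak reference dies, and the write happens without a copy.
ToyFrame.copies_made = 0
df = ToyFrame(SharedData([1, 2, 3]))
df = df.reset_index()
df.setitem(0, 100)
copies_after_rebinding = ToyFrame.copies_made

print(copies_when_both_alive, untouched)  # 1 [1, 2, 3]
print(copies_after_rebinding)             # 0 on CPython
```

In this toy, ``reset_index`` never copies up front; the only copy is deferred to ``setitem`` and happens only while another live frame still uses the buffer. Keeping ``df`` around after creating ``df2`` is what turns that deferred copy into a real one.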