pandas-dev · phofl · Mar 1, 2023 · Feb 17, 2023 · Feb 17, 2023 · Feb 17, 2023
@@ -1,4 +1,4 @@
-.. _copy_on_write:
+.. _copy_on_write_dev:
 
 {{ header }}
 
@@ -9,7 +9,8 @@ Copy on write
 Copy on Write is a mechanism to simplify the indexing API and improve
 performance through avoiding copies if possible.
 CoW means that any DataFrame or Series derived from another in any way always
-behaves as a copy.
+behaves as a copy. An explanation on how to use Copy on Write efficiently can be
+found :ref:`here <copy_on_write>`.
 
 Reference tracking
 ------------------

diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst
@@ -0,0 +1,209 @@
+.. _copy_on_write:
+
+{{ header }}
+
+*******************
+Copy-on-Write (CoW)
+*******************
+
+.. ipython:: python
+    :suppress:
+
+    pd.options.mode.copy_on_write = True
+
+Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
+optimizations that become possible through CoW are implemented and supported. A complete list
+can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
+
+We expect that CoW will be enabled by default in version 3.0.
+
+CoW will lead to more predictable behavior since it is not possible to update more than
+one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
+delaying copies as long as possible, the average performance and memory usage will improve.
+
+Description
+-----------
+
+CoW means that any DataFrame or Series derived from another in any way always
+behaves as a copy. As a consequence, we can only change the values of an object
+through modifying the object itself. CoW disallows updating a DataFrame or a Series
+that shares data with another DataFrame or Series object inplace.
+
+This avoids side-effects when modifying values and hence, most methods can avoid
+actually copying the data and only trigger a copy when necessary.
+
+The following example will operate inplace with CoW:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    df.iloc[0, 0] = 100
+    df
+
+The object ``df`` does not share any data with any other object and hence no
+copy is triggered when updating the values. In contrast, the following operation
+triggers a copy of the data under CoW:
+
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    df2 = df.reset_index(drop=True)
+    df2.iloc[0, 0] = 100
+
+    df
+    df2
+
+``reset_index`` returns a lazy copy with CoW while it copies the data without CoW.
+Since both objects, ``df`` and ``df2`` share the same data, a copy is triggered
+when modifying ``df2``. The object ``df`` still has the same values as initially
+while ``df2`` was modified.
+
+If the object ``df`` isn't needed anymore after performing the ``reset_index`` operation,
+you can emulate an inplace-like operation through assigning the output of ``reset_index``
+to the same variable:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    df = df.reset_index(drop=True)
+    df.iloc[0, 0] = 100
+    df
+
+The initial object gets out of scope as soon as the result of ``reset_index`` is
+reassigned and hence ``df`` does not share data with any other object. No copy
+is necessary when modifying the object. This is generally true for all methods
+listed in :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
+
+Previously, when operating on views, the view and the parent object was modified:
+
+.. ipython:: python
+
+    with pd.option_context("mode.copy_on_write", False):
+        df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+        view = df[:]
+        df.iloc[0, 0] = 100
+
+        df
+        view
+
+CoW triggers a copy when ``df`` is changed to avoid mutating ``view`` as well:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    view = df[:]
+    df.iloc[0, 0] = 100
+
+    df
+    view
+
+Chained Assignment
+------------------
+
+Chained assignment references a technique where an object is updated through
+two subsequent indexing operations, e.g.
+
+.. ipython:: python
+
+    with pd.option_context("mode.copy_on_write", False):
+        df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+        df["foo"][df["bar"] > 5] = 100
+        df
+
+The column ``foo`` is updated where the column ``bar`` is greater than 5.
+This violates the CoW principles though, because it would have to modify the
+view ``df["foo"]`` and ``df`` in one step. Hence, chained assignment will
+consistently never work and raise a ``ChainedAssignmentError`` with CoW enabled:
+
+.. ipython:: python
+    :okexcept:
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    df["foo"][df["bar"] > 5] = 100
+
+With copy on write this can be done by using ``loc``.
+
+.. ipython:: python
+
+    df.loc[df["bar"] > 5, "foo"] = 100
+
+.. _copy_on_write.optimizations:
+
+Copy-on-Write optimizations
+---------------------------
+
+A new lazy copy mechanism that defers the copy until the object in question is modified
+and only if this object shares data with another object. This mechanism was added to
+following methods:
+
+  - :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
+  - :meth:`DataFrame.set_index`
+  - :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
+  - :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
+  - :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
+  - :meth:`DataFrame.reindex` / :meth:`Series.reindex`
+  - :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
+  - :meth:`DataFrame.assign`
+  - :meth:`DataFrame.drop`
+  - :meth:`DataFrame.dropna` / :meth:`Series.dropna`
+  - :meth:`DataFrame.select_dtypes`
+  - :meth:`DataFrame.align` / :meth:`Series.align`
+  - :meth:`Series.to_frame`
+  - :meth:`DataFrame.rename` / :meth:`Series.rename`
+  - :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
+  - :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
+  - :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
+  - :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
+  - :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
+  - :meth:`DataFrame.between_time` / :meth:`Series.between_time`
+  - :meth:`DataFrame.filter` / :meth:`Series.filter`
+  - :meth:`DataFrame.head` / :meth:`Series.head`
+  - :meth:`DataFrame.tail` / :meth:`Series.tail`
+  - :meth:`DataFrame.isetitem`
+  - :meth:`DataFrame.pipe` / :meth:`Series.pipe`
+  - :meth:`DataFrame.pop` / :meth:`Series.pop`
+  - :meth:`DataFrame.replace` / :meth:`Series.replace`
+  - :meth:`DataFrame.shift` / :meth:`Series.shift`
+  - :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
+  - :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
+  - :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
+  - :meth:`DataFrame.swapaxes`
+  - :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
+  - :meth:`DataFrame.take` / :meth:`Series.take`
+  - :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
+  - :meth:`DataFrame.to_period` / :meth:`Series.to_period`
+  - :meth:`DataFrame.truncate`
+  - :meth:`DataFrame.iterrows`
+  - :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
+  - :meth:`DataFrame.fillna` / :meth:`Series.fillna`
+  - :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
+  - :meth:`DataFrame.ffill` / :meth:`Series.ffill`
+  - :meth:`DataFrame.bfill` / :meth:`Series.bfill`
+  - :meth:`DataFrame.where` / :meth:`Series.where`
+  - :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
+  - :meth:`DataFrame.astype` / :meth:`Series.astype`
+  - :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
+  - :meth:`DataFrame.join`
+  - :func:`concat`
+  - :func:`merge`
+
+These methods return views when Copy-on-Write is enabled, which provides a significant
+performance improvement compared to the regular execution.
+
+How to enable CoW
+-----------------
+
+Copy-on-Write can be enabled through the configuration option ``copy_on_write``. The option can
+be turned on __globally__ through either of the following:
+
+.. ipython:: python
+
+    pd.set_option("mode.copy_on_write", True)
+
+    pd.options.mode.copy_on_write = True
+
+.. ipython:: python
+    :suppress:
+
+    pd.options.mode.copy_on_write = False
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
@@ -67,6 +67,7 @@ Guides
     pyarrow
     indexing
     advanced
+    copy_on_write
     merging
     reshaping
     text

diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst
@@ -183,57 +183,8 @@ Copy-on-Write improvements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - A new lazy copy mechanism that defers the copy until the object in question is modified
-  was added to the following methods:
-
-  - :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
-  - :meth:`DataFrame.set_index`
-  - :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
-  - :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
-  - :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
-  - :meth:`DataFrame.reindex` / :meth:`Series.reindex`
-  - :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
-  - :meth:`DataFrame.assign`
-  - :meth:`DataFrame.drop`
-  - :meth:`DataFrame.dropna` / :meth:`Series.dropna`
-  - :meth:`DataFrame.select_dtypes`
-  - :meth:`DataFrame.align` / :meth:`Series.align`
-  - :meth:`Series.to_frame`
-  - :meth:`DataFrame.rename` / :meth:`Series.rename`
-  - :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
-  - :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
-  - :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
-  - :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
-  - :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
-  - :meth:`DataFrame.between_time` / :meth:`Series.between_time`
-  - :meth:`DataFrame.filter` / :meth:`Series.filter`
-  - :meth:`DataFrame.head` / :meth:`Series.head`
-  - :meth:`DataFrame.tail` / :meth:`Series.tail`
-  - :meth:`DataFrame.isetitem`
-  - :meth:`DataFrame.pipe` / :meth:`Series.pipe`
-  - :meth:`DataFrame.pop` / :meth:`Series.pop`
-  - :meth:`DataFrame.replace` / :meth:`Series.replace`
-  - :meth:`DataFrame.shift` / :meth:`Series.shift`
-  - :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
-  - :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
-  - :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
-  - :meth:`DataFrame.swapaxes`
-  - :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
-  - :meth:`DataFrame.take` / :meth:`Series.take`
-  - :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
-  - :meth:`DataFrame.to_period` / :meth:`Series.to_period`
-  - :meth:`DataFrame.truncate`
-  - :meth:`DataFrame.iterrows`
-  - :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
-  - :meth:`DataFrame.fillna` / :meth:`Series.fillna`
-  - :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
-  - :meth:`DataFrame.ffill` / :meth:`Series.ffill`
-  - :meth:`DataFrame.bfill` / :meth:`Series.bfill`
-  - :meth:`DataFrame.where` / :meth:`Series.where`
-  - :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
-  - :meth:`DataFrame.astype` / :meth:`Series.astype`
-  - :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
-  - :func:`concat`
-
+  was added to the methods listed in
+  :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
   These methods return views when Copy-on-Write is enabled, which provides a significant
   performance improvement compared to the regular execution (:issue:`49473`).