DOC: Add section about copy on write and mutating objects #51788

Merged
merged 2 commits into from
Mar 15, 2023
35 changes: 30 additions & 5 deletions doc/source/user_guide/copy_on_write.rst
@@ -6,11 +6,6 @@
Copy-on-Write (CoW)
*******************

.. ipython:: python
   :suppress:

    pd.options.mode.copy_on_write = True

Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
optimizations that become possible through CoW are implemented and supported. A complete list
can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
@@ -21,6 +16,36 @@ CoW will lead to more predictable behavior since it is not possible to update mo
one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
delaying copies as long as possible, the average performance and memory usage will improve.
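
The deferred copy mentioned above can be observed with a minimal sketch. It assumes a pandas 2.x installation where CoW is opt-in through the ``mode.copy_on_write`` option. The use of ``np.shares_memory`` and the variable names ``shared_before`` and ``shared_after`` are illustrative choices, not part of the original text.

```python
import numpy as np
import pandas as pd

# Assumption: pandas 2.x, where CoW is enabled through this option.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]

    # Selecting the column does not copy: both objects use the same buffer.
    shared_before = np.shares_memory(df["foo"].to_numpy(), subset.to_numpy())

    # The first write to ``subset`` triggers the copy, so ``df`` is untouched.
    subset.iloc[0] = 100
    shared_after = np.shares_memory(df["foo"].to_numpy(), subset.to_numpy())

print(shared_before, shared_after)
print(df["foo"].tolist(), subset.tolist())
```

The copy is paid for only when ``subset`` is actually written to. Code that only reads from ``subset`` never copies the column.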

Previous behavior
-----------------

pandas indexing behavior is tricky to understand. Some operations return views while
others return copies. Depending on the result of the operation, mutating one object
might accidentally mutate another:

.. ipython:: python

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]
    subset.iloc[0] = 100
    df

Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
hard to predict. Copy-on-Write solves the problem of accidentally modifying more than one
object by explicitly disallowing it. With CoW enabled, ``df`` is unchanged:

.. ipython:: python

    pd.options.mode.copy_on_write = True

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]
    subset.iloc[0] = 100
    df
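
The example above sets the option globally. To try the behavior without changing the setting for the rest of a session, one possible sketch scopes it with ``pd.option_context``. The helper ``foo_after_subset_write`` is a hypothetical name introduced here for illustration, and the sketch assumes a pandas version that accepts the ``mode.copy_on_write`` option.

```python
import pandas as pd


def foo_after_subset_write(copy_on_write: bool) -> list:
    """Hypothetical helper: write to a column subset, return the parent's column."""
    # option_context restores the previous setting when the block exits,
    # unlike assigning to pd.options.mode.copy_on_write directly.
    with pd.option_context("mode.copy_on_write", copy_on_write):
        df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
        subset = df["foo"]
        subset.iloc[0] = 100
        return df["foo"].tolist()


# With CoW enabled, the write to ``subset`` does not reach ``df``.
with_cow = foo_after_subset_write(True)
print(with_cow)
```

Calling the helper with ``False`` shows the previous behavior on versions where CoW is still opt-in. That result depends on the installed pandas version, so it is not shown here.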

The following sections will explain what this means and how it impacts existing
applications.

Description
-----------
