{{ header }}
.. ipython:: python :suppress: pd.options.mode.copy_on_write = True
Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the optimizations that become possible through CoW are implemented and supported. A complete list can be found at TODO
We expect that CoW will be enabled per default in version 3.0
CoW means that any DataFrame or Series derived from another in any way always behaves as a copy. As a consequence, we can only change the values of an object through modifying the object itself. CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace.
This avoids side-effects when modifying values and hence, most methods can avoid actually copying the data and only trigger a copy when necessary.
The following example will operate inplace with CoW:
.. ipython:: python df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) df.iloc[0, 0] = 100 df
The object df
does not share any data with any other object and hence no
copy is triggered when updating the values. In contrast, the following operation
triggers a copy of the data under CoW:
.. ipython:: python df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) df2 = df.reset_index(drop=True) df2.iloc[0, 0] = 100 df df2
reset_index
returns a lazy copy with CoW while it copies the data without CoW.
Since both objects, df
and df2
share the same data, a copy is triggered
when modifying df2
. The object df
still has the same values as initially
while df2
was modified.
If the object df
isn't needed anymore after performing the reset_index
operation,
you can emulate an inplace-like operation through assigning the output of reset_index
to the same variable:
.. ipython:: python df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) df = df.reset_index(drop=True) df.iloc[0, 0] = 100 df
The initial object gets out of scope as soon as the result of reset_index
is
reassigned and hence df
does not share data with any other object. No copy
is necessary when modifying the object. This is generally true for all methods
listed in :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
Chained assignment references a technique where an object is updated through two subsequent indexing operations, e.g.
.. ipython:: python with pd.option_context("mode.copy_on_write", False): df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) df["foo"][df["bar"] > 5] = 100 df
The column foo
is updated where the column bar
is greater than 5.
This violates the CoW principles though, because it would habe to modify the
view df["foo"]
and df
in one step. Hence, chained assignment will
consistently never work and raise a ChainedAssignmentError
with CoW enabled:
.. ipython:: python :okexcept: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) df["foo"][df["bar"] > 5] = 100
A new lazy copy mechanism that defers the copy until the object in question is modified and only if this object shares data with another object. This mechanism was added to following methods:
- :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
- :meth:`DataFrame.set_index`
- :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
- :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
- :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
- :meth:`DataFrame.assign`
- :meth:`DataFrame.drop`
- :meth:`DataFrame.dropna` / :meth:`Series.dropna`
- :meth:`DataFrame.select_dtypes`
- :meth:`DataFrame.align` / :meth:`Series.align`
- :meth:`Series.to_frame`
- :meth:`DataFrame.rename` / :meth:`Series.rename`
- :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
- :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
- :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
- :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
- :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
- :meth:`DataFrame.between_time` / :meth:`Series.between_time`
- :meth:`DataFrame.filter` / :meth:`Series.filter`
- :meth:`DataFrame.head` / :meth:`Series.head`
- :meth:`DataFrame.tail` / :meth:`Series.tail`
- :meth:`DataFrame.isetitem`
- :meth:`DataFrame.pipe` / :meth:`Series.pipe`
- :meth:`DataFrame.pop` / :meth:`Series.pop`
- :meth:`DataFrame.replace` / :meth:`Series.replace`
- :meth:`DataFrame.shift` / :meth:`Series.shift`
- :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
- :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
- :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
- :meth:`DataFrame.swapaxes`
- :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
- :meth:`DataFrame.take` / :meth:`Series.take`
- :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
- :meth:`DataFrame.to_period` / :meth:`Series.to_period`
- :meth:`DataFrame.truncate`
- :meth:`DataFrame.iterrows`
- :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
- :meth:`DataFrame.fillna` / :meth:`Series.fillna`
- :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
- :meth:`DataFrame.ffill` / :meth:`Series.ffill`
- :meth:`DataFrame.bfill` / :meth:`Series.bfill`
- :meth:`DataFrame.where` / :meth:`Series.where`
- :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
- :meth:`DataFrame.astype` / :meth:`Series.astype`
- :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
- :meth:`DataFrame.join`
- :func:`concat`
- :func:`merge`
These methods return views when Copy-on-Write is enabled, which provides a significant performance improvement compared to the regular execution.
Copy-on-Write can be enabled through the configuration option copy_on_write
. The option can
be turned on __globally__ through either of the following:
.. ipython:: python pd.set_option("mode.copy_on_write", True) pd.options.mode.copy_on_write = True
Alternatively, CoW can be enabled locally for testing purposes through:
.. ipython:: python with pd.option_context("mode.copy_on_write", True): ...
.. ipython:: python :suppress: pd.options.mode.copy_on_write = False