DOC: Start migration guide for Copy-on-Write #56298
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,7 @@ Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 m | |
optimizations that become possible through CoW are implemented and supported. All possible | ||
optimizations are supported starting from pandas 2.1. | ||
|
||
We expect that CoW will be enabled by default in version 3.0. | ||
CoW will be enabled by default in version 3.0. | ||
|
||
CoW will lead to more predictable behavior since it is not possible to update more than | ||
one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through | ||
|
@@ -52,6 +52,74 @@ it explicitly disallows this. With CoW enabled, ``df`` is unchanged: | |
The following sections will explain what this means and how it impacts existing | ||
applications. | ||
|
||
Migrating to Copy-on-Write | ||
-------------------------- | ||
|
||
Copy-on-Write will be the default and only mode in pandas 3.0. This means that users | ||
need to migrate their code to be compliant with CoW rules. | ||
|
||
The default mode in pandas will raise warnings for certain cases that will actively | ||
change behavior and thus change the behavior the user intended. | ||
|
||
We added another mode, enabled via | ||
|
||
``` | ||
pd.options.mode.copy_on_write = "warn" | ||
``` | ||
|
||
|
||
that will warn for every operation that will change behavior with CoW. We expect this mode | ||
to be very noisy, since many cases that we don't expect to affect users will | ||
also emit a warning. We recommend checking this mode and analyzing the warnings, but it is | ||
not necessary to address all of these warnings. The first two items of the following list | ||
are the only cases that need to be addressed to make existing code work with CoW. | ||
|
||
The following few items describe the user visible changes: | ||
|
||
**ChainedAssignment will never work** | ||
|
||
|
||
``loc`` should be used as an alternative. Check the | ||
:ref:`chained assignment section <copy_on_write_chained_assignment>` for more details. | ||
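
As a minimal sketch of this rewrite (the frame and the condition are made up for illustration), a chained assignment can be replaced by one ``loc`` call that selects rows and column in a single step:

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Chained assignment uses two indexing steps and does not update df under CoW:
#     df["foo"][df["bar"] > 5] = 100

# Single-step alternative: row mask and column label in one loc call.
df.loc[df["bar"] > 5, "foo"] = 100
print(df)
```

The ``loc`` form behaves the same with and without CoW, so it can be adopted before switching modes.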
|
||
**Accessing the underlying array of a pandas object will return a read-only view** | ||
|
||
|
||
.. ipython:: python | ||
|
||
    ser = pd.Series([1, 2, 3]) | ||
    ser.to_numpy() | ||
|
||
This example returns a NumPy array that is a view of the Series object. This view can | ||
be modified and thus also modify the pandas object. This is not compliant with CoW | ||
rules. The returned array is set to non-writeable to protect against this behavior. | ||
Creating a copy of this array allows modification. You can also make the array | ||
writeable again if you don't care about the pandas object anymore. | ||
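
A small sketch of the copy-based approach described above (variable names are illustrative only): copying the returned array yields an independent, writeable array, so the Series is left untouched.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# An explicit copy owns its own data, so it is writeable
# and independent of the Series it came from.
arr = ser.to_numpy().copy()
arr[0] = 100

print(arr)
print(ser.to_numpy())
```

This works regardless of whether CoW is enabled, at the cost of one extra copy.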
|
||
**Only one pandas object is updated at once** | ||
The code example you use is already a practical example of it, but I wonder if we should explicitly call out "modifying a column as a Series no longer works"? As I think this will be one of the main use cases where right now the user intended the propagation of the modification (while with dataframe row slices, I think that is much less the case)
Yep added
||
|
||
The following code snippet updates both ``df`` and ``subset`` without CoW: | ||
|
||
|
||
.. ipython:: python | ||
|
||
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]}) | ||
    subset = df["foo"] | ||
    subset.iloc[0] = 100 | ||
    df | ||
|
||
|
||
This won't be possible anymore with CoW, since the CoW rules explicitly forbid this. | ||
The two statements can be combined into a single statement with ``loc`` or ``iloc`` if | ||
this behavior is necessary. :meth:`DataFrame.where` is another suitable alternative | ||
for this case. | ||
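
The alternatives mentioned above could look roughly like this; the frames and values are illustrative, and the ``where`` variant is shown on a second frame (``df2``, a name introduced here) to keep the results separate:

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Update df directly in one statement instead of going through
# a separately created Series.
df.loc[0, "foo"] = 100                        # label-based
df.iloc[1, df.columns.get_loc("foo")] = 200   # position-based

# where keeps values where the condition is True and replaces the rest.
df2 = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df2["foo"] = df2["foo"].where(df2["bar"] > 4, 100)

print(df)
print(df2)
```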
|
||
**Constructors now copy NumPy arrays by default** | ||
|
||
The Series and DataFrame constructors will now copy NumPy arrays by default when not | ||
otherwise specified. This was changed to avoid mutating a pandas object when the | ||
NumPy array is changed inplace outside of pandas. You can set ``copy=False`` to | ||
avoid this copy. | ||
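
A brief sketch of the difference, with ``copy`` spelled out explicitly so the example does not depend on the default of the installed pandas version (all names are illustrative):

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# copy=True: the Series owns its data, so later changes to arr
# do not leak into the pandas object.
ser = pd.Series(arr, copy=True)

# copy=False: asks pandas to reuse the array's memory where possible.
shared = pd.Series(arr, copy=False)

arr[0] = 100  # in-place change outside of pandas

print(ser.tolist())
print(np.shares_memory(arr, shared.to_numpy()))
```

With ``copy=False`` the caller is responsible for not modifying the array afterwards.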
|
||
See the section about :ref:`read-only NumPy arrays <copy_on_write_read_only_na>` | ||
for more details. | ||
This "see also" probably belongs in the section above "Accessing the underlying array of a pandas object will return a read-only view"
yeah good point
||
|
||
Description | ||
----------- | ||
|
||
|
@@ -163,6 +231,8 @@ With copy on write this can be done by using ``loc``. | |
|
||
    df.loc[df["bar"] > 5, "foo"] = 100 | ||
|
||
.. _copy_on_write_read_only_na: | ||
|
||
Read-only NumPy arrays | ||
---------------------- | ||
|
||
|
I wonder if we should move the migration guide further down in the file, as right now this section assumes some knowledge about what CoW is, but that is only explained in the "Description" section that now comes afterwards
I had similar thoughts, but I wanted to put it as prominent as possible