Commit 0e6a4ef

phofl authored and cbpygit committed
DOC: Add whatsnew illustrating upcoming changes (pandas-dev#56545)
1 parent 5ffd1f1 commit 0e6a4ef

2 files changed: +85 -0 lines changed

doc/source/user_guide/copy_on_write.rst

+2
@@ -52,6 +52,8 @@ it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
The following sections will explain what this means and how it impacts existing
applications.

.. _copy_on_write.migration_guide:

Migrating to Copy-on-Write
--------------------------

doc/source/whatsnew/v2.2.0.rst

+83
@@ -9,6 +9,89 @@ including other versions of pandas.
{{ header }}

.. ---------------------------------------------------------------------------

.. _whatsnew_220.upcoming_changes:

Upcoming changes in pandas 3.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas 3.0 will bring two major changes to the default behavior of pandas.

Copy-on-Write
^^^^^^^^^^^^^

Copy-on-Write, which is currently an optional mode, will be enabled by default in
pandas 3.0. There won't be an option to keep the current behavior. The new behavioral
semantics are explained in the :ref:`user guide about Copy-on-Write <copy_on_write>`.

The new behavior has been available since pandas 2.0 and can be enabled with the
following option:

.. code-block:: ipython

    pd.options.mode.copy_on_write = True
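
As a minimal sketch of what the option changes, the example below derives a Series from a DataFrame and then writes to it. It assumes pandas 2.x, where Copy-on-Write is still opt-in, and uses ``pd.option_context`` so the setting stays local to the block. The variable names are illustrative only.

```python
import pandas as pd

# Enable Copy-on-Write only inside this block (assumes pandas 2.x, where
# the mode is opt-in).
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    subset = df["a"]       # no data is copied at this point
    subset.iloc[0] = 100   # the write triggers a copy of the shared data

    parent_value = int(df.loc[0, "a"])   # value still stored in df
    subset_value = int(subset.iloc[0])   # value stored in the modified Series

print(parent_value, subset_value)
```

With Copy-on-Write enabled, the write to ``subset`` does not propagate to ``df``, which keeps its original value.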

Copy-on-Write changes how pandas behaves with respect to copies and views. Some of
these changes allow a clear deprecation, such as the changes to chained assignment.
Others are more subtle, so their warnings are hidden behind an option that can be
enabled in pandas 2.2:

.. code-block:: ipython

    pd.options.mode.copy_on_write = "warn"

This mode warns in many scenarios that are not actually relevant to most queries.
We recommend exploring this mode, but it is not necessary to eliminate all of these
warnings. The :ref:`migration guide <copy_on_write.migration_guide>` explains the
upgrade process in more detail.
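
The chained assignment change mentioned above can be sketched as follows. This is an illustration rather than an excerpt from the migration guide: it assumes pandas 2.x with Copy-on-Write enabled through ``pd.option_context``, and it silences warnings around the chained statement only so that the example runs quietly.

```python
import warnings

import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Chained assignment: the write lands on a temporary object, so df is
    # never updated under Copy-on-Write (pandas may also warn here).
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        df["a"][df["b"] > 4] = 0
    after_chained = df["a"].tolist()

    # Single-step replacement: one .loc call updates df itself.
    df.loc[df["b"] > 4, "a"] = 0
    after_loc = df["a"].tolist()

print(after_chained, after_loc)
```

Rewriting chained statements as a single ``.loc`` call keeps the code working both with and without Copy-on-Write.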

Dedicated string data type (backed by Arrow) by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Historically, pandas represented string columns with the NumPy object data type. This
representation has numerous problems, including slow performance and a large memory
footprint. This will change in pandas 3.0: pandas will start inferring string columns
as a new ``string`` data type, backed by Arrow, which stores strings contiguously in
memory. This brings a large performance and memory improvement.

Old behavior:

.. code-block:: ipython

    In [1]: pd.Series(["a", "b"])
    Out[1]:
    0    a
    1    b
    dtype: object

New behavior:

.. code-block:: ipython

    In [1]: pd.Series(["a", "b"])
    Out[1]:
    0    a
    1    b
    dtype: string

The string data type that is used in these scenarios will mostly behave as the NumPy
object dtype does, including missing value semantics and general operations on these
columns.

This change includes a few additional changes across the API:

- Currently, specifying ``dtype="string"`` creates a dtype that is backed by Python
  strings stored in a NumPy array. This will change in pandas 3.0, where this dtype
  will create an Arrow-backed string column.
- The column names and the Index will also be backed by Arrow strings.
- PyArrow will become a required dependency in pandas 3.0 to accommodate this change.

This future dtype inference logic can be enabled with:

.. code-block:: ipython

    pd.options.future.infer_string = True

.. _whatsnew_220.enhancements:

Enhancements
