Commit 031c1db

phofl and mroeschke committed
Add whatsnew for arrow (pandas-dev#54476)
* Add whatsnew for arrow
* Update
* Update doc/source/whatsnew/v2.1.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update doc/source/whatsnew/v2.1.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update doc/source/whatsnew/v2.1.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update doc/source/whatsnew/v2.1.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update doc/source/whatsnew/v2.1.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>

---------

Co-authored-by: Matthew Roeschke <[email protected]>
1 parent 5722362 commit 031c1db

1 file changed: +40 -0 lines changed

doc/source/whatsnew/v2.1.0.rst

@@ -14,6 +14,46 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

.. _whatsnew_210.enhancements.pyarrow_dependency:

PyArrow will become a required dependency with pandas 3.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`PyArrow <https://arrow.apache.org/docs/python/index.html>`_ will become a required
dependency of pandas starting with pandas 3.0. This decision was made based on
`PDEP 10 <https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html>`_.

This will enable further changes that benefit pandas users, including
but not limited to:

- Inferring strings as PyArrow-backed strings by default, which significantly
  reduces the memory footprint and improves performance.
- Inferring more complex dtypes with PyArrow by default, such as ``Decimal``, ``lists``,
  ``bytes``, ``structured data`` and more.
- Better interoperability with other libraries that depend on Apache Arrow.

We are collecting feedback on this decision `here <https://github.com/pandas-dev/pandas/issues/54466>`_.

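As an illustration only (not part of the pandas documentation), a project
preparing for this change might check whether PyArrow can be imported in the
current environment. The helper name ``pyarrow_available`` is hypothetical, and
the sketch uses only the standard library:

```python
import importlib.util


def pyarrow_available() -> bool:
    """Return True if the ``pyarrow`` package can be found for import."""
    # find_spec locates the package without actually importing it.
    return importlib.util.find_spec("pyarrow") is not None


if __name__ == "__main__":
    print("PyArrow installed:", pyarrow_available())
```

A check like this could run in CI to flag environments that would need
PyArrow added before upgrading.
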
.. _whatsnew_210.enhancements.infer_strings:

Avoid NumPy object dtype for strings by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, strings were stored in columns with NumPy object dtype by default.
This release introduces an option ``future.infer_string`` that infers all
strings as PyArrow-backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
This option only works if PyArrow is installed. PyArrow-backed strings have a
significantly reduced memory footprint and provide a large performance improvement
compared to NumPy object dtype.

The option can be enabled with:

.. code-block:: python

    import pandas as pd

    pd.options.future.infer_string = True

This behavior will become the default with pandas 3.0.

.. _whatsnew_210.enhancements.reduction_extension_dtypes:

DataFrame reductions preserve extension dtypes