@@ -14,6 +14,46 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

+ .. _whatsnew_210.enhancements.pyarrow_dependency:
+
+ PyArrow will become a required dependency with pandas 3.0
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ `PyArrow <https://arrow.apache.org/docs/python/index.html>`_ will become a required
+ dependency of pandas starting with pandas 3.0. This decision was made based on
+ `PDEP 10 <https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html>`_.
+
+ This will enable further changes that benefit pandas users, including
+ but not limited to:
+
+ - Inferring strings as PyArrow-backed strings by default, which significantly
+   reduces the memory footprint and brings large performance improvements.
+ - Inferring more complex dtypes with PyArrow by default, such as ``Decimal``, ``lists``,
+   ``bytes``, ``structured data`` and more.
+ - Better interoperability with other libraries that depend on Apache Arrow.
+
+ We are collecting feedback on this decision `here <https://github.com/pandas-dev/pandas/issues/54466>`_.
+
+ .. _whatsnew_210.enhancements.infer_strings:
+
+ Avoid NumPy object dtype for strings by default
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Previously, all strings were stored in columns with NumPy object dtype.
+ This release introduces an option ``future.infer_string`` that infers all
+ strings as PyArrow-backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
+ This option only works if PyArrow is installed. PyArrow-backed strings have a
+ significantly reduced memory footprint and provide a big performance improvement
+ compared to NumPy object dtype.
+
+ The option can be enabled with:
+
+ .. code-block:: python
+
+     pd.options.future.infer_string = True
+
+ This behavior will become the default with pandas 3.0.
+
.. _whatsnew_210.enhancements.reduction_extension_dtypes:

DataFrame reductions preserve extension dtypes