Commit 98eb85a

PDEP-10: Change status to rejected
1 parent 3e65640 commit 98eb85a

1 file changed, +40 -3 lines changed

web/pandas/pdeps/0010-required-pyarrow-dependency.md

@@ -1,12 +1,48 @@
 # PDEP-10: PyArrow as a required dependency for default string inference implementation
 
-- Created: 17 April 2023
-- Status: Accepted
+- Created: 17 April 2023 (updated May 8, 2024)
+- Status: Rejected
 - Discussion: [#52711](https://github.com/pandas-dev/pandas/pull/52711)
 [#52509](https://github.com/pandas-dev/pandas/issues/52509)
 - Author: [Matthew Roeschke](https://github.com/mroeschke)
 [Patrick Hoefler](https://github.com/phofl)
-- Revision: 1
+- Revision: 2
+
+# Note
+
+This PDEP was originally accepted on May 8, 2023. However, after reviewing feedback posted
+on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of
+the core team, have decided not to move forward with this PDEP for pandas 3.0.
+
+The primary reasons for rejecting this PDEP are twofold:
+
+1) Requiring pyarrow as a dependency causes installation problems.
+    - Pyarrow does not fit or has a hard time fitting in space-constrained environments
+      such as AWS Lambda and WASM, due to its large size of around 40 MB for a compiled wheel
+      (which is larger than pandas' own wheel sizes)
+    - Installation of pyarrow is not possible on some platforms. We provide support for some
+      less widely used platforms such as Alpine Linux (and there is third party support for pandas in
+      Pyodide, a WASM distribution of Python), both of which pyarrow does not provide wheels for.
+
+While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time this PDEP
+was written, we underestimated the impact this would have on users and downstream developers.
+
+2) Many of the benefits presented in this PDEP can be realized even with pyarrow as an optional dependency.
+
+For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics
+as our current default object string data type, but that allows users to experience faster performance and memory savings
+compared to the object strings.
+
+While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP
+does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe
+that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the
+ecosystem and better performance for users. Furthermore, many of the drawbacks, such as the large installation size of pyarrow
+and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us
+to potentially revisit this decision in the future.
+
+However, at this point in time, it is clear that we are not ready to require pyarrow
+as a dependency in pandas.
+
 
 ## Abstract
 
@@ -210,6 +246,7 @@ before releasing a new pandas version.
 
 - 17 April 2023: Initial version
 - 8 May 2023: Changed proposal to make pyarrow required in pandas 3.0 instead of 2.1
+- 8 May 2024: Changed status to rejected
 
 [^1] <https://pandas.pydata.org/docs/development/roadmap.html#apache-arrow-interoperability>
 [^2] <https://arrow.apache.org/powered_by/>
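
The note's second reason, that the benefits remain reachable with pyarrow as an optional dependency, can be illustrated with a minimal sketch of optional-dependency detection. This is not pandas' implementation: `has_module` and `choose_string_storage` are hypothetical helper names, and the labels `"pyarrow"` and `"python"` are chosen only to echo the `storage` values accepted by `pandas.StringDtype`. The sketch uses the standard library alone, so it runs whether or not pyarrow is installed.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(name) is not None


def choose_string_storage(pyarrow_available: bool) -> str:
    """Hypothetical helper: pick a string storage label.

    Prefer the pyarrow-backed option when pyarrow is present and fall
    back to the pure-Python option otherwise.
    """
    return "pyarrow" if pyarrow_available else "python"


# Detect pyarrow without importing it, then choose a storage label.
storage = choose_string_storage(has_module("pyarrow"))
print(f"string storage: {storage}")
```

On a platform where pyarrow has no wheel, such as those named in the note, the check falls through to the `"python"` label, so the fallback path keeps working without the dependency.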
