Commit 5e451db

committed
Split out into new pdep
1 parent 98eb85a commit 5e451db

File tree

1 file changed

+46
-0
lines changed

@@ -0,0 +1,46 @@
1+
# PDEP-15: Do not require PyArrow as a required dependency (for pandas 3.0)
2+
3+
- Created: 8 May 2024
4+
- Status: Under Discussion
5+
- Discussion: [#58623](https://github.com/pandas-dev/pandas/pull/58623)
6+
[#52711](https://github.com/pandas-dev/pandas/pull/52711)
7+
[#52509](https://github.com/pandas-dev/pandas/issues/52509)
8+
[#54466](https://github.com/pandas-dev/pandas/issues/54466)
9+
- Author: [Thomas Li](https://github.com/lithomas1)
10+
- Revision: 1
11+
12+
## Abstract
13+
14+
This PDEP supersedes PDEP-10, which stipulated that PyArrow should become a required dependency
15+
for pandas 3.0. After reviewing feedback posted
16+
on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of
17+
the core team, have decided against moving forward with PDEP-10 for pandas 3.0.
18+
19+
The primary reasons for rejecting PDEP-10 are twofold:
20+
21+
1) Requiring pyarrow as a dependency causes installation problems.
22+
- Pyarrow does not fit, or has a hard time fitting, in space-constrained environments
23+
such as AWS Lambda and WASM, due to its large size of around 40 MB for a compiled wheel
24+
(which is larger than pandas' own wheel sizes)
25+
- Installation of pyarrow is not possible on some platforms. We provide support for some
26+
less widely used platforms such as Alpine Linux (and there is third party support for pandas in
27+
Pyodide, a WASM distribution of Python), neither of which pyarrow provides wheels for.
28+
29+
While both of these reasons are mentioned in the drawbacks section of PDEP-10, at the time of the writing
30+
of the PDEP, we underestimated the impact this would have on users and downstream developers.
31+
32+
2) Many of the benefits presented in PDEP-10 can be realized even with pyarrow as an optional dependency.
33+
34+
For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics
35+
as our current default object string data type, but that allows users to experience faster performance and memory savings
36+
compared to the object strings (if pyarrow is installed).
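
As a rough illustration of the optional-dependency idea above (not the mechanism pandas or PDEP-14 actually uses), a library can check whether pyarrow is importable and fall back to an object-based implementation when it is not. The names `has_module` and `pick_string_backend` below are hypothetical and exist only for this sketch.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def pick_string_backend(is_available=has_module) -> str:
    """Choose a storage backend for strings (hypothetical helper).

    Prefers "pyarrow" when the package is installed and otherwise falls
    back to "object", so behaviour stays the same and only performance
    and memory use differ.
    """
    return "pyarrow" if is_available("pyarrow") else "object"


if __name__ == "__main__":
    # The result depends on whether pyarrow is installed here.
    print(pick_string_backend())
```

Under this assumption, users without pyarrow keep working code, while users who install it get the faster path without changing anything else.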
37+
38+
While we've decided not to move forward with requiring pyarrow in pandas 3.0, the rejection of PDEP-10
39+
does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe
40+
that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the
41+
ecosystem and better performance for users. Furthermore, many of the drawbacks, such as the large installation size of pyarrow
42+
and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us
43+
to potentially revisit this decision in the future.
44+
45+
However, at this point in time, it is clear that we are not ready to require pyarrow
46+
as a dependency in pandas.
