
Commit 61a39e1

Backport PR #60739 on branch 2.3.x (ENH: pandas.api.interchange.from_dataframe now uses the Arrow PyCapsule Interface if available, only falling back to the Dataframe Interchange Protocol if that fails) (#61488)
1 parent 80c9f48 commit 61a39e1

3 files changed: +39 -3 lines changed
doc/source/whatsnew/v2.3.0.rst

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ Other enhancements
   when using ``np.array()`` or ``np.asarray()`` on pandas objects) has been
   updated to raise FutureWarning with NumPy >= 2 (:issue:`60340`)
 - :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`)
+- :meth:`pandas.api.interchange.from_dataframe` now uses the `PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_ if available, only falling back to the Dataframe Interchange Protocol if that fails (:issue:`60739`)
 - :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`)
 - Improved ``repr`` of :class:`.NumpyExtensionArray` to account for NEP51 (:issue:`61085`)
 - The :meth:`Series.str.decode` has gained the argument ``dtype`` to control the dtype of the result (:issue:`60940`)

pandas/core/interchange/from_dataframe.py

Lines changed: 27 additions & 0 deletions
@@ -36,6 +36,21 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     """
     Build a ``pd.DataFrame`` from any DataFrame supporting the interchange protocol.

+    .. note::
+
+        For new development, we highly recommend using the Arrow C Data Interface
+        alongside the Arrow PyCapsule Interface instead of the interchange protocol.
+        From pandas 2.3 onwards, `from_dataframe` uses the PyCapsule Interface,
+        only falling back to the interchange protocol if that fails.
+
+    .. warning::
+
+        Due to severe implementation issues, we recommend only considering using the
+        interchange protocol in the following cases:
+
+        - converting to pandas: for pandas >= 2.0.3
+        - converting from pandas: for pandas >= 3.0.0
+
     Parameters
     ----------
     df : DataFrameXchg
@@ -67,6 +82,18 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     if isinstance(df, pd.DataFrame):
         return df

+    if hasattr(df, "__arrow_c_stream__"):
+        try:
+            pa = import_optional_dependency("pyarrow", min_version="14.0.0")
+        except ImportError:
+            # fallback to _from_dataframe
+            pass
+        else:
+            try:
+                return pa.table(df).to_pandas(zero_copy_only=not allow_copy)
+            except pa.ArrowInvalid as e:
+                raise RuntimeError(e) from e
+
     if not hasattr(df, "__dataframe__"):
         raise ValueError("`df` does not support __dataframe__")

pandas/tests/interchange/test_impl.py

Lines changed: 11 additions & 3 deletions
@@ -288,7 +288,7 @@ def test_empty_pyarrow(data):
     expected = pd.DataFrame(data)
     arrow_df = pa_from_dataframe(expected)
     result = from_dataframe(arrow_df)
-    tm.assert_frame_equal(result, expected)
+    tm.assert_frame_equal(result, expected, check_column_type=False)


 def test_multi_chunk_pyarrow() -> None:
@@ -298,8 +298,7 @@ def test_multi_chunk_pyarrow() -> None:
     table = pa.table([n_legs], names=names)
     with pytest.raises(
         RuntimeError,
-        match="To join chunks a copy is required which is "
-        "forbidden by allow_copy=False",
+        match="Cannot do zero copy conversion into multi-column DataFrame block",
     ):
         pd.api.interchange.from_dataframe(table, allow_copy=False)

@@ -606,3 +605,12 @@ def test_empty_dataframe():
     result = pd.api.interchange.from_dataframe(dfi, allow_copy=False)
     expected = pd.DataFrame({"a": []}, dtype="int8")
     tm.assert_frame_equal(result, expected)
+
+
+def test_from_dataframe_list_dtype():
+    pa = pytest.importorskip("pyarrow", "14.0.0")
+    data = {"a": [[1, 2], [4, 5, 6]]}
+    tbl = pa.table(data)
+    result = from_dataframe(tbl)
+    expected = pd.DataFrame(data)
+    tm.assert_frame_equal(result, expected)
