
BUG: Interchange with PyArrow types does not match PyArrow behavior #56801

Open
3 tasks done
WillAyd opened this issue Jan 9, 2024 · 3 comments
Labels
Arrow pyarrow functionality Bug Interchange Dataframe Interchange Protocol

Comments

@WillAyd
Member

WillAyd commented Jan 9, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import datetime
>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame([[datetime.date(2023, 1, 1)]], dtype=pd.ArrowDtype(pa.date32()))
>>> df.dtypes
0    date32[day][pyarrow]
dtype: object

>>> df.__dataframe__().get_column(0).get_buffers()
{'data': (PandasBuffer({'bufsize': 8, 'ptr': 103818415071840, 'device': 'CPU'}), (<DtypeKind.DATETIME: 22>, 64, 'tdD', '=')), 'validity': None, 'offsets': None}

>>> pa.Table.from_pandas(df).__dataframe__().get_column(0).get_buffers()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/willayd/mambaforge/envs/pantab-dev/lib/python3.12/site-packages/pyarrow/interchange/dataframe.py", line 139, in get_column
    return _PyArrowColumn(self._df.column(i),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/willayd/mambaforge/envs/pantab-dev/lib/python3.12/site-packages/pyarrow/interchange/column.py", line 239, in __init__
    self._dtype = self._dtype_from_arrowdtype(dtype, bit_width)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/willayd/mambaforge/envs/pantab-dev/lib/python3.12/site-packages/pyarrow/interchange/column.py", line 322, in _dtype_from_arrowdtype
    raise ValueError(
ValueError: Data type date32[day] not supported by interchange protocol
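The dtype that pandas reports above follows the interchange protocol's `(kind, bit width, format string, endianness)` layout, where the format string uses Arrow C data interface notation. The minimal pure-Python sketch below decodes that tuple. The helper names (`implied_bit_width`, `width_is_consistent`) and the partial format table are illustrative assumptions, not pandas or PyArrow APIs. It shows that `'tdD'` (Arrow's date32) implies 32-bit values, while pandas reports a 64-bit width.

```python
from enum import IntEnum


# Kind codes as defined by the dataframe interchange protocol's DtypeKind enum.
class DtypeKind(IntEnum):
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


# Bit widths implied by a few Arrow C data interface format strings.
# Illustrative subset only, not an exhaustive table.
ARROW_FORMAT_BIT_WIDTHS = {
    "tdD": 32,  # date32: days since the UNIX epoch
    "tdm": 64,  # date64: milliseconds since the UNIX epoch
    "tts": 32,  # time32 [seconds]
    "ttm": 32,  # time32 [milliseconds]
    "ttu": 64,  # time64 [microseconds]
    "ttn": 64,  # time64 [nanoseconds]
    "i": 32,    # int32
    "l": 64,    # int64
    "f": 32,    # float32
    "g": 64,    # float64
}


def implied_bit_width(fmt):
    """Return the bit width an Arrow C format string implies, or None if unknown."""
    if fmt.startswith("ts"):
        # Timestamps ("tss:", "tsm:", "tsu:", "tsn:", optionally followed by a tz) are 64-bit.
        return 64
    return ARROW_FORMAT_BIT_WIDTHS.get(fmt)


def width_is_consistent(dtype):
    """Check whether an interchange dtype tuple's bit width matches its format string."""
    _kind, bit_width, fmt, _endianness = dtype
    expected = implied_bit_width(fmt)
    return expected is None or expected == bit_width


# The dtype tuple pandas reported in the reproducible example.
pandas_dtype = (DtypeKind.DATETIME, 64, "tdD", "=")
print(width_is_consistent(pandas_dtype))  # False: 'tdD' is date32, i.e. 32 bits
```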

Issue Description

I find it surprising that pandas provides a different value to the interchange protocol for an Arrow dtype than Arrow itself would. (Side note: I am also surprised that Arrow fails here.)

cc @MarcoGorelli

Expected Behavior

The output should match Arrow's.

Installed Versions

main

WillAyd added the Bug, Needs Triage, and Interchange labels and removed the Needs Triage label on Jan 9, 2024
@jorisvandenbossche
Member

jorisvandenbossche commented Jan 10, 2024

> I find it surprising that pandas provides a different value to the interchange protocol for an Arrow dtype than Arrow itself

Essentially, the current implementation of the interchange protocol in pandas is based on numpy, so it converts (or tries to convert) any other type (our own nullable dtypes or Arrow dtypes) to numpy. I think that is the root cause of the various issues you opened (#56805, #56777).
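To make the numpy-based path concrete, here is a minimal pure-Python sketch. It assumes the conversion widens each date to a 64-bit day count, as a numpy `datetime64[D]` array would store it. This is an assumption: the 8-byte buffer for a single value in the original report is consistent with it, but the unit is not shown. Arrow's native date32 layout keeps the same day count in 4 bytes. The `days_since_epoch` helper is hypothetical.

```python
import datetime
import struct

EPOCH = datetime.date(1970, 1, 1)


def days_since_epoch(d):
    """Hypothetical helper: Arrow's date32 stores a date as a signed day count since 1970-01-01."""
    return (d - EPOCH).days


days = days_since_epoch(datetime.date(2023, 1, 1))

# Arrow-native layout: one little-endian int32 per date32 value.
arrow_date32 = struct.pack("<i", days)

# Assumed numpy-style layout: the same day count widened to an int64,
# as in a datetime64[D] array (8 bytes per value).
numpy_day64 = struct.pack("<q", days)

print(days, len(arrow_date32), len(numpy_day64))  # 19358 4 8
```

The value is unchanged, but a consumer that trusts the `'tdD'` format string would expect 4-byte elements and misread an 8-byte buffer.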

@WillAyd
Member Author

WillAyd commented Jan 10, 2024

I assume there is historical reasoning behind that, but is there any reason you think it should remain that way going forward?

@jorisvandenbossche
Member

No, there is definitely no good reason not to support it going forward; I assume it was just to limit the scope of the initial PR.


3 participants