BUG: interchange.from_dataframe doesn't work with large_string #52795


Closed
3 tasks done
MarcoGorelli opened this issue Apr 20, 2023 · 2 comments · Fixed by #52800
Labels: Bug, Interchange (Dataframe Interchange Protocol)

Comments

@MarcoGorelli
Member

MarcoGorelli commented Apr 20, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pyarrow as pa
from pandas.core.interchange.from_dataframe import from_dataframe

arr = ["foo", "bar"]
table = pa.table({"arr": pa.array(arr, type=pa.large_string())})
exchange_df = table.__dataframe__()

from_dataframe(exchange_df)

Issue Description

Running the example raises:

Traceback (most recent call last):
  File "t.py", line 30, in <module>
    from_dataframe(exchange_df)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 52, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 73, in _from_dataframe
    pandas_df = protocol_df_chunk_to_pandas(chunk)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 125, in protocol_df_chunk_to_pandas
    columns[name], buf = string_column_to_ndarray(col)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 242, in string_column_to_ndarray
    assert protocol_data_dtype[1] == 8  # bitwidth == 8
AssertionError

Expected Behavior

An informative error message should be raised, rather than a bare AssertionError coming from an assert statement.
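A minimal sketch of that expected behavior, assuming the data buffer's dtype is the interchange protocol's `(kind, bit_width, format_str, endianness)` tuple. The helper name `validate_string_data_dtype` and its error messages are hypothetical and not taken from pandas:

```python
# Hypothetical helper (not pandas' code): validate the dtype of a string
# column's data buffer and raise an informative error instead of using a
# bare `assert`. Dtype tuples follow the interchange protocol layout:
# (kind, bit_width, format_str, endianness).

# Arrow C data interface format strings for UTF-8 data:
# "u" = string (32-bit offsets), "U" = large_string (64-bit offsets).
SUPPORTED_STRING_FORMATS = ("u", "U")


def validate_string_data_dtype(dtype: tuple) -> None:
    """Raise NotImplementedError explaining why the buffer can't be read as uint8."""
    _kind, bit_width, format_str, _endianness = dtype
    if bit_width != 8:
        raise NotImplementedError(
            "Expected the string data buffer to hold 8-bit UTF-8 code units, "
            f"got bit width {bit_width} (format string {format_str!r})"
        )
    if format_str not in SUPPORTED_STRING_FORMATS:
        raise NotImplementedError(
            f"Unsupported string format {format_str!r}; "
            f"expected one of {SUPPORTED_STRING_FORMATS}"
        )


# Usage: 21 is DtypeKind.STRING in the protocol.
validate_string_data_dtype((21, 8, "U", "="))  # passes silently
try:
    validate_string_data_dtype((21, 64, "U", "="))
except NotImplementedError as err:
    print(err)  # message names the offending bit width and format
```

The point of the sketch is only the shape of the check: each failure mode gets its own exception with the observed values in the message, so a user sees what was received instead of an empty AssertionError.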

cc @AlenkaF @jorisvandenbossche

Installed Versions

INSTALLED VERSIONS

commit : 7059b7b
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.1.0.dev0+573.g7059b7b864.dirty
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1
Cython : 0.29.33
pytest : 7.3.1
hypothesis : 6.72.0
sphinx : 6.1.3
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.0
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli :
fastparquet : 2023.2.0
fsspec : 2023.4.0
gcsfs : 2023.4.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.4.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.9
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None
None

MarcoGorelli added the Bug, Needs Triage (Issue that has not been reviewed by a pandas team member), and Interchange (Dataframe Interchange Protocol) labels on Apr 20, 2023
@MarcoGorelli
Member Author

From @honno: this should raise, which it does, though arguably not via assert, and the error message could be better.

MarcoGorelli changed the title from "BUG: interchange.from_dataframe doesn't work with large_string" to "ERR: interchange.from_dataframe doesn't work with large_string" on Apr 20, 2023
DeaMariaLeon removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Apr 20, 2023
MarcoGorelli changed the title back from "ERR: interchange.from_dataframe doesn't work with large_string" to "BUG: interchange.from_dataframe doesn't work with large_string" on Apr 20, 2023
@MarcoGorelli
Member Author

Correction: based on data-apis/dataframe-api#150 (comment), my understanding is that this should indeed work.
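Why large_string can work can be sketched in plain Python, assuming Arrow's variable-size binary layout: `string` and `large_string` differ only in the width of the offsets buffer (32-bit vs 64-bit), while the data buffer holds 8-bit UTF-8 code units in both. The function `decode_string_column` is hypothetical, ignores nulls, and is not pandas' implementation:

```python
import struct


def decode_string_column(data: bytes, offsets_buf: bytes, length: int, offset_bits: int) -> list:
    """Decode an Arrow-style variable-length UTF-8 column (nulls ignored).

    `data` holds UTF-8 code units (always 8-bit); `offsets_buf` holds
    length + 1 little-endian offsets of `offset_bits` width
    (32 for `string`, 64 for `large_string`).
    """
    fmt = {32: "i", 64: "q"}[offset_bits]  # standard sizes: i = 4 bytes, q = 8 bytes
    offsets = struct.unpack(f"<{length + 1}{fmt}", offsets_buf)
    # Element i spans data[offsets[i]:offsets[i + 1]]
    return [data[offsets[i]:offsets[i + 1]].decode("utf-8") for i in range(length)]


data = "foobar".encode("utf-8")        # shared 8-bit data buffer
small = struct.pack("<3i", 0, 3, 6)    # 'string': int32 offsets
large = struct.pack("<3q", 0, 3, 6)    # 'large_string': int64 offsets

print(decode_string_column(data, small, 2, 32))  # ['foo', 'bar']
print(decode_string_column(data, large, 2, 64))  # ['foo', 'bar']
```

Only the offsets decoding branches on width; the data buffer is sliced identically, which is why an 8-bit check on the data buffer need not reject large_string.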

2 participants