
BUG: interchange.from_dataframe doesn't work with large_string #52800


Merged
merged 2 commits into from
Apr 21, 2023


MarcoGorelli
Member

@MarcoGorelli added the Interchange (Dataframe Interchange Protocol) label Apr 20, 2023
@jorisvandenbossche
Member

The change looks good.

One related thing to note: it might be good to test the full roundtrip, i.e. the conversion from pandas to pyarrow as well (or at least have a test that checks what we create for buffers in pandas' `__dataframe__`, to ensure that the type string ("u" vs "U") matches the offsets' dtype).

@MarcoGorelli
Member Author

Thanks @jorisvandenbossche, I have updated the PR and added a test. I would appreciate it if we could get this in, to encourage more real-world testing of "from interchange object to pandas" scenarios.

@jorisvandenbossche added this to the 2.0.1 milestone Apr 21, 2023
@jorisvandenbossche
Member

It's good that the roundtrip works. However, it only works because pyarrow actually checks the bit width of the offsets when the type string is "u", rather than just assuming int32 offsets.
It's probably still good to make the pandas implementation compliant with the spec: either use int32 offsets, or keep using int64 offsets but use "U" instead of "u" in the dtype specification.
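
To make the "u" vs "U" point concrete, here is a minimal sketch in plain Python. It assumes only the Arrow C Data Interface convention that the format string "u" denotes a UTF-8 string column with 32-bit offsets and "U" a large UTF-8 string column with 64-bit offsets. The helper names (`is_spec_compliant`, `expected_format`, `build_offsets`) are hypothetical and are not part of pandas or pyarrow.

```python
import struct

# Arrow C Data Interface format strings for variable-length UTF-8 columns,
# mapped to the offset bit width that each one implies.
OFFSET_BITS_BY_FORMAT = {"u": 32, "U": 64}


def is_spec_compliant(format_str: str, offset_bits: int) -> bool:
    """Return True if the declared format string matches the offsets' width."""
    return OFFSET_BITS_BY_FORMAT.get(format_str) == offset_bits


def expected_format(offset_bits: int) -> str:
    """Return the format string that should accompany offsets of this width."""
    for fmt, bits in OFFSET_BITS_BY_FORMAT.items():
        if bits == offset_bits:
            return fmt
    raise ValueError(f"unsupported offset width: {offset_bits}")


def build_offsets(values: list[str], offset_bits: int) -> bytes:
    """Pack a little-endian offsets buffer (illustrative only) for the strings."""
    code = {32: "<i", 64: "<q"}[offset_bits]
    offsets = [0]
    for value in values:
        # Each offset is the running byte length of the UTF-8 encoded data.
        offsets.append(offsets[-1] + len(value.encode("utf-8")))
    return b"".join(struct.pack(code, offset) for offset in offsets)


# The mismatch discussed above: int64 offsets declared with the "u" format.
offsets_buffer = build_offsets(["Mon", "Tue"], offset_bits=64)
mismatch_is_compliant = is_spec_compliant("u", 64)  # False
fixed_format = expected_format(64)                  # "U"
```

A consumer that trusts the format string alone would read the 64-bit buffer above as 32-bit offsets and get wrong string boundaries. A consumer that inspects the offsets' actual width, as pyarrow is described as doing here, tolerates the mismatch.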

@MarcoGorelli
Member Author

thanks for explaining, noted!

MarcoGorelli added a commit that referenced this pull request Apr 21, 2023
…oesn't work with large_string) (#52822)

Backport PR #52800: BUG: interchange.from_dataframe doesn't work with large_string

Co-authored-by: Marco Edward Gorelli <[email protected]>
Labels
Bug Interchange Dataframe Interchange Protocol
Development

Successfully merging this pull request may close these issues.

BUG: interchange.from_dataframe doesn't work with large_string
2 participants