
fix: use fastpath for PyCapsule export when starting from pyarrow-backed Series, respect requested_schema #59683


Merged
4 commits merged on Sep 9, 2024


@MarcoGorelli (Member) commented Sep 2, 2024

Follow-up to https://github.com/pandas-dev/pandas/pull/59587/files (so I haven't added a new whatsnew entry)

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@MarcoGorelli MarcoGorelli changed the title fix: use fastpath for PyCapsule export when starting from pyarrow-bac… fix: use fastpath for PyCapsule export when starting from pyarrow-backed Series, respect requested_schema Sep 2, 2024
@MarcoGorelli (Member, Author)

@jorisvandenbossche @WillAyd fancy taking a look?

@MarcoGorelli MarcoGorelli marked this pull request as ready for review September 2, 2024 14:52
@jorisvandenbossche (Member) left a comment


Thanks for the follow-up!

Comment on lines 589 to 595
if isinstance(self.dtype, ArrowDtype):
    # fastpath!
    ca = self.values._pa_array
    if type is not None:
        ca = ca.cast(type)
else:
    ca = pa.chunked_array([pa.Array.from_pandas(self, type=type)])
Member


This does not yet handle StringDtype (which is not an ArrowDtype).

An alternative is something like (untested):

ca = pa.array(self, type=type)
if not isinstance(ca, pa.ChunkedArray):
    ca = pa.chunked_array([ca])

Note: 1) it's a bit strange, but pa.array() can return a chunked array: it checks for an __arrow_array__ attribute on the passed object, and for our dtypes backed by a chunked array, that method returns the chunked array, which pa.array() passes through directly as its result; and 2) pa.array() is essentially the same as pa.Array.from_pandas but a bit simpler (and because we rely on it potentially returning a chunked array, using pa.Array.from_pandas would read even more strangely).

Member Author


that seems to work, thanks @jorisvandenbossche!

@mroeschke mroeschke added the Arrow pyarrow functionality label Sep 3, 2024
@mroeschke mroeschke added this to the 3.0 milestone Sep 9, 2024
@mroeschke mroeschke merged commit 871703d into pandas-dev:main Sep 9, 2024
47 checks passed
@mroeschke (Member)

Thanks for the follow-up, @MarcoGorelli

ammar-qazi pushed a commit to ammar-qazi/pandas that referenced this pull request Sep 9, 2024
…ked Series, respect requested_schema (pandas-dev#59683)

* fix: use fastpath for PyCapsule export when starting from pyarrow-backed Series, respect requested_schema

* simplify

* stringdtype test
Labels
Arrow pyarrow functionality
4 participants