ENH: Support explode for ArrowDtype #53373
Comments
Also please confirm if the operations listed here are the only ones that are supported for the
-1 on raising; falling back to numpy is a suitable alternative in a bunch of scenarios for now.
@phofl If I understood you correctly, you mean that we should rather refrain from using
No, we are using numpy internally in a couple of methods right now because there isn't an Arrow implementation available yet, and that is fine since the conversion is cheap there.
Is it just I think most of the methods should have Arrow implementations, so it's probably easier to just fix the ones that don't that can be fixed.
Sorry, but I haven't checked all of them.
Going to change this issue to be explode specific. I don't think it's really feasible to warn on every method that doesn't support Arrow right now, and the time is better spent supporting the method instead.
@mroeschke - With the scope reduced to
Ah right, I believe #53602 closes this.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I wish that an `Exception` were raised for functions that are not yet supported. For instance, I tried the `explode` method on two `DataFrame`s, one with a `pyarrow`-backed data type and the other `numpy`-backed.
I create the `DataFrame`s by reading from Parquet files because, when I build them in memory, the `convert_dtypes` method casts column `x` to `object`.
If I check the schema
If I try to extract one element from each
array([list([1, 2]), 'a'], dtype=object)
array([array([1, 2]), 'a'], dtype=object)
I do note how the dtype is different in both.
Finally, if I try to call the `explode` method on:
`numpy`-backed dtype
Output
`pyarrow`-backed dtype
Output (without any error)
So basically, this was not the result I was expecting.
P.S. I was not sure if this would rather qualify as a bug.
Platform details
Feature Description
I don't know enough about the internals of pandas to propose anything here.
Alternative Solutions
For functions that are not natively supported for `pyarrow`-backed data frames, could we internally convert them to the `numpy`-backed equivalent, do the operation, and convert them back? In this case a warning message would make sense.
But again, since I don't know enough to judge whether such a makeshift solution meets the standards with respect to performance, API design, etc., I would not be able to make a suggestion here.
Additional Context
No response