ExtensionDtype API for preferred type when converting to NumPy array #22791
Comments
Not sure if that was from the same discussion, but I recall a discussion where we said that we basically wanted to know the "numpy dtype if I would be converted to a numpy array" of an ExtensionArray, without actually converting it. |
It was indeed from the sparse discussion, my comment there: #22325 (comment) Basically, we want to know We could add a This might also overlap with #22224 |
There's some overlap with #22224, but I don't know if we can re-use the same attribute for both unfortunately. IntegerArray.numpy_type can't hold values with NaNs. |
Yes, I don't think we can re-use it, but the The other issue is more about if you have a composite array, what is the dtype of the underlying array (which of course can also be multiple arrays .., so then this might also be multiple dtypes), while here it is the dtype when converted to a numpy array. But I would use a |
Although the question is, what if this dtype depends on the data? |
I assume you meant integer dtype. That's a good point. It's not clear what's best here. I think IntegerArray may not be the best one to consider here, since the "best type" so clearly depends on the values. There's no way to put that on the ExtensionDtype. Let's consider, say, Categorical and Sparse. In this case, there's always a clearly best dtype for the resulting ndarray, |
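To make the values-dependence concrete, here is a minimal sketch using plain NumPy. The function name and the `values`/`mask` layout are assumptions for illustration, not the IntegerArray internals, and falling back to float64 (rather than object) is just one possible choice.

```python
import numpy as np


def numpy_dtype_for_masked_ints(values, mask):
    """Guess the ndarray dtype for a masked integer array (hypothetical helper).

    values: integer ndarray holding the data.
    mask: boolean ndarray, True where a value is missing.
    """
    if mask.any():
        # An integer ndarray cannot hold NaN, so the result depends on the data.
        # float64 is assumed here; object would be another option.
        return np.dtype("float64")
    # No missing values: the integer dtype can be kept as-is.
    return values.dtype


values = np.array([1, 2, 3], dtype="int64")
no_missing = np.zeros(3, dtype=bool)
one_missing = np.array([False, True, False])

print(numpy_dtype_for_masked_ints(values, no_missing))
print(numpy_dtype_for_masked_ints(values, one_missing))
```

Because the answer changes with the mask, it cannot be stored as a fixed attribute on the dtype object alone, which is the difficulty raised above.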
Sorry, I was speaking about the return value for The question then is, what should the value for this attribute be for IntegerArray? |
I think we have this now? e.g. can just specify the |
I don't think this is solved. It is not about passing a dtype in np.asarray(..); the original question is: predicting for an array what the dtype will be of np.array(dtype) (so the default when not passing a dtype= kwarg). One use case is to define the interleaved dtype (see top post) (but it's not necessarily a blocker for 0.24.0) |
Agreed: not solved, but a blocker. |
interleaved_dtype goes through find_common_type, for which EA authors can specify behavior via _get_common_dtype. There are some standard cases where find_common_type isn't quite right due to values-dependent-behavior. Recently we've been collecting the values-dependent-behavior in dtypes.cast functions So I'm thinking we could implement something like |
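The dispatch described in this comment can be sketched with toy classes. Everything below is a stand-in written for illustration: `ToyCategoricalDtype` and `toy_find_common_type` are hypothetical, and the only thing borrowed from pandas is the hook name `_get_common_dtype` and the assumption that it returns None when it cannot decide.

```python
import numpy as np


class ToyCategoricalDtype:
    """Stand-in for an extension dtype; not pandas' CategoricalDtype."""

    def __init__(self, categories_dtype):
        self.categories_dtype = np.dtype(categories_dtype)

    def _get_common_dtype(self, dtypes):
        # Only decide when every dtype is of this toy kind; otherwise
        # return None to signal "no opinion".
        if all(isinstance(d, ToyCategoricalDtype) for d in dtypes):
            return np.result_type(*[d.categories_dtype for d in dtypes])
        return None


def toy_find_common_type(dtypes):
    """Ask each dtype's hook for a common dtype, else fall back."""
    for d in dtypes:
        hook = getattr(d, "_get_common_dtype", None)
        if hook is not None:
            common = hook(dtypes)
            if common is not None:
                return common
    if all(isinstance(d, np.dtype) for d in dtypes):
        # Plain NumPy dtypes: use NumPy's own promotion rules.
        return np.result_type(*dtypes)
    # Nothing claimed the combination: object is the catch-all.
    return np.dtype(object)


print(toy_find_common_type([ToyCategoricalDtype("int64"),
                            ToyCategoricalDtype("float64")]))
```

Note that this hook only sees dtypes, so any values-dependent adjustment would have to happen in a separate step that has access to the data.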
This proposal sounds related to #22224 -- would the proposed plan be to use something like pandas/pandas/core/internals/managers.py Lines 1587 to 1595 in 60ac973
in the clause for |
is this really desirable though? wouldn't it go against this comment:
I'd be more inclined to do:
This would be the 2D analogue of #48891 |
This is coming out of SparseArray. Right now if you have a homogeneous DataFrame of EAs, then doing something like np.asarray(df) will always convert to object.
pandas/pandas/core/internals/managers.py Lines 787 to 794 in 32a74f1
Should we give ExtensionDtype some input on the dtype used here? This would allow for some consistency between np.asarray(series) and np.asarray(homogeneous_dataframe).
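One way to picture the question in the top post is the sketch below. It is not pandas code: `ToyDtype`, `ToySparseDtype`, the `preferred_numpy_dtype` attribute, and `interleaved_numpy_dtype` are all hypothetical names, used only to show how a dtype-level hint could replace the object fallback for a homogeneous frame.

```python
import numpy as np


class ToyDtype:
    """Stand-in for an ExtensionDtype; not the pandas class."""

    # Hypothetical attribute: the NumPy dtype preferred on conversion.
    # None means the dtype expresses no preference.
    preferred_numpy_dtype = None


class ToySparseDtype(ToyDtype):
    def __init__(self, subtype):
        self.subtype = np.dtype(subtype)

    @property
    def preferred_numpy_dtype(self):
        # Assumed for this sketch: a sparse array densifies to its subtype.
        return self.subtype


def interleaved_numpy_dtype(column_dtypes):
    """Pick the dtype of a 2D ndarray built from these column dtypes."""
    preferred = [d.preferred_numpy_dtype for d in column_dtypes]
    if not preferred or any(p is None for p in preferred):
        # No hint from some column: fall back to object.
        return np.dtype(object)
    return np.result_type(*preferred)


homogeneous = [ToySparseDtype("float64")] * 3
print(interleaved_numpy_dtype(homogeneous))
print(interleaved_numpy_dtype([ToyDtype(), ToySparseDtype("int64")]))
```

With a hint like this, the 1D and 2D conversions could agree whenever every column shares the same extension dtype.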