Public Data Followups #23995
Comments
This covers the use-cases that come to mind for me. For the For the |
I was thinking |
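Reading through https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#accessing-the-values-in-a-series-or-index, I really think the `.values` -> `.array` shift should be done for DataFrame as well, even if there is no semantic collision like for the EAs. Basically, Series and DataFrame should have the same fundamental API about interacting with index/data, so one might also consider a `.to_numpy` that (currently) only returns `.array`. |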
It's not in the release notes, but we did implement `DataFrame.to_numpy()` as an alias for `.values`. (I'll update the release notes when I fix a few broken links.)
I don't know what would be returned for `DataFrame.array`. There (often) isn't a single array that could be returned, and the point of `.array` is to *always* be a view back on the original data.
…On Wed, Dec 5, 2018 at 12:39 AM h-vetinari ***@***.***> wrote:
Reading through
https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#accessing-the-values-in-a-series-or-index,
I really think the .values -> .array shift should be done for DataFrame
as well, even if there is no semantic collision like for the EAs.
Basically, Series and DataFrame should have the same fundamental API
about interacting with index/data, so one might also consider a .to_numpy
that (currently) only returns .array.
|
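To make the Series / DataFrame asymmetry discussed above concrete, here is a minimal sketch, assuming pandas 0.24 or later. The variable names are illustrative only and are not taken from the thread.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# A Series is backed by a single array, so both accessors are available.
ser_array = ser.array          # array object wrapping the Series' data
ser_ndarray = ser.to_numpy()   # plain numpy.ndarray

# A DataFrame with mixed int/float columns has no single backing array.
# to_numpy() builds one 2-D ndarray and upcasts to a common dtype.
df_ndarray = df.to_numpy()

# DataFrame does not expose an `.array` attribute.
has_df_array = hasattr(df, "array")
```

Because the columns of `df` have different dtypes, `df.to_numpy()` cannot be a view on the original data, which is the reason given above for not defining `DataFrame.array`.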
* API: Revert breaking `.values` changes. User-facing change: `Series[period].values` and `Series[interval].values` continue to be an ndarray of objects. We recommend `.array` instead. There are a handful of related places in pandas where we assumed that `Series[EA].values` was an EA. Part of #23995
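The user-facing behaviour described in this commit message can be checked with a short sketch, assuming pandas 0.24 or later. The example data is made up for illustration.

```python
import numpy as np
import pandas as pd

period_ser = pd.Series(pd.period_range("2018-01", periods=3, freq="M"))
interval_ser = pd.Series(pd.interval_range(start=0, end=3))

# `.values` keeps returning an object-dtype ndarray for these dtypes.
period_values = period_ser.values
interval_values = interval_ser.values

# `.array` returns the ExtensionArray that actually backs the Series.
period_array = period_ser.array
interval_array = interval_ser.array
```

Code that needs the extension array itself should therefore use `.array` rather than `.values`.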
I think we should make it so that If a user wants a NumPy array, then they can use One potential downside: people start sticking this inside a Series / DataFrame instead of passing the actual ndarray. That would put them into the EA interface, which we may not want, especially for wide dataframes since they wouldn't be consolidated. We'll need to carefully document around this. Does anyone have any objections to that? cc @pandas-dev/pandas-core since this is a kind of fundamental (non-breaking) change to our 1.0 data model. |
We could also say that as a special case Being able to actually use a
|
This will make for a good test of just how much overhead we have :) Aside from non-consolidation, I suspect it won't be much, but best to verify. |
My WIP branch is at master...TomAugspurger:numpy-ea, FYI, in case anyone wants to work on it. I'm moving back to fighting with serialization issues in #24024 right now. |
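As a rough illustration of the "`.array` is always an ExtensionArray" idea, the sketch below checks the type of `.array` for a NumPy-backed Series and for a categorical Series. It assumes a pandas version in which this change has landed (0.24 or later). The name of the wrapper class for NumPy-backed data has varied between releases, so the check goes through the public `ExtensionArray` base class instead.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

numeric = pd.Series([1.0, 2.0, 3.0])                         # NumPy-backed
categorical = pd.Series(["a", "b", "a"], dtype="category")   # extension-backed

arrays = [numeric.array, categorical.array]
all_extension = all(isinstance(a, ExtensionArray) for a in arrays)

# Users who want the ndarray can still convert explicitly.
back_to_numpy = np.asarray(numeric.array)
```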
The final piece that hasn't been addressed is the signature of `to_numpy`. I propose the following:

    def to_numpy(self, dtype=None, copy=False):
        """
        Convert the Series to a :class:`numpy.ndarray`.

        By default, this requires no coercion or copying of data
        for Series backed by a NumPy array. For Series backed by
        an ExtensionArray coercion or copying may be required if
        NumPy cannot natively hold the values of the array.

        Parameters
        ----------
        dtype : numpy.dtype
            The NumPy dtype to pass to :func:`numpy.array`.
        copy : bool, default False
            Whether to copy the underlying data.

        Returns
        -------
        ndarray
        """
        result = np.array(self.array, dtype=dtype, copy=copy)
        return result

I think that'll cover most of the use cases. In particular, it'll handle
|
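A short usage sketch of the proposed `dtype` and `copy` keywords follows, assuming a pandas release (0.24 or later) in which `Series.to_numpy` accepts them. The data is illustrative only.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# Request a specific NumPy dtype for the result.
as_float = ser.to_numpy(dtype="float64")

# Request an independent copy, so that writing to the result
# cannot modify the data held by the Series.
copied = ser.to_numpy(copy=True)
copied[0] = 99
first_value_in_series = int(ser.iloc[0])
```

With the default `copy=False`, the result may or may not share memory with the Series, so callers that intend to mutate the array should ask for a copy explicitly.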
This is part 1 of pandas-dev#23995 We make the signature of `to_numpy(dtype : Union[str, np.dtype], copy : bool) -> ndarray`
I think that everything here has been addressed. |
leftover from #23623

* Signature for `.to_numpy()`: @jorisvandenbossche proposed `copy=True`, which I think is good. Beyond that, we may want to control the "fidelity" of the conversion. Should `Series[datetime64[ns, tz]].to_numpy()` be an ndarray of Timestamp objects or an ndarray of datetime64[ns] normalized to UTC (by default, and should we allow that to be controlled)? Can we hope for a set of keywords appropriate for all subtypes, or do we need to allow `kwargs`? Perhaps `to_numpy(copy=True, dtype=None)` will suffice?
* Make `.array` always an ExtensionArray (via @shoyer). This gives pandas a bit more freedom going forward, since the type of `.array` will be stable if / when we flip over to Arrow arrays by default. We'll just swap out the data backing the ExtensionArray. A generic "NumpyBackedExtensionArray" is pretty easy to write (I had one in cyberpandas). My main concern here is that it makes the statement "`.array` is the actual data stored in the Series / Index" falseish, but that's OK.
* Revert the breaking changes to `Series.values` for `period` and `interval` dtype data (cc @jschendel)? I think we should do this. In terms of LOC, it's a simple change. There are a couple other places (like `Series._ndarray_values`) that assume "extension dtype means `.values` is an ExtensionArray", which I've surfaced on my DatetimeArray branch. We'll need to update those to use `.array` anyway.

Tasks:

* `Series.to_numpy()` signature
* `Series.array` is always an EA
* `Series.values` for Period / Interval (API: Revert breaking `.values` changes #24163)