ENH: add NDArrayBackedExtensionArray to public API #56755
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -92,6 +92,17 @@ def method(self, *args, **kwargs): | |
class NDArrayBackedExtensionArray(NDArrayBacked, ExtensionArray): | ||
""" | ||
ExtensionArray that is backed by a single NumPy ndarray. | ||
|
||
Notes | ||
----- | ||
This class is part of the public API, but may be adjusted in non-user-facing | ||
ways more aggressively than the regular API. | ||
|
||
Examples | ||
-------- | ||
Please see the following: | ||
|
||
https://pandas.pydata.org/docs/development/extending.html#NDArrayBackedExtensionArray | ||
""" | ||
|
||
_ndarray: np.ndarray | ||
|
@@ -114,6 +125,7 @@ def _validate_scalar(self, value): | |
|
||
# ------------------------------------------------------------------------ | ||
|
||
@doc(ExtensionArray.view) | ||
def view(self, dtype: Dtype | None = None) -> ArrayLike: | ||
# We handle datetime64, datetime64tz, timedelta64, and period | ||
# dtypes here. Everything else we pass through to the underlying | ||
|
@@ -154,6 +166,7 @@ def view(self, dtype: Dtype | None = None) -> ArrayLike: | |
# Sequence[int]]], List[Any], _DTypeDict, Tuple[Any, Any]]]" | ||
return arr.view(dtype=dtype) # type: ignore[arg-type] | ||
|
||
@doc(ExtensionArray.view) | ||
def take( | ||
self, | ||
indices: TakeIndexer, | ||
|
@@ -440,20 +453,8 @@ def _where(self: Self, mask: npt.NDArray[np.bool_], value) -> Self: | |
# ------------------------------------------------------------------------ | ||
# Index compat methods | ||
|
||
@doc(ExtensionArray.insert) | ||
def insert(self, loc: int, item) -> Self: | ||
""" | ||
Make new ExtensionArray inserting new item at location. Follows | ||
Python list.append semantics for negative values. | ||
|
||
Parameters | ||
---------- | ||
loc : int | ||
item : object | ||
|
||
Returns | ||
------- | ||
type(self) | ||
""" | ||
loc = validate_insert_loc(loc, len(self)) | ||
|
||
code = self._validate_scalar(item) | ||
|
@@ -474,16 +475,24 @@ def insert(self, loc: int, item) -> Self: | |
|
||
def value_counts(self, dropna: bool = True) -> Series: | ||
""" | ||
Return a Series containing counts of unique values. | ||
Return a Series containing counts of each unique value. | ||
|
||
Parameters | ||
---------- | ||
dropna : bool, default True | ||
Don't include counts of NA values. | ||
Don't include counts of missing values. | ||
|
||
Returns | ||
------- | ||
Series | ||
|
||
Examples | ||
-------- | ||
>>> arr = pd.array([4, 5]) | ||
this won't give a NDArrayBackedEA
The docstrings for the other methods are the EA docstrings, so I've used an example similar to those here. Is there a better way of going about this? I can't see how to write an example using a NDArrayBackedEA without several lines initialising an ExtensionDtype and NDArrayBackedEA
||
>>> arr.value_counts() | ||
4 1 | ||
5 1 | ||
Name: count, dtype: Int64 | ||
""" | ||
if self.ndim != 1: | ||
raise NotImplementedError | ||
|
this is implicitly assuming that the ordering of self matches the ordering of self._ndarray
Is that an issue? If someone wants control over that they could use ExtensionArray instead of NDArrayBackedExtensionArray.
In a conversation with @jorisvandenbossche two ideas were clarified for me:
If I've misunderstood either the problem or proposed next steps, happy to edit the above to correct.
Regarding the ordering question, both NumPy and pandas select min and max (and also sort) complex numbers by their real component, not their magnitude (NumPy uses the imaginary component only to break ties):
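A minimal sketch of the NumPy side of this, using made-up values. It relies only on `np.sort`, `np.min`, `np.max`, and `np.argsort`; the pandas behaviour is not exercised here.

```python
import numpy as np

values = np.array([3 + 1j, 1 + 5j, 2 - 4j])

# NumPy orders complex numbers by real part first.
sorted_values = np.sort(values)
smallest = np.min(values)
largest = np.max(values)

# For contrast: an explicit ordering by magnitude gives a different result.
by_magnitude = values[np.argsort(np.abs(values))]

# Equal real parts are ordered by the imaginary part.
tied = np.sort(np.array([1 + 2j, 1 + 1j]))

print(sorted_values, smallest, largest)
print(by_magnitude)
print(tied)
```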
Thus it is up to us to decide what sorting behavior applies to uncertainties (I propose ordering based on magnitude only, not error terms, except when error is NaN, in which case the value is treated as NaN).
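One way to read that proposal, as a sketch in plain Python. Several things here are assumptions rather than anything decided in this thread: measurements are modelled as `(nominal, error)` pairs, "magnitude" is taken to mean the nominal value, and NaN-treated values are placed last, mirroring the usual NumPy convention.

```python
import math

# Hypothetical data: each measurement is a (nominal value, error) pair.
measurements = [(3.0, 0.1), (1.0, 0.5), (2.0, math.nan), (1.0, 0.2)]


def sort_key(measurement):
    nominal, error = measurement
    error_is_nan = math.isnan(error)
    # A NaN error makes the whole value count as NaN, which sorts last here.
    # Otherwise only the nominal value decides and the error is ignored.
    return (error_is_nan, 0.0 if error_is_nan else nominal)


# sorted() is stable, so equal nominal values keep their original order.
order = sorted(range(len(measurements)), key=lambda i: sort_key(measurements[i]))

print(order)
```

The two measurements with nominal value 1.0 compare equal under this key even though their errors differ, which is the "not error terms" part of the proposal.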
When we look at the composition question (2, above), we will look at having the subclasses deal with this entirely (meaning we can implement an EA of units which might also have uncertain values). And of course update the documentation...