-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
EA: implement+test EA.view #27633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EA: implement+test EA.view #27633
Changes from 13 commits
c368fb8
27a0af2
6636c89
a311f6e
f99afb5
1c3d81c
cddcab4
aa290bf
6053a50
426c434
67090a6
d84a0ba
d9158ef
86c9ab9
2b5e5fa
24b235c
c44aada
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,6 +64,7 @@ class ExtensionArray: | |
shift | ||
take | ||
unique | ||
view | ||
_concat_same_type | ||
_formatter | ||
_formatting_values | ||
|
@@ -147,7 +148,7 @@ class ExtensionArray: | |
If implementing NumPy's ``__array_ufunc__`` interface, pandas expects | ||
that | ||
|
||
1. You defer by raising ``NotImplemented`` when any Series are present | ||
1. You defer by returning ``NotImplemented`` when any Series are present | ||
in `inputs`. Pandas will extract the arrays and call the ufunc again. | ||
2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class. | ||
Pandas inspect this to determine whether the ufunc is valid for the | ||
|
@@ -354,7 +355,7 @@ def ndim(self) -> int: | |
""" | ||
Extension Arrays are only allowed to be 1-dimensional. | ||
""" | ||
return 1 | ||
return len(self.shape) | ||
|
||
@property | ||
def nbytes(self) -> int: | ||
|
@@ -862,6 +863,29 @@ def copy(self) -> ABCExtensionArray: | |
""" | ||
raise AbstractMethodError(self) | ||
|
||
def view(self, dtype=None) -> ABCExtensionArray: | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
Return a view on the array. | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Parameters | ||
---------- | ||
dtype : str, np.dtype, or ExtensionDtype, optional | ||
Default None | ||
|
||
Returns | ||
------- | ||
ExtensionArray | ||
|
||
Notes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've been inconsistent about this, be in general I've tried to keep docstrings user-facing. For implementation notes I've just used regular comments. What are the consequences of Do we have any restrictions on this being zero-copy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The test added in this PR will fail if we don't get an actual view.
The default implementation should Just Work as long as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Convert this to normal comments as Tom mentioned? |
||
----- | ||
- This must return a *new* object, not self. | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- The only case that *must* be implemented is with dtype=None, | ||
giving a view with the same dtype as self. | ||
""" | ||
if dtype is not None: | ||
raise NotImplementedError(dtype) | ||
return self[:] | ||
|
||
# ------------------------------------------------------------------------ | ||
# Printing | ||
# ------------------------------------------------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -517,19 +517,12 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | |
return self._set_dtype(dtype) | ||
return np.array(self, dtype=dtype, copy=copy) | ||
|
||
@cache_readonly | ||
def ndim(self) -> int: | ||
""" | ||
Number of dimensions of the Categorical | ||
""" | ||
return self._codes.ndim | ||
|
||
@cache_readonly | ||
def size(self) -> int: | ||
""" | ||
return the len of myself | ||
""" | ||
return len(self) | ||
return self._codes.size | ||
|
||
@cache_readonly | ||
def itemsize(self) -> int: | ||
|
@@ -1764,18 +1757,10 @@ def ravel(self, order="C"): | |
) | ||
return np.array(self) | ||
|
||
def view(self): | ||
""" | ||
Return a view of myself. | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
view : Categorical | ||
Returns `self`! | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you update the doc-string (or just inherit it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it) |
||
""" | ||
return self | ||
def view(self, dtype=None): | ||
if dtype is not None: | ||
raise NotImplementedError(dtype) | ||
return self._constructor(values=self._codes, dtype=self.dtype, fastpath=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the default implementation does not work here? (or is this more efficient?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is more efficient, yes (note the fastpath kwarg) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a note for that? (eg "override base implementation to use fastpath") |
||
|
||
def to_dense(self): | ||
""" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -557,18 +557,8 @@ def astype(self, dtype, copy=True): | |
return np.asarray(self, dtype=dtype) | ||
|
||
def view(self, dtype=None): | ||
""" | ||
New view on this array with the same data. | ||
|
||
Parameters | ||
---------- | ||
dtype : numpy dtype, optional | ||
|
||
Returns | ||
------- | ||
ndarray | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
With the specified `dtype`. | ||
""" | ||
if dtype is None or dtype is self.dtype: | ||
return type(self)(self._data, dtype=self.dtype) | ||
return self._data.view(dtype=dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not returning an EA? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct, the current implementation is only used to return an ndarray. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But so that is "violating" the spec? (it should return a new EA (not self), but not an ndarray) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, but its already in place and used extensively. I guess we could alter the spec to allow returning ndarray There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no? Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency? |
||
|
||
# ------------------------------------------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,3 +75,18 @@ def test_copy(self, data): | |
|
||
data[1] = data[0] | ||
assert result[1] != result[0] | ||
|
||
def test_view(self, data): | ||
# view with no dtype should return a shallow copy, *not* the same | ||
# object | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you need to also test with a dtype != None? (e.g. that this raises NIE) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess to make sure the kwarg is accepted, sure |
||
assert data[1] != data[0] | ||
|
||
result = data.view() | ||
assert result is not data | ||
assert type(result) == type(data) | ||
|
||
result[1] = result[0] | ||
assert data[1] == data[0] | ||
|
||
# check specifically that the `dtype` kwarg is accepted | ||
data.view(dtype=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this change will help. There can be EAs out there that define this themselves and override this with a hardcoded 1, so we will still need to define a wrapper I think?
(so therefore I would maybe rather leave it as is, to ensure we cover that use case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure I understand the problem here. is there a case in which this wont be correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed you made this change because for the other PR you are patching
self.shape
to return (N, 1) or (1, N), and so with this change, ndim automatically follows that.But in general, you can't rely on the fact that self.ndim already is correctly following self.shape, so you will always have to patch ndim as well.
(exact terminology of "patching" might not be fully reflect to other PR on 2D EAs, need to update myself on that, but hope to give the idea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will revert so we'll handle it in the next pass