-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
EA: implement+test EA.view #27633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EA: implement+test EA.view #27633
Changes from 16 commits
c368fb8
27a0af2
6636c89
a311f6e
f99afb5
1c3d81c
cddcab4
aa290bf
6053a50
426c434
67090a6
d84a0ba
d9158ef
86c9ab9
2b5e5fa
24b235c
c44aada
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,6 +64,7 @@ class ExtensionArray: | |
shift | ||
take | ||
unique | ||
view | ||
_concat_same_type | ||
_formatter | ||
_formatting_values | ||
|
@@ -147,7 +148,7 @@ class ExtensionArray: | |
If implementing NumPy's ``__array_ufunc__`` interface, pandas expects | ||
that | ||
|
||
1. You defer by raising ``NotImplemented`` when any Series are present | ||
1. You defer by returning ``NotImplemented`` when any Series are present | ||
in `inputs`. Pandas will extract the arrays and call the ufunc again. | ||
2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class. | ||
Pandas inspect this to determine whether the ufunc is valid for the | ||
|
@@ -862,6 +863,27 @@ def copy(self) -> ABCExtensionArray: | |
""" | ||
raise AbstractMethodError(self) | ||
|
||
def view(self, dtype=None) -> Union[ABCExtensionArray, np.ndarray]: | ||
""" | ||
Return a view on the array. | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Parameters | ||
---------- | ||
dtype : str, np.dtype, or ExtensionDtype, optional | ||
Default None | ||
|
||
Returns | ||
------- | ||
ExtensionArray or np.ndarray | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would not document this. For implementors, it is simply an EA that it should be. I think our own array not doing can be seen as an historical artifact that ideally will be fixed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. reverted |
||
""" | ||
# NB: | ||
# - This must return a *new* object referencing the same data, not self. | ||
# - The only case that *must* be implemented is with dtype=None, | ||
# giving a view with the same dtype as self. | ||
if dtype is not None: | ||
raise NotImplementedError(dtype) | ||
return self[:] | ||
|
||
# ------------------------------------------------------------------------ | ||
# Printing | ||
# ------------------------------------------------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -517,19 +517,12 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | |
return self._set_dtype(dtype) | ||
return np.array(self, dtype=dtype, copy=copy) | ||
|
||
@cache_readonly | ||
def ndim(self) -> int: | ||
""" | ||
Number of dimensions of the Categorical | ||
""" | ||
return self._codes.ndim | ||
|
||
@cache_readonly | ||
def size(self) -> int: | ||
""" | ||
return the len of myself | ||
""" | ||
return len(self) | ||
return self._codes.size | ||
|
||
@cache_readonly | ||
def itemsize(self) -> int: | ||
|
@@ -1764,18 +1757,10 @@ def ravel(self, order="C"): | |
) | ||
return np.array(self) | ||
|
||
def view(self): | ||
""" | ||
Return a view of myself. | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
view : Categorical | ||
Returns `self`! | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you update the doc-string (or just inherit it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it) |
||
""" | ||
return self | ||
def view(self, dtype=None): | ||
if dtype is not None: | ||
raise NotImplementedError(dtype) | ||
return self._constructor(values=self._codes, dtype=self.dtype, fastpath=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the default implementation does not work here? (or is this more efficient?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is more efficient, yes (note the fastpath kwarg) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a note for that? (eg "override base implementation to use fastpath") |
||
|
||
def to_dense(self): | ||
""" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -554,18 +554,8 @@ def astype(self, dtype, copy=True): | |
return np.asarray(self, dtype=dtype) | ||
|
||
def view(self, dtype=None): | ||
""" | ||
New view on this array with the same data. | ||
|
||
Parameters | ||
---------- | ||
dtype : numpy dtype, optional | ||
|
||
Returns | ||
------- | ||
ndarray | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
With the specified `dtype`. | ||
""" | ||
if dtype is None or dtype is self.dtype: | ||
return type(self)(self._data, dtype=self.dtype) | ||
return self._data.view(dtype=dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not returning an EA? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct, the current implementation is only used to return an ndarray. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But so that is "violating" the spec? (it should return a new EA (not self), but not an ndarray) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, but its already in place and used extensively. I guess we could alter the spec to allow returning ndarray There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no? Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency? |
||
|
||
# ------------------------------------------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,3 +75,18 @@ def test_copy(self, data): | |
|
||
data[1] = data[0] | ||
assert result[1] != result[0] | ||
|
||
def test_view(self, data): | ||
# view with no dtype should return a shallow copy, *not* the same | ||
# object | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you need to also test with a dtype != None? (e.g. that this raises NIE) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess to make sure the kwarg is accepted, sure |
||
assert data[1] != data[0] | ||
|
||
result = data.view() | ||
assert result is not data | ||
assert type(result) == type(data) | ||
|
||
result[1] = result[0] | ||
assert data[1] == data[0] | ||
|
||
# check specifically that the `dtype` kwarg is accepted | ||
data.view(dtype=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would this return a np.ndarray OR an EA for an EA? (is this @jorisvandenbossche question)?
when / why would this be the case? this is pretty confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the same as Joris's question. The answer is that it is probably better to return EA-only, but ATM DTA/TDA/PA have existing implementations that return ndarray