Skip to content

EA: implement+test EA.view #27633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 9, 2019
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/reference/extensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ objects.
api.extensions.ExtensionArray.argsort
api.extensions.ExtensionArray.astype
api.extensions.ExtensionArray.copy
api.extensions.ExtensionArray.view
api.extensions.ExtensionArray.dropna
api.extensions.ExtensionArray.factorize
api.extensions.ExtensionArray.fillna
Expand Down
24 changes: 23 additions & 1 deletion pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ class ExtensionArray:
shift
take
unique
view
_concat_same_type
_formatter
_formatting_values
Expand Down Expand Up @@ -147,7 +148,7 @@ class ExtensionArray:
If implementing NumPy's ``__array_ufunc__`` interface, pandas expects
that

1. You defer by raising ``NotImplemented`` when any Series are present
1. You defer by returning ``NotImplemented`` when any Series are present
in `inputs`. Pandas will extract the arrays and call the ufunc again.
2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class.
Pandas inspect this to determine whether the ufunc is valid for the
Expand Down Expand Up @@ -862,6 +863,27 @@ def copy(self) -> ABCExtensionArray:
"""
raise AbstractMethodError(self)

def view(self, dtype=None) -> Union[ABCExtensionArray, np.ndarray]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would this return a np.ndarray OR an EA for an EA? (is this @jorisvandenbossche question)?

when / why would this be the case? this is pretty confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the same as Joris's question. The answer is that it is probably better to return EA-only, but ATM DTA/TDA/PA have existing implementations that return ndarray

"""
Return a view on the array.

Parameters
----------
dtype : str, np.dtype, or ExtensionDtype, optional
Default None

Returns
-------
ExtensionArray or np.ndarray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not document this. For implementors, it is simply an EA that it should be. I think our own array not doing can be seen as an historical artifact that ideally will be fixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

"""
# NB:
# - This must return a *new* object referencing the same data, not self.
# - The only case that *must* be implemented is with dtype=None,
# giving a view with the same dtype as self.
if dtype is not None:
raise NotImplementedError(dtype)
return self[:]

# ------------------------------------------------------------------------
# Printing
# ------------------------------------------------------------------------
Expand Down
25 changes: 5 additions & 20 deletions pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,19 +517,12 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
return self._set_dtype(dtype)
return np.array(self, dtype=dtype, copy=copy)

@cache_readonly
def ndim(self) -> int:
"""
Number of dimensions of the Categorical
"""
return self._codes.ndim

@cache_readonly
def size(self) -> int:
"""
return the len of myself
"""
return len(self)
return self._codes.size

@cache_readonly
def itemsize(self) -> int:
Expand Down Expand Up @@ -1764,18 +1757,10 @@ def ravel(self, order="C"):
)
return np.array(self)

def view(self):
"""
Return a view of myself.

For internal compatibility with numpy arrays.

Returns
-------
view : Categorical
Returns `self`!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the doc-string (or just inherit it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it)

"""
return self
def view(self, dtype=None):
if dtype is not None:
raise NotImplementedError(dtype)
return self._constructor(values=self._codes, dtype=self.dtype, fastpath=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default implementation does not work here? (or is this more efficient?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more efficient, yes (note the fastpath kwarg)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note for that? (eg "override base implementation to use fastpath")


def to_dense(self):
"""
Expand Down
14 changes: 2 additions & 12 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -554,18 +554,8 @@ def astype(self, dtype, copy=True):
return np.asarray(self, dtype=dtype)

def view(self, dtype=None):
"""
New view on this array with the same data.

Parameters
----------
dtype : numpy dtype, optional

Returns
-------
ndarray
With the specified `dtype`.
"""
if dtype is None or dtype is self.dtype:
return type(self)(self._data, dtype=self.dtype)
return self._data.view(dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not returning an EA?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, the current implementation is only used to return an ndarray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But so that is "violating" the spec? (it should return a new EA (not self), but not an ndarray)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but its already in place and used extensively. I guess we could alter the spec to allow returning ndarray

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could alter the spec to allow returning ndarray

Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no?

Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency?


# ------------------------------------------------------------------
Expand Down
8 changes: 2 additions & 6 deletions pandas/core/arrays/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -739,18 +739,14 @@ def isna(self):
return isna(self.left)

@property
def nbytes(self):
def nbytes(self) -> int:
return self.left.nbytes + self.right.nbytes

@property
def size(self):
def size(self) -> int:
# Avoid materializing self.values
return self.left.size

@property
def shape(self):
return self.left.shape

def take(self, indices, allow_fill=False, fill_value=None, axis=None, **kwargs):
"""
Take elements from the IntervalArray.
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/arrays/numpy_.py
Original file line number Diff line number Diff line change
Expand Up @@ -241,11 +241,11 @@ def __setitem__(self, key, value):
else:
self._ndarray[key] = value

def __len__(self):
def __len__(self) -> int:
return len(self._ndarray)

@property
def nbytes(self):
def nbytes(self) -> int:
return self._ndarray.nbytes

def isna(self):
Expand Down
8 changes: 4 additions & 4 deletions pandas/core/arrays/sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -839,7 +839,7 @@ def fill_value(self, value):
self._dtype = SparseDtype(self.dtype.subtype, value)

@property
def kind(self):
def kind(self) -> str:
"""
The kind of sparse index for this array. One of {'integer', 'block'}.
"""
Expand All @@ -854,7 +854,7 @@ def _valid_sp_values(self):
mask = notna(sp_vals)
return sp_vals[mask]

def __len__(self):
def __len__(self) -> int:
return self.sp_index.length

@property
Expand All @@ -868,7 +868,7 @@ def _fill_value_matches(self, fill_value):
return self.fill_value == fill_value

@property
def nbytes(self):
def nbytes(self) -> int:
return self.sp_values.nbytes + self.sp_index.nbytes

@property
Expand All @@ -886,7 +886,7 @@ def density(self):
return r

@property
def npoints(self):
def npoints(self) -> int:
"""
The number of non- ``fill_value`` points.

Expand Down
4 changes: 4 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@ def test_copy(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.copy()

def test_view(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.view()


class TestConstructors(BaseArrowTests, base.BaseConstructorsTests):
def test_from_dtype(self, data):
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/extension/base/interface.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,3 +75,18 @@ def test_copy(self, data):

data[1] = data[0]
assert result[1] != result[0]

def test_view(self, data):
# view with no dtype should return a shallow copy, *not* the same
# object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to also test with a dtype != None? (e.g. that this raises NIE)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to make sure the kwarg is accepted, sure

assert data[1] != data[0]

result = data.view()
assert result is not data
assert type(result) == type(data)

result[1] = result[0]
assert data[1] == data[0]

# check specifically that the `dtype` kwarg is accepted
data.view(dtype=None)
4 changes: 2 additions & 2 deletions pandas/tests/extension/decimal/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,11 +137,11 @@ def __setitem__(self, key, value):
value = decimal.Decimal(value)
self._data[key] = value

def __len__(self):
def __len__(self) -> int:
return len(self._data)

@property
def nbytes(self):
def nbytes(self) -> int:
n = len(self)
if n:
return n * sys.getsizeof(self[0])
Expand Down
7 changes: 5 additions & 2 deletions pandas/tests/extension/json/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,9 @@ def __getitem__(self, item):
elif isinstance(item, abc.Iterable):
# fancy indexing
return type(self)([self.data[i] for i in item])
elif isinstance(item, slice) and item == slice(None):
# Make sure we get a view
return type(self)(self.data)
else:
# slice
return type(self)(self.data[item])
Expand All @@ -103,11 +106,11 @@ def __setitem__(self, key, value):
assert isinstance(v, self.dtype.type)
self.data[k] = v

def __len__(self):
def __len__(self) -> int:
return len(self.data)

@property
def nbytes(self):
def nbytes(self) -> int:
return sys.getsizeof(self.data)

def isna(self):
Expand Down
5 changes: 4 additions & 1 deletion pandas/tests/extension/test_interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,10 @@ class TestGrouping(BaseInterval, base.BaseGroupbyTests):


class TestInterface(BaseInterval, base.BaseInterfaceTests):
pass
def test_view(self, data):
# __setitem__ incorrectly makes a copy (GH#27147), so we only
# have a smoke-test
data.view()


class TestReduce(base.BaseNoReduceTests):
Expand Down
4 changes: 4 additions & 0 deletions pandas/tests/extension/test_sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,10 @@ def test_copy(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.copy()

def test_view(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.view()


class TestConstructors(BaseSparseTests, base.BaseConstructorsTests):
pass
Expand Down