Skip to content

EA: implement+test EA.view #27633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 9, 2019
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/reference/extensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ objects.
api.extensions.ExtensionArray.argsort
api.extensions.ExtensionArray.astype
api.extensions.ExtensionArray.copy
api.extensions.ExtensionArray.view
api.extensions.ExtensionArray.dropna
api.extensions.ExtensionArray.factorize
api.extensions.ExtensionArray.fillna
Expand Down
28 changes: 26 additions & 2 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ class ExtensionArray:
shift
take
unique
view
_concat_same_type
_formatter
_formatting_values
Expand Down Expand Up @@ -147,7 +148,7 @@ class ExtensionArray:
If implementing NumPy's ``__array_ufunc__`` interface, pandas expects
that

1. You defer by raising ``NotImplemented`` when any Series are present
1. You defer by returning ``NotImplemented`` when any Series are present
in `inputs`. Pandas will extract the arrays and call the ufunc again.
2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class.
Pandas inspect this to determine whether the ufunc is valid for the
Expand Down Expand Up @@ -354,7 +355,7 @@ def ndim(self) -> int:
"""
Extension Arrays are only allowed to be 1-dimensional.
"""
return 1
return len(self.shape)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this change will help. There can be EAs out there that define this themselves and override this with a hardcoded 1, so we will still need to define a wrapper I think?
(so therefore I would maybe rather leave it as is, to ensure we cover that use case)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure I understand the problem here. is there a case in which this wont be correct?

Copy link
Member

@jorisvandenbossche jorisvandenbossche Aug 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed you made this change because for the other PR you are patching self.shape to return (N, 1) or (1, N), and so with this change, ndim automatically follows that.
But in general, you can't rely on the fact that self.ndim already is correctly following self.shape, so you will always have to patch ndim as well.

(exact terminology of "patching" might not be fully reflect to other PR on 2D EAs, need to update myself on that, but hope to give the idea)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will revert so we'll handle it in the next pass


@property
def nbytes(self) -> int:
Expand Down Expand Up @@ -862,6 +863,29 @@ def copy(self) -> ABCExtensionArray:
"""
raise AbstractMethodError(self)

def view(self, dtype=None) -> ABCExtensionArray:
"""
Return a view on the array.

Parameters
----------
dtype : str, np.dtype, or ExtensionDtype, optional
Default None

Returns
-------
ExtensionArray

Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been inconsistent about this, be in general I've tried to keep docstrings user-facing. For implementation notes I've just used regular comments.

What are the consequences of .view returning self?

Do we have any restrictions on this being zero-copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any restrictions on this being zero-copy?

The test added in this PR will fail if we don't get an actual view.

What are the consequences of .view returning self?

view is going to be used by the default implementation of reshape, so returning self would cause all kinds of trouble.

The default implementation should Just Work as long as self[:] returns a view, which should be the case anyway (JSONArray ATM returns a copy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert this to normal comments as Tom mentioned?

-----
- This must return a *new* object, not self.
- The only case that *must* be implemented is with dtype=None,
giving a view with the same dtype as self.
"""
if dtype is not None:
raise NotImplementedError(dtype)
return self[:]

# ------------------------------------------------------------------------
# Printing
# ------------------------------------------------------------------------
Expand Down
25 changes: 5 additions & 20 deletions pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,19 +517,12 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
return self._set_dtype(dtype)
return np.array(self, dtype=dtype, copy=copy)

@cache_readonly
def ndim(self) -> int:
"""
Number of dimensions of the Categorical
"""
return self._codes.ndim

@cache_readonly
def size(self) -> int:
"""
return the len of myself
"""
return len(self)
return self._codes.size

@cache_readonly
def itemsize(self) -> int:
Expand Down Expand Up @@ -1764,18 +1757,10 @@ def ravel(self, order="C"):
)
return np.array(self)

def view(self):
"""
Return a view of myself.

For internal compatibility with numpy arrays.

Returns
-------
view : Categorical
Returns `self`!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the doc-string (or just inherit it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it)

"""
return self
def view(self, dtype=None):
if dtype is not None:
raise NotImplementedError(dtype)
return self._constructor(values=self._codes, dtype=self.dtype, fastpath=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default implementation does not work here? (or is this more efficient?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more efficient, yes (note the fastpath kwarg)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note for that? (eg "override base implementation to use fastpath")


def to_dense(self):
"""
Expand Down
14 changes: 2 additions & 12 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -557,18 +557,8 @@ def astype(self, dtype, copy=True):
return np.asarray(self, dtype=dtype)

def view(self, dtype=None):
"""
New view on this array with the same data.

Parameters
----------
dtype : numpy dtype, optional

Returns
-------
ndarray
With the specified `dtype`.
"""
if dtype is None or dtype is self.dtype:
return type(self)(self._data, dtype=self.dtype)
return self._data.view(dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not returning an EA?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, the current implementation is only used to return an ndarray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But so that is "violating" the spec? (it should return a new EA (not self), but not an ndarray)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but its already in place and used extensively. I guess we could alter the spec to allow returning ndarray

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could alter the spec to allow returning ndarray

Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no?

Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency?


# ------------------------------------------------------------------
Expand Down
8 changes: 2 additions & 6 deletions pandas/core/arrays/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -739,18 +739,14 @@ def isna(self):
return isna(self.left)

@property
def nbytes(self):
def nbytes(self) -> int:
return self.left.nbytes + self.right.nbytes

@property
def size(self):
def size(self) -> int:
# Avoid materializing self.values
return self.left.size

@property
def shape(self):
return self.left.shape

def take(self, indices, allow_fill=False, fill_value=None, axis=None, **kwargs):
"""
Take elements from the IntervalArray.
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/arrays/numpy_.py
Original file line number Diff line number Diff line change
Expand Up @@ -241,11 +241,11 @@ def __setitem__(self, key, value):
else:
self._ndarray[key] = value

def __len__(self):
def __len__(self) -> int:
return len(self._ndarray)

@property
def nbytes(self):
def nbytes(self) -> int:
return self._ndarray.nbytes

def isna(self):
Expand Down
8 changes: 4 additions & 4 deletions pandas/core/arrays/sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -839,7 +839,7 @@ def fill_value(self, value):
self._dtype = SparseDtype(self.dtype.subtype, value)

@property
def kind(self):
def kind(self) -> str:
"""
The kind of sparse index for this array. One of {'integer', 'block'}.
"""
Expand All @@ -854,7 +854,7 @@ def _valid_sp_values(self):
mask = notna(sp_vals)
return sp_vals[mask]

def __len__(self):
def __len__(self) -> int:
return self.sp_index.length

@property
Expand All @@ -868,7 +868,7 @@ def _fill_value_matches(self, fill_value):
return self.fill_value == fill_value

@property
def nbytes(self):
def nbytes(self) -> int:
return self.sp_values.nbytes + self.sp_index.nbytes

@property
Expand All @@ -886,7 +886,7 @@ def density(self):
return r

@property
def npoints(self):
def npoints(self) -> int:
"""
The number of non- ``fill_value`` points.

Expand Down
4 changes: 4 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@ def test_copy(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.copy()

def test_view(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.view()


class TestConstructors(BaseArrowTests, base.BaseConstructorsTests):
def test_from_dtype(self, data):
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/extension/base/interface.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,3 +75,18 @@ def test_copy(self, data):

data[1] = data[0]
assert result[1] != result[0]

def test_view(self, data):
# view with no dtype should return a shallow copy, *not* the same
# object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to also test with a dtype != None? (e.g. that this raises NIE)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to make sure the kwarg is accepted, sure

assert data[1] != data[0]

result = data.view()
assert result is not data
assert type(result) == type(data)

result[1] = result[0]
assert data[1] == data[0]

# check specifically that the `dtype` kwarg is accepted
data.view(dtype=None)
4 changes: 2 additions & 2 deletions pandas/tests/extension/decimal/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,11 +137,11 @@ def __setitem__(self, key, value):
value = decimal.Decimal(value)
self._data[key] = value

def __len__(self):
def __len__(self) -> int:
return len(self._data)

@property
def nbytes(self):
def nbytes(self) -> int:
n = len(self)
if n:
return n * sys.getsizeof(self[0])
Expand Down
7 changes: 5 additions & 2 deletions pandas/tests/extension/json/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,9 @@ def __getitem__(self, item):
elif isinstance(item, abc.Iterable):
# fancy indexing
return type(self)([self.data[i] for i in item])
elif isinstance(item, slice) and item == slice(None):
# Make sure we get a view
return type(self)(self.data)
else:
# slice
return type(self)(self.data[item])
Expand All @@ -103,11 +106,11 @@ def __setitem__(self, key, value):
assert isinstance(v, self.dtype.type)
self.data[k] = v

def __len__(self):
def __len__(self) -> int:
return len(self.data)

@property
def nbytes(self):
def nbytes(self) -> int:
return sys.getsizeof(self.data)

def isna(self):
Expand Down
5 changes: 4 additions & 1 deletion pandas/tests/extension/test_interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,10 @@ class TestGrouping(BaseInterval, base.BaseGroupbyTests):


class TestInterface(BaseInterval, base.BaseInterfaceTests):
pass
def test_view(self, data):
# __setitem__ incorrectly makes a copy (GH#27147), so we only
# have a smoke-test
data.view()


class TestReduce(base.BaseNoReduceTests):
Expand Down
4 changes: 4 additions & 0 deletions pandas/tests/extension/test_sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,10 @@ def test_copy(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.copy()

def test_view(self, data):
# __setitem__ does not work, so we only have a smoke-test
data.view()


class TestConstructors(BaseSparseTests, base.BaseConstructorsTests):
pass
Expand Down