-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Implement take for EA mixins #23159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement take for EA mixins #23159
Changes from all commits
084cb34
0971615
a83adad
a16c900
755bc5c
50234e9
16c361b
dcc985e
a1fea06
23d2724
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,8 @@ | |
from pandas._libs.tslibs.period import ( | ||
Period, DIFFERENT_FREQ_INDEX, IncompatibleFrequency) | ||
|
||
from pandas.errors import NullFrequencyError, PerformanceWarning | ||
from pandas.errors import ( | ||
NullFrequencyError, PerformanceWarning, AbstractMethodError) | ||
from pandas import compat | ||
|
||
from pandas.tseries import frequencies | ||
|
@@ -36,7 +37,7 @@ | |
from pandas.core.dtypes.dtypes import DatetimeTZDtype | ||
|
||
import pandas.core.common as com | ||
from pandas.core.algorithms import checked_add_with_arr | ||
from pandas.core.algorithms import checked_add_with_arr, take | ||
|
||
from .base import ExtensionOpsMixin | ||
from pandas.util._decorators import deprecate_kwarg | ||
|
@@ -77,12 +78,10 @@ class AttributesMixin(object): | |
@property | ||
def _attributes(self): | ||
# Inheriting subclass should implement _attributes as a list of strings | ||
from pandas.errors import AbstractMethodError | ||
raise AbstractMethodError(self) | ||
|
||
@classmethod | ||
def _simple_new(cls, values, **kwargs): | ||
from pandas.errors import AbstractMethodError | ||
raise AbstractMethodError(cls) | ||
|
||
def _get_attributes_dict(self): | ||
|
@@ -119,7 +118,7 @@ def _box_func(self): | |
""" | ||
box function to get object from internal representation | ||
""" | ||
raise com.AbstractMethodError(self) | ||
raise AbstractMethodError(self) | ||
|
||
def _box_values(self, values): | ||
""" | ||
|
@@ -140,6 +139,67 @@ def asi8(self): | |
# do not cache or you'll create a memory leak | ||
return self.values.view('i8') | ||
|
||
# ------------------------------------------------------------------ | ||
# Extension Array Interface | ||
# TODO: | ||
# _from_sequence | ||
# _from_factorized | ||
# __setitem__ | ||
# _values_for_argsort | ||
# argsort | ||
# fillna | ||
# dropna | ||
# shift | ||
# unique | ||
# _values_for_factorize | ||
# factorize | ||
# _formatting_values | ||
# _reduce | ||
# copy | ||
|
||
def _validate_fill_value(self, fill_value): | ||
""" | ||
If a fill_value is passed to `take` convert it to an i8 representation, | ||
raising ValueError if this is not possible. | ||
|
||
Parameters | ||
---------- | ||
fill_value : object | ||
|
||
Returns | ||
------- | ||
fill_value : np.int64 | ||
|
||
Raises | ||
------ | ||
ValueError | ||
""" | ||
raise AbstractMethodError(self) | ||
|
||
def take(self, indices, allow_fill=False, fill_value=None): | ||
|
||
if allow_fill: | ||
fill_value = self._validate_fill_value(fill_value) | ||
|
||
new_values = take(self._data, | ||
indices, | ||
allow_fill=allow_fill, | ||
fill_value=fill_value) | ||
|
||
# TODO: use "infer"? Why does not passing freq cause | ||
# failures in py37 but not py27? | ||
freq = self.freq if is_period_dtype(self) else None | ||
return self._shallow_copy(new_values, freq=freq) | ||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): | ||
# for TimedeltaArray and PeriodArray; DatetimeArray overrides | ||
freqs = {x.freq for x in to_concat} | ||
assert len(freqs) == 1 | ||
freq = list(freqs)[0] | ||
values = np.concatenate([x._data for x in to_concat]) | ||
return cls._simple_new(values, freq=freq) | ||
|
||
# ------------------------------------------------------------------ | ||
# Array-like Methods | ||
|
||
|
@@ -211,6 +271,10 @@ def astype(self, dtype, copy=True): | |
# ------------------------------------------------------------------ | ||
# Null Handling | ||
|
||
def isna(self): | ||
# EA Interface | ||
return self._isnan | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as in the other PRs, I think we should keep this concept of (but since there are a lot of places here where it is used, I am fine with keeping There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a reasonable view, but one that I'd like to put off dealing with for as long as possible. The eventual move to composition is going to cause a mismatch between what is and is not cacheable (like |
||
|
||
@property # NB: override with cache_readonly in immutable subclasses | ||
def _isnan(self): | ||
""" return if each value is nan""" | ||
|
@@ -352,13 +416,13 @@ def _add_datelike(self, other): | |
typ=type(other).__name__)) | ||
|
||
def _sub_datelike(self, other): | ||
raise com.AbstractMethodError(self) | ||
raise AbstractMethodError(self) | ||
|
||
def _sub_period(self, other): | ||
return NotImplemented | ||
|
||
def _add_offset(self, offset): | ||
raise com.AbstractMethodError(self) | ||
raise AbstractMethodError(self) | ||
|
||
def _add_delta(self, other): | ||
return NotImplemented | ||
|
@@ -380,7 +444,7 @@ def _add_delta_tdi(self, other): | |
Add a delta of a TimedeltaIndex | ||
return the i8 result view | ||
""" | ||
if not len(self) == len(other): | ||
if len(self) != len(other): | ||
raise ValueError("cannot add indices of unequal length") | ||
|
||
if isinstance(other, np.ndarray): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really tested for now, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I am -1 on moving things out of the PeriodArray PR that cannot stand on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you're implicitly saying you'd be OK with this if tests were implemented as part of this PR, I think this is an excellent comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that is not really what I meant :-)
Because now you added tests for something that would already be tested in the PeriodArray PR. We don't need explicit tests for this method, as it is tested through all existing methods for concat, and it already has explicit tests in the base extension tests.
So I just think this simply belongs in a PR where we actually make PeriodArray an ExtensionArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit of doing things in smaller chunks is that each chunk gets more focused. e.g. I don't want to throw shade on Tom's PR, but the
take
implementation missedfill_value
that is aPeriod
with non-matchingfreq
. Doing a couple methods at a time lets both authors and reviewers focus more precisely.And while "too many tests" does have a downside, it really isn't in the top 5 things I'm going to worry about today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you can review the PR in detail (which of course takes time), and have pointed that out there.
Yes, for sure smaller PRs are better, and to the extent possible, we should strive for that. I fully agree the huge PR makes it very difficult to do a proper review.
But getting out chunks that cannot live on its own can also be hard to review. And in this case it also means to need to completely duplicate tests, and also testing the implementation detail and not the end result.
Agreeing on what the end result of this refactor will look like, and having a good view on that, is a worry for me. And so what I am saying is, that many small PRs also has a clear downside of making this harder.