-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Implement take for EA mixins #23159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement take for EA mixins #23159
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in general (didn't look at the tests in detail), added some comments.
class SharedTests(object): | ||
index_cls = None | ||
|
||
def test_take(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think such basic tests are already covered in the base extension tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. I'm pretty sure the dtype-specific tests must be non-redundant.
return self._shallow_copy(new_values) | ||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really tested for now, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I am -1 on moving things out of the PeriodArray PR that cannot stand on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you're implicitly saying you'd be OK with this if tests were implemented as part of this PR, I think this is an excellent comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that is not really what I meant :-)
Because now you added tests for something that would already be tested in the PeriodArray PR. We don't need explicit tests for this method, as it is tested through all existing methods for concat, and it already has explicit tests in the base extension tests.
So I just think this simply belongs in a PR where we actually make PeriodArray an ExtensionArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would already be tested in the PeriodArray PR.
The benefit of doing things in smaller chunks is that each chunk gets more focused. e.g. I don't want to throw shade on Tom's PR, but the take
implementation missed fill_value
that is a Period
with non-matching freq
. Doing a couple methods at a time lets both authors and reviewers focus more precisely.
And while "too many tests" does have a downside, it really isn't in the top 5 things I'm going to worry about today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the take implementation missed fill_value that is a Period with non-matching freq
And you can review the PR in detail (which of course takes time), and have pointed that out there.
Yes, for sure smaller PRs are better, and to the extent possible, we should strive for that. I fully agree the huge PR makes it very difficult to do a proper review.
But getting out chunks that cannot live on its own can also be hard to review. And in this case it also means to need to completely duplicate tests, and also testing the implementation detail and not the end result.
it really isn't in the top 5 things I'm going to worry about today.
Agreeing on what the end result of this refactor will look like, and having a good view on that, is a worry for me. And so what I am saying is, that many small PRs also has a clear downside of making this harder.
@@ -211,6 +271,10 @@ def astype(self, dtype, copy=True): | |||
# ------------------------------------------------------------------ | |||
# Null Handling | |||
|
|||
def isna(self): | |||
# EA Interface | |||
return self._isnan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in the other PRs, I think we should keep this concept of _isnan
on the Index (basically my question about it is: for the array classes, what is the advantages of having a _isnan
instead of simply using isna()
everywhere?)
(but since there are a lot of places here where it is used, I am fine with keeping _isna
itself in here until we de the actual split)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reasonable view, but one that I'd like to put off dealing with for as long as possible.
The eventual move to composition is going to cause a mismatch between what is and is not cacheable (like _isnan
is now on the Index subclasses). Since the EA subclasses actually use _isnan
in arithmetic/comparison ops, there is a performance hit if we're not careful. Since I haven't thought of a good solution to this, putting it off is the next best thing.
Codecov Report
@@ Coverage Diff @@
## master #23159 +/- ##
==========================================
+ Coverage 92.19% 92.2% +<.01%
==========================================
Files 169 169
Lines 50959 51008 +49
==========================================
+ Hits 46980 47030 +50
+ Misses 3979 3978 -1
Continue to review full report at Codecov.
|
fb4a6af
to
a1fea06
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear / put it a bit more strongly: let's not merge any PR related to periods until we agree on the way forward (smaller like this one or Tom's PR), unless there is an explicit blessing of Tom.
Closing this. @TomAugspurger i think some of the “take” code/tests may be worth salvaging for the PeriodArray PR. |
We could also leave this open, so you can rebase once the PeriodArray PR is merged? |
I'd wait to re-open until it becomes actionable. |
Is that fixed on Tom's PR, or is there an issue for it? Or is that already tested through existing tests? |
Fixed locally. Haven't pushed yet. |
Also _concat_same_type and copy (both based on implementations in #22862)
git diff upstream/master -u -- "*.py" | flake8 --diff