Skip to content

Implement take for EA mixins #23159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jbrockmendel
Copy link
Member

Also _concat_same_type and copy (both based on implementations in #22862)

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me in general (didn't look at the tests in detail), added some comments.

class SharedTests(object):
index_cls = None

def test_take(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such basic tests are already covered in the base extension tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. I'm pretty sure the dtype-specific tests must be non-redundant.

return self._shallow_copy(new_values)

@classmethod
def _concat_same_type(cls, to_concat):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really tested for now, is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I am -1 on moving things out of the PeriodArray PR that cannot stand on its own

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you're implicitly saying you'd be OK with this if tests were implemented as part of this PR, I think this is an excellent comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that is not really what I meant :-)
Because now you added tests for something that would already be tested in the PeriodArray PR. We don't need explicit tests for this method, as it is tested through all existing methods for concat, and it already has explicit tests in the base extension tests.

So I just think this simply belongs in a PR where we actually make PeriodArray an ExtensionArray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would already be tested in the PeriodArray PR.

The benefit of doing things in smaller chunks is that each chunk gets more focused. e.g. I don't want to throw shade on Tom's PR, but the take implementation missed fill_value that is a Period with non-matching freq. Doing a couple methods at a time lets both authors and reviewers focus more precisely.

And while "too many tests" does have a downside, it really isn't in the top 5 things I'm going to worry about today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the take implementation missed fill_value that is a Period with non-matching freq

And you can review the PR in detail (which of course takes time), and have pointed that out there.

Yes, for sure smaller PRs are better, and to the extent possible, we should strive for that. I fully agree the huge PR makes it very difficult to do a proper review.

But getting out chunks that cannot live on its own can also be hard to review. And in this case it also means to need to completely duplicate tests, and also testing the implementation detail and not the end result.

it really isn't in the top 5 things I'm going to worry about today.

Agreeing on what the end result of this refactor will look like, and having a good view on that, is a worry for me. And so what I am saying is, that many small PRs also has a clear downside of making this harder.

@@ -211,6 +271,10 @@ def astype(self, dtype, copy=True):
# ------------------------------------------------------------------
# Null Handling

def isna(self):
# EA Interface
return self._isnan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in the other PRs, I think we should keep this concept of _isnan on the Index (basically my question about it is: for the array classes, what is the advantages of having a _isnan instead of simply using isna() everywhere?)

(but since there are a lot of places here where it is used, I am fine with keeping _isna itself in here until we de the actual split)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable view, but one that I'd like to put off dealing with for as long as possible.

The eventual move to composition is going to cause a mismatch between what is and is not cacheable (like _isnan is now on the Index subclasses). Since the EA subclasses actually use _isnan in arithmetic/comparison ops, there is a performance hit if we're not careful. Since I haven't thought of a good solution to this, putting it off is the next best thing.

@codecov
Copy link

codecov bot commented Oct 15, 2018

Codecov Report

Merging #23159 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23159      +/-   ##
==========================================
+ Coverage   92.19%    92.2%   +<.01%     
==========================================
  Files         169      169              
  Lines       50959    51008      +49     
==========================================
+ Hits        46980    47030      +50     
+ Misses       3979     3978       -1
Flag Coverage Δ
#multiple 90.62% <100%> (+0.01%) ⬆️
#single 42.27% <24.07%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 94.2% <100%> (+0.23%) ⬆️
pandas/core/arrays/datetimelike.py 95.07% <100%> (+0.17%) ⬆️
pandas/core/arrays/period.py 95.7% <100%> (+0.13%) ⬆️
pandas/core/arrays/datetimes.py 97.48% <100%> (+0.1%) ⬆️
pandas/tseries/offsets.py 97.36% <0%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 913f71f...23d2724. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear / put it a bit more strongly: let's not merge any PR related to periods until we agree on the way forward (smaller like this one or Tom's PR), unless there is an explicit blessing of Tom.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 17, 2018
@jbrockmendel
Copy link
Member Author

Closing this. @TomAugspurger i think some of the “take” code/tests may be worth salvaging for the PeriodArray PR.

@jorisvandenbossche
Copy link
Member

We could also leave this open, so you can rebase once the PeriodArray PR is merged?

@jbrockmendel
Copy link
Member Author

We could also leave this open, so you can rebase once the PeriodArray PR is merged?

I'd wait to re-open until it becomes actionable.

@jorisvandenbossche
Copy link
Member

but the take implementation missed fill_value that is a Period with non-matching freq

Is that fixed on Tom's PR, or is there an issue for it? Or is that already tested through existing tests?

@TomAugspurger
Copy link
Contributor

Fixed locally. Haven't pushed yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants