-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: Move to_period, to_perioddelta up to EA subclasses #23113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23113 +/- ##
==========================================
- Coverage 92.19% 92.19% -0.01%
==========================================
Files 169 169
Lines 50950 50948 -2
==========================================
- Hits 46975 46973 -2
Misses 3975 3975
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_period
is fine to move for me, but for the others we need to think a bit about it
To what extent do we want to keep our internal EAs to be strict to the EA interface (except for additional type-specific methods like to_period
of course)?
I think we should try to be strict, as otherwise we might start relying on those extra methods, that might not be there on external EAs.
But we of course also discuss to expand the EA interface.
pandas/core/arrays/datetimelike.py
Outdated
@@ -207,6 +208,24 @@ def astype(self, dtype, copy=True): | |||
return self._box_values(self.asi8) | |||
return super(DatetimeLikeArrayMixin, self).astype(dtype, copy) | |||
|
|||
def repeat(self, repeats, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a repeat
method needed? For now, we don't have that in our EA interface, so if avoidable, I would keep it that way (and keep it on the index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeat
is also moved to PeriodArray in #22862.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we might need to discuss it there as well.
But also, in that PR, the index method still exists. So in this PR, you are removing something from the index, which will need to be added back in a next PR.
""" | ||
Calculate TimedeltaArray of difference between index | ||
values and index converted to PeriodArray at specified | ||
freq. Used for vectorized offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a rather "obscure" method (meaning, not directly sure how it would be useful for me as a user). But I am also not a frequent user of periods/offsets, so difficult to say (and maybe a good example in the docstring might already help).
So what I want to say: in case this is indeed rather for internal use, I would not make it a public method on the array class (but can still do the move but make it for example private here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used indirectly in DatetimeIndex + DateOffset, so will be needed for DatetimeArray + DateOffset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to make this private, i dont' think we need to expose to the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I privatized this earlier today, but have now changed my mind. For DateOffset arithmetic to work "for free" it needs to have the same name. We can revisit this later if needbe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, can you then add a comment like TODO(EA): make this private
, for once we do the actual split (because I suppose then this is no longer a problem?)
pandas/core/arrays/datetimelike.py
Outdated
return self._shallow_copy(self.asi8.repeat(repeats), | ||
freq=freq) | ||
|
||
def tolist(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for tolist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok, again a common / useful method. actually I would move these up to array.base (and if needed subclass here for example repeat)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then let's first have a general discussion about adding it to the EA interface, and what the exact expected semantics are.
But unless we have a clear use case for pandas (and have we? or is this on the Index rather for users?), I think we should a bit hesitant to add a lot.
(and also the same comment applies here: even if we add it to the Array object, it should not be removed from the Index)
I've been thinking about it from the other direction: what methods are Index-specific versus making sense on an array?
There is a growing list of Issues of the form "Add X to EA" that has me convinced we're going to end up with basically the entire ndarray API. Which makes sense to the extent we want EA (or at least pandas-internal EAs) to be a seamless drop-in replacement. |
I am with @jbrockmendel here. The EA interface is going be used as a drop in replacement for ndarray things, so we ought to have the common / useful methods at least. and certainly |
pandas/core/arrays/datetimelike.py
Outdated
return self._shallow_copy(self.asi8.repeat(repeats), | ||
freq=freq) | ||
|
||
def tolist(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok, again a common / useful method. actually I would move these up to array.base (and if needed subclass here for example repeat)
""" | ||
Calculate TimedeltaArray of difference between index | ||
values and index converted to PeriodArray at specified | ||
freq. Used for vectorized offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to make this private, i dont' think we need to expose to the user
OK. We can make it private on the Array and keep it public on the Index (could privatize later if really wanted) |
pandas/core/arrays/datetimelike.py
Outdated
@@ -207,6 +208,24 @@ def astype(self, dtype, copy=True): | |||
return self._box_values(self.asi8) | |||
return super(DatetimeLikeArrayMixin, self).astype(dtype, copy) | |||
|
|||
def repeat(self, repeats, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we might need to discuss it there as well.
But also, in that PR, the index method still exists. So in this PR, you are removing something from the index, which will need to be added back in a next PR.
pandas/core/arrays/datetimelike.py
Outdated
return self._shallow_copy(self.asi8.repeat(repeats), | ||
freq=freq) | ||
|
||
def tolist(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then let's first have a general discussion about adding it to the EA interface, and what the exact expected semantics are.
But unless we have a clear use case for pandas (and have we? or is this on the Index rather for users?), I think we should a bit hesitant to add a lot.
(and also the same comment applies here: even if we add it to the Array object, it should not be removed from the Index)
Reverted move of repeat/tolist so we can move the ball down the field. |
lgtm. The repeat/tolist were fine actually. |
I have still one (very minor) outstanding comment: https://github.com/pandas-dev/pandas/pull/23113/files#r225054241, then good to merge for me as well |
comment added |
@jorisvandenbossche implemented the requested comment. Are there other outstanding issues here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I don't think this should cause issues.
I restarted the travis job that timed out.
@jorisvandenbossche I think this is as green as it gets at the moment. Would like to explicitly get your OK. |
Also the import order that is failing |
Thanks! |
1 similar comment
Thanks! |
@@ -1168,6 +1143,9 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None): | |||
is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end) | |||
is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year) | |||
|
|||
to_perioddelta = wrap_array_method(DatetimeArrayMixin.to_perioddelta, | |||
False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone know why the result's .name
isn't set to self.name
here?
Should be orthogonal to other outstanding datetimelike EA PRs.