Skip to content

CLN: Move to_period, to_perioddelta up to EA subclasses #23113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Oct 22, 2018

Conversation

jbrockmendel
Copy link
Member

Should be orthogonal to other outstanding datetimelike EA PRs.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Oct 12, 2018

Codecov Report

Merging #23113 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23113      +/-   ##
==========================================
- Coverage   92.19%   92.19%   -0.01%     
==========================================
  Files         169      169              
  Lines       50950    50948       -2     
==========================================
- Hits        46975    46973       -2     
  Misses       3975     3975
Flag Coverage Δ
#multiple 90.61% <100%> (-0.01%) ⬇️
#single 42.27% <33.33%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/period.py 93.24% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.46% <100%> (-0.04%) ⬇️
pandas/core/arrays/datetimes.py 97.4% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b54e20b...5d8bc1a. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_period is fine to move for me, but for the others we need to think a bit about it

To what extent do we want to keep our internal EAs to be strict to the EA interface (except for additional type-specific methods like to_period of course)?
I think we should try to be strict, as otherwise we might start relying on those extra methods, that might not be there on external EAs.

But we of course also discuss to expand the EA interface.

@@ -207,6 +208,24 @@ def astype(self, dtype, copy=True):
return self._box_values(self.asi8)
return super(DatetimeLikeArrayMixin, self).astype(dtype, copy)

def repeat(self, repeats, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a repeat method needed? For now, we don't have that in our EA interface, so if avoidable, I would keep it that way (and keep it on the index).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeat is also moved to PeriodArray in #22862.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we might need to discuss it there as well.

But also, in that PR, the index method still exists. So in this PR, you are removing something from the index, which will need to be added back in a next PR.

"""
Calculate TimedeltaArray of difference between index
values and index converted to PeriodArray at specified
freq. Used for vectorized offsets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a rather "obscure" method (meaning, not directly sure how it would be useful for me as a user). But I am also not a frequent user of periods/offsets, so difficult to say (and maybe a good example in the docstring might already help).

So what I want to say: in case this is indeed rather for internal use, I would not make it a public method on the array class (but can still do the move but make it for example private here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used indirectly in DatetimeIndex + DateOffset, so will be needed for DatetimeArray + DateOffset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to make this private, i dont' think we need to expose to the user

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I privatized this earlier today, but have now changed my mind. For DateOffset arithmetic to work "for free" it needs to have the same name. We can revisit this later if needbe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can you then add a comment like TODO(EA): make this private, for once we do the actual split (because I suppose then this is no longer a problem?)

return self._shallow_copy(self.asi8.repeat(repeats),
freq=freq)

def tolist(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for tolist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, again a common / useful method. actually I would move these up to array.base (and if needed subclass here for example repeat)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then let's first have a general discussion about adding it to the EA interface, and what the exact expected semantics are.

But unless we have a clear use case for pandas (and have we? or is this on the Index rather for users?), I think we should a bit hesitant to add a lot.

(and also the same comment applies here: even if we add it to the Array object, it should not be removed from the Index)

@jbrockmendel
Copy link
Member Author

To what extent do we want to keep our internal EAs to be strict to the EA interface

I've been thinking about it from the other direction: what methods are Index-specific versus making sense on an array?

But we of course also discuss to expand the EA interface.

There is a growing list of Issues of the form "Add X to EA" that has me convinced we're going to end up with basically the entire ndarray API. Which makes sense to the extent we want EA (or at least pandas-internal EAs) to be a seamless drop-in replacement.

@jreback
Copy link
Contributor

jreback commented Oct 14, 2018

There is a growing list of Issues of the form "Add X to EA" that has me convinced we're going to end up with basically the entire ndarray API. Which makes sense to the extent we want EA (or at least pandas-internal EAs) to be a seamless drop-in replacement.

I am with @jbrockmendel here. The EA interface is going be used as a drop in replacement for ndarray things, so we ought to have the common / useful methods at least. and certainly .repeat is one of those.

return self._shallow_copy(self.asi8.repeat(repeats),
freq=freq)

def tolist(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, again a common / useful method. actually I would move these up to array.base (and if needed subclass here for example repeat)

"""
Calculate TimedeltaArray of difference between index
values and index converted to PeriodArray at specified
freq. Used for vectorized offsets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to make this private, i dont' think we need to expose to the user

@jreback jreback added Datetime Datetime data dtype Period Period data type ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 14, 2018
@jbrockmendel
Copy link
Member Author

we might want to make this private, i dont' think we need to expose to the user

OK. We can make it private on the Array and keep it public on the Index (could privatize later if really wanted)

@@ -207,6 +208,24 @@ def astype(self, dtype, copy=True):
return self._box_values(self.asi8)
return super(DatetimeLikeArrayMixin, self).astype(dtype, copy)

def repeat(self, repeats, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we might need to discuss it there as well.

But also, in that PR, the index method still exists. So in this PR, you are removing something from the index, which will need to be added back in a next PR.

return self._shallow_copy(self.asi8.repeat(repeats),
freq=freq)

def tolist(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then let's first have a general discussion about adding it to the EA interface, and what the exact expected semantics are.

But unless we have a clear use case for pandas (and have we? or is this on the Index rather for users?), I think we should a bit hesitant to add a lot.

(and also the same comment applies here: even if we add it to the Array object, it should not be removed from the Index)

@jbrockmendel
Copy link
Member Author

Reverted move of repeat/tolist so we can move the ball down the field.

@jreback jreback added this to the 0.24.0 milestone Oct 15, 2018
@jreback
Copy link
Contributor

jreback commented Oct 15, 2018

lgtm. The repeat/tolist were fine actually.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 15, 2018

I have still one (very minor) outstanding comment: https://github.com/pandas-dev/pandas/pull/23113/files#r225054241, then good to merge for me as well

@jbrockmendel
Copy link
Member Author

comment added

@jorisvandenbossche jorisvandenbossche changed the title Move to_period, to_perioddelta, repeat up to EA subclasses CLN: Move to_period, to_perioddelta up to EA subclasses Oct 15, 2018
@jbrockmendel
Copy link
Member Author

@jorisvandenbossche implemented the requested comment. Are there other outstanding issues here?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I don't think this should cause issues.

I restarted the travis job that timed out.

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche I think this is as green as it gets at the moment. Would like to explicitly get your OK.

@jorisvandenbossche
Copy link
Member

Also the import order that is failing

@jorisvandenbossche jorisvandenbossche merged commit 4cac923 into pandas-dev:master Oct 22, 2018
@jorisvandenbossche
Copy link
Member

Thanks!

1 similar comment
@jorisvandenbossche
Copy link
Member

Thanks!

@jbrockmendel jbrockmendel deleted the dlike6 branch October 22, 2018 15:57
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Oct 23, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
@@ -1168,6 +1143,9 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end)
is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year)

to_perioddelta = wrap_array_method(DatetimeArrayMixin.to_perioddelta,
False)
Copy link
Contributor

@TomAugspurger TomAugspurger Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone know why the result's .name isn't set to self.name here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays. Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants