-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: Move some PI/DTI methods to EA subclasses, implement tests #22961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this looks good at a glance. Will take another look later.
w.r.t. constructors, I think the hope is (read ::
has 'has type')
PeriodArray.__init__ :: (ndarray[int], Tick) -> None
(the ndarray hold ordinals)PeriodArray._from_ordinals :: (ndarray[int], Tick) -> PeriodArray
PeriodArray._from_periods :: (ndarray[object], Optional[Tick]) -> PeriodArray
(the ndarray holds Period objects. freq may be inferred)
On the Index side, we have
- the complex
__new__
. not sure what to do there yet. - Maybe public
from_ordinals
andfrom_periods
that just pass-through, though that's a different discussion _simple_new
_shallow_copy
_shallow_copy_with_infer
Can we nail down the expected behavior of those least three? To me
_simple_new :: (Union[ndarray, ExtensionArray], Optional[name], Dict)
(what's in the dict? Currently it's attributes, but I think all those (freq for PeriodIndex) will be pushed onto PeriodArray, so maybe it isn't necessary anymore?).- I haven't thought deeply about _shallow_copy and _shallow_copy_with_infer. Ideally we'd always be operating on the "native" data for that type, the type passed to
_simple_new
at this point, but I'm not sure how these are used in practice.
Sorry for derailing this issue with constructor talk...
|
||
from pandas.core.arrays.datetimes import DatetimeArrayMixin | ||
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin | ||
from pandas.core.arrays.period import PeriodArrayMixin | ||
|
||
|
||
# TODO: more freq variants | ||
@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y']) | ||
def per_index(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name confused me at first, I read "per" as in "miles per hour". Maybe just period_index
, and datetime_index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will update.
Updated with suggested edits. Initial thoughts on the constructors:
That was to allow for code to be shared between PeriodArray/PeriodIndex; the PeriodIndex version would pass
I'd prefer to have the PeriodArray usage be more similar to the PeriodIndex usage, in which case the first argument would accept more variants (accept PeriodIndex, PeriodArray, ndarray[Period], Series[Period], ...). Do you have a different usage in mind? |
There's a certain attractiveness to having the |
The difference between
but both still pass through attributes (name, etc).
If you want to allow the |
Codecov Report
@@ Coverage Diff @@
## master #22961 +/- ##
==========================================
+ Coverage 92.19% 92.19% +<.01%
==========================================
Files 169 169
Lines 50831 50845 +14
==========================================
+ Hits 46864 46878 +14
Misses 3967 3967
Continue to review full report at Codecov.
|
Totally happy to continue the constructors discussion here. First though: any other thoughts on the PR? I think original comments have all been addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on these changes, just a couple questions that you're free to ignore.
@@ -127,6 +128,10 @@ def __new__(cls, values, freq=None, **kwargs): | |||
freq = values.freq | |||
values = values.asi8 | |||
|
|||
elif is_datetime64_dtype(values): | |||
# TODO: what if it has tz? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a warning, right? Maybe push the one that's in DatetimeIndex.to_period
down here? That can be a separate PR though, since there may be other places calling PeriodArray with a tz-aware values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yah.
If it doesn't mess up your plans too much, after this I'd like to do a PR cleaning up some of the duplicated/convoluted constructors in (Period|Datetime|Timedelta)(Index|ArrayMixin). I think making some helper functions etc before the bigger refactor would help focus attention during the latter.
""" | ||
freqstr = request.param | ||
# TODO: non-monotone indexes; NaTs, different start dates | ||
pi = pd.period_range(start=pd.Timestamp('2000-01-01'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, sorry forgot to mention this earlier. Is it possible to include a name on these? It may complicate the tests a bit.
Or maybe just double check that we have other tests ensure that the index .to_timestamp
and .to_period
assert the name is passed through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to include a name on these?
Sure I guess. My thought for both of these fixtures is to try to do something hypothesis-like to cover the whole space of Period/DatetimeIndexes that might occur in the wild.
assert the name is passed through.
... but it isn't. The Array classes don't have a name
. Or am I not understanding the suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably wasn't being clear, and I think I misunderstood the eventual goal of these tests. This looks fine, and I've verified that we do have other tests for PeriodIndex.to_timestmap
's name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two small comments.
But for the rest I am fine with this as long as it doesn't make the life of Tom more difficult
pandas/core/indexes/period.py
Outdated
@property | ||
def end_time(self): | ||
return self.to_timestamp(how='end') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave those properties here, since we will need to add them back when doing the actual refactor
(or you can call the ArrayMixin version like you did for to_period
/to_timestamp
, but in this case that will be more verbose than what is there now (but which is not necessarily a problem, we can simplify when doing the actual refactor))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so on this PR, these are inherited from DatetimelikeArrayMixin
, which won't be a parent of PeriodIndex in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the future, assuming we do away with inheritance (which I'm still -0 on), we'll use something akin to
@inherits_from_array(["start_time", "end_time", ...]
class PeriodIndex(...):
to succinctly implement appropriate wrapping.
In the interim, removing code from the Index subclasses makes it easier for me to sort out what methods still need to be migrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so on this PR, these are inherited from DatetimelikeArrayMixin
Close, they are inherited from PeriodArrayMixin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the future, we'll use something akin to
Yes, that might certainly be (or maybe with a method adding them like we do for eg arithmetic methods), but I don't think we will do that in an initial PR that does the refactor, so moving it away now, will make that PR bigger / less easy to understand
In the interim, removing code from the Index subclasses makes it easier for me to sort out what methods still need to be migrated.
Then I would suggest calling the ArrayMixin implementation as you did for to_timestamp/to_array. That also makes it clear what is already moved and what not yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually given the CI turnaround, if there’s consensus here I’d like to make this change in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do it here (travis is still fast)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
from pandas.core.arrays.datetimes import DatetimeArrayMixin | ||
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin | ||
from pandas.core.arrays.period import PeriodArrayMixin | ||
|
||
|
||
# TODO: more freq variants | ||
@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y']) | ||
def period_index(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this rather be called 'period_array' ? (since this is in the array tests directory, and is meant to later become the actual array tests?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this rather be called 'period_array'
No, the fixture is specifically returning a PeriodIndex. I imagine at some point when these are more fleshed out they'll go up in conftest, or parts might go in pd.util.testing
|
||
# placeholder until these become actual EA subclasses and we can use | ||
# an EA-specific tm.assert_ function | ||
tm.assert_index_equal(pd.Index(result), pd.Index(expected)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should test the equivalence between the index and array method in the array tests, that feels a bit out of place (since it is the index one that is calling the array)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that we have fairly robust tests for the Index methods. Instead of effectively re-implementing all those tests, we can just test that the Index/Array methods behave identically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that makes sense (at least for now)
But you could still have fixtures for the arrays and create the index of it if needed inside the test (instead of currently the other way around) ?
FYI, given #22992 I think there's no need to wait for circle. |
Travis failure is unrelated hypothesis warning |
@TomAugspurger This is fine for me, but I will defer to you for merging (so you can time it if needed with your period work) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @TomAugspurger can merge when ready to de-conflict w.r.t. PeriodArray refactor
------- | ||
DatetimeArray/Index | ||
""" | ||
from pandas.core.arrays.datetimes import DatetimeArrayMixin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you able to put these at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fragile. This mirrors the existing runtime import in the PeriodIndex method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's fine, yeah definitly appreciate making the import hierarchy nicer
|
||
from pandas.core.arrays.datetimes import DatetimeArrayMixin | ||
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin | ||
from pandas.core.arrays.period import PeriodArrayMixin | ||
|
||
|
||
# TODO: more freq variants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a fixture for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it anywhere. I'll make one at the same time as moving these to conftest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/tseries/offsets/conftest.py ? (those are the classes, but imagine have the abbrevs also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those don't exist at the moment. I'll make them in the same pass as the other fixture stuff discussed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k cool
return pi | ||
|
||
|
||
@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at some point should move these to a pandas/tests/arrays/conftest.py (or event the very to-level conftest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. This is bare-bones at the moment and needs some major buffing-up. Current thought is to move them up once they are closer to "done".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
happy to merge this (or you can). |
If/when this goes through the basic template it creates for tests will be pretty straightforward to extend to the other relevant methods/properties.
@TomAugspurger this definitely overlaps with #22862, particularly the pieces touching constructors. LMK if this causes problems and I can try to stay in my lane.