Skip to content

CLN: Move some PI/DTI methods to EA subclasses, implement tests #22961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 8, 2018

Conversation

jbrockmendel
Copy link
Member

If/when this goes through the basic template it creates for tests will be pretty straightforward to extend to the other relevant methods/properties.

@TomAugspurger this definitely overlaps with #22862, particularly the pieces touching constructors. LMK if this causes problems and I can try to stay in my lane.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this looks good at a glance. Will take another look later.

w.r.t. constructors, I think the hope is (read :: has 'has type')

  • PeriodArray.__init__ :: (ndarray[int], Tick) -> None (the ndarray hold ordinals)
  • PeriodArray._from_ordinals :: (ndarray[int], Tick) -> PeriodArray
  • PeriodArray._from_periods :: (ndarray[object], Optional[Tick]) -> PeriodArray (the ndarray holds Period objects. freq may be inferred)

On the Index side, we have

  • the complex __new__. not sure what to do there yet.
  • Maybe public from_ordinals and from_periods that just pass-through, though that's a different discussion
  • _simple_new
  • _shallow_copy
  • _shallow_copy_with_infer

Can we nail down the expected behavior of those least three? To me

  1. _simple_new :: (Union[ndarray, ExtensionArray], Optional[name], Dict) (what's in the dict? Currently it's attributes, but I think all those (freq for PeriodIndex) will be pushed onto PeriodArray, so maybe it isn't necessary anymore?).
  2. I haven't thought deeply about _shallow_copy and _shallow_copy_with_infer. Ideally we'd always be operating on the "native" data for that type, the type passed to _simple_new at this point, but I'm not sure how these are used in practice.

Sorry for derailing this issue with constructor talk...


from pandas.core.arrays.datetimes import DatetimeArrayMixin
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
from pandas.core.arrays.period import PeriodArrayMixin


# TODO: more freq variants
@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y'])
def per_index(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name confused me at first, I read "per" as in "miles per hour". Maybe just period_index, and datetime_index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will update.

@jbrockmendel
Copy link
Member Author

Updated with suggested edits.

Initial thoughts on the constructors:

  • I'd like to get rid of _from_ordinals and have _simple_new be at the bottom-end of the constructor chain.

  • I don't totally understand the use case for _shallow_copy_with_infer. If it could be made unnecessary, I'd be +1 on that.

  • -1 on adding public constructors just yet. Harder to undo later. But _from_periods seems like a reasonable method for "construct from list-like of periods".

(what's in the dict? Currently it's attributes, but I think all those (freq for PeriodIndex) will be pushed onto PeriodArray, so maybe it isn't necessary anymore?).

That was to allow for code to be shared between PeriodArray/PeriodIndex; the PeriodIndex version would pass name while PeriodArray wouldn't pass any kwargs. So if/when inheritance is removed in favor of composition, that can definitely go.

PeriodArray.__init__ :: (ndarray[int], Tick) -> None (the ndarray hold ordinals)

I'd prefer to have the PeriodArray usage be more similar to the PeriodIndex usage, in which case the first argument would accept more variants (accept PeriodIndex, PeriodArray, ndarray[Period], Series[Period], ...). Do you have a different usage in mind?

@TomAugspurger
Copy link
Contributor

There's a certain attractiveness to having the __init__ being just a type check and assignment to self. Regardless, I think it's fine to accept those for now, and when we do the Array refactor we can see how many times we're calling PeriodArray.__init__ directly. If most of the calls go through _simple_new or _from_periods then maybe we could move the remained to go through those.

@jorisvandenbossche
Copy link
Member

I don't totally understand the use case for _shallow_copy_with_infer. If it could be made unnecessary, I'd be +1 on that.

The difference between _shallow_copy_with_infer and _shallow_copy is that the first one can actually return a different class, based on the values it is passed. Eg:

In [71]: idx = pd.Index(['a', 'b', 'c'])

In [72]: idx._shallow_copy_with_infer(np.array([1., 2.]))
Out[72]: Float64Index([1.0, 2.0], dtype='float64')

In [73]: idx._shallow_copy(np.array([1., 2.]))
Out[73]: Index([1.0, 2.0], dtype='float64')     <---- strange result

but both still pass through attributes (name, etc).

_simple_new :: (Union[ndarray, ExtensionArray], Optional[name], Dict) (what's in the dict? Currently it's attributes, but I think all those (freq for PeriodIndex) will be pushed onto PeriodArray, so maybe it isn't necessary anymore?).

If you want to allow the _simple_new of PeriodIndex to accept ndarray, you still need to be able to pass the freq in the dict as well. But if we only accept PeriodArray, the dict could be dropped. But I think we still need a constructor from ndarray+freq on the index as well (unless we do _simple_new(PeriodArray(valuues, freq)), but that might be a bit verbose.

@codecov
Copy link

codecov bot commented Oct 3, 2018

Codecov Report

Merging #22961 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22961      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50831    50845      +14     
==========================================
+ Hits        46864    46878      +14     
  Misses       3967     3967
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 42.35% <29.68%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/period.py 92.8% <100%> (+0.82%) ⬆️
pandas/core/indexes/period.py 93.31% <100%> (-0.35%) ⬇️
pandas/core/indexes/datetimes.py 95.74% <100%> (-0.04%) ⬇️
pandas/core/arrays/datetimes.py 96.91% <100%> (+0.09%) ⬆️
pandas/core/indexes/timedeltas.py 91.16% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4976ee1...927e4bd. Read the comment docs.

@jbrockmendel
Copy link
Member Author

Totally happy to continue the constructors discussion here. First though: any other thoughts on the PR? I think original comments have all been addressed.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on these changes, just a couple questions that you're free to ignore.

@@ -127,6 +128,10 @@ def __new__(cls, values, freq=None, **kwargs):
freq = values.freq
values = values.asi8

elif is_datetime64_dtype(values):
# TODO: what if it has tz?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a warning, right? Maybe push the one that's in DatetimeIndex.to_period down here? That can be a separate PR though, since there may be other places calling PeriodArray with a tz-aware values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yah.

If it doesn't mess up your plans too much, after this I'd like to do a PR cleaning up some of the duplicated/convoluted constructors in (Period|Datetime|Timedelta)(Index|ArrayMixin). I think making some helper functions etc before the bigger refactor would help focus attention during the latter.

"""
freqstr = request.param
# TODO: non-monotone indexes; NaTs, different start dates
pi = pd.period_range(start=pd.Timestamp('2000-01-01'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, sorry forgot to mention this earlier. Is it possible to include a name on these? It may complicate the tests a bit.

Or maybe just double check that we have other tests ensure that the index .to_timestamp and .to_period assert the name is passed through.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to include a name on these?

Sure I guess. My thought for both of these fixtures is to try to do something hypothesis-like to cover the whole space of Period/DatetimeIndexes that might occur in the wild.

assert the name is passed through.

... but it isn't. The Array classes don't have a name. Or am I not understanding the suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably wasn't being clear, and I think I misunderstood the eventual goal of these tests. This looks fine, and I've verified that we do have other tests for PeriodIndex.to_timestmap's name.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two small comments.

But for the rest I am fine with this as long as it doesn't make the life of Tom more difficult

@property
def end_time(self):
return self.to_timestamp(how='end')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave those properties here, since we will need to add them back when doing the actual refactor

(or you can call the ArrayMixin version like you did for to_period/to_timestamp, but in this case that will be more verbose than what is there now (but which is not necessarily a problem, we can simplify when doing the actual refactor))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so on this PR, these are inherited from DatetimelikeArrayMixin, which won't be a parent of PeriodIndex in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future, assuming we do away with inheritance (which I'm still -0 on), we'll use something akin to

@inherits_from_array(["start_time", "end_time", ...]
class PeriodIndex(...):

to succinctly implement appropriate wrapping.

In the interim, removing code from the Index subclasses makes it easier for me to sort out what methods still need to be migrated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so on this PR, these are inherited from DatetimelikeArrayMixin

Close, they are inherited from PeriodArrayMixin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future, we'll use something akin to

Yes, that might certainly be (or maybe with a method adding them like we do for eg arithmetic methods), but I don't think we will do that in an initial PR that does the refactor, so moving it away now, will make that PR bigger / less easy to understand

In the interim, removing code from the Index subclasses makes it easier for me to sort out what methods still need to be migrated.

Then I would suggest calling the ArrayMixin implementation as you did for to_timestamp/to_array. That also makes it clear what is already moved and what not yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually given the CI turnaround, if there’s consensus here I’d like to make this change in a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do it here (travis is still fast)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


from pandas.core.arrays.datetimes import DatetimeArrayMixin
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
from pandas.core.arrays.period import PeriodArrayMixin


# TODO: more freq variants
@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y'])
def period_index(request):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this rather be called 'period_array' ? (since this is in the array tests directory, and is meant to later become the actual array tests?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this rather be called 'period_array'

No, the fixture is specifically returning a PeriodIndex. I imagine at some point when these are more fleshed out they'll go up in conftest, or parts might go in pd.util.testing


# placeholder until these become actual EA subclasses and we can use
# an EA-specific tm.assert_ function
tm.assert_index_equal(pd.Index(result), pd.Index(expected))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should test the equivalence between the index and array method in the array tests, that feels a bit out of place (since it is the index one that is calling the array)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that we have fairly robust tests for the Index methods. Instead of effectively re-implementing all those tests, we can just test that the Index/Array methods behave identically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes sense (at least for now)
But you could still have fixtures for the arrays and create the index of it if needed inside the test (instead of currently the other way around) ?

@TomAugspurger
Copy link
Contributor

FYI, given #22992 I think there's no need to wait for circle.

@jbrockmendel
Copy link
Member Author

Travis failure is unrelated hypothesis warning

@jorisvandenbossche jorisvandenbossche changed the title Move some PI/DTI methods to EA subclasses, implement tests CLN: Move some PI/DTI methods to EA subclasses, implement tests Oct 5, 2018
@jorisvandenbossche
Copy link
Member

@TomAugspurger This is fine for me, but I will defer to you for merging (so you can time it if needed with your period work)

@jreback jreback added the Clean label Oct 6, 2018
@jreback jreback added this to the 0.24.0 milestone Oct 6, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @TomAugspurger can merge when ready to de-conflict w.r.t. PeriodArray refactor

-------
DatetimeArray/Index
"""
from pandas.core.arrays.datetimes import DatetimeArrayMixin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you able to put these at the top?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fragile. This mirrors the existing runtime import in the PeriodIndex method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine, yeah definitly appreciate making the import hierarchy nicer


from pandas.core.arrays.datetimes import DatetimeArrayMixin
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
from pandas.core.arrays.period import PeriodArrayMixin


# TODO: more freq variants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a fixture for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it anywhere. I'll make one at the same time as moving these to conftest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/tests/tseries/offsets/conftest.py ? (those are the classes, but imagine have the abbrevs also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those don't exist at the moment. I'll make them in the same pass as the other fixture stuff discussed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k cool

return pi


@pytest.fixture(params=['D', 'B', 'W', 'M', 'Q', 'Y'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point should move these to a pandas/tests/arrays/conftest.py (or event the very to-level conftest)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. This is bare-bones at the moment and needs some major buffing-up. Current thought is to move them up once they are closer to "done".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@jreback jreback added the Period Period data type label Oct 6, 2018
@jreback
Copy link
Contributor

jreback commented Oct 6, 2018

happy to merge this (or you can).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants