Skip to content

CLN: Move to_period, to_perioddelta up to EA subclasses #23113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Oct 22, 2018
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
e2c6727
Move repeat up to array mixin
jbrockmendel Oct 11, 2018
5c32348
Move tolist up
jbrockmendel Oct 11, 2018
ba5c2c1
Move to_perioddelta up
jbrockmendel Oct 11, 2018
e1e05a9
Avoid passing object dtype to simple_new
jbrockmendel Oct 11, 2018
d3ca5e9
de-duplicate wrapping code
jbrockmendel Oct 12, 2018
b4c2496
de-duplicate wrapping code
jbrockmendel Oct 12, 2018
192170b
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 12, 2018
b8c2647
test fixup
jbrockmendel Oct 12, 2018
a99b1f1
Fixup unused import
jbrockmendel Oct 12, 2018
edea5ef
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 13, 2018
a4ee53d
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 14, 2018
eaf364f
privatize to_perioddelta in Array class
jbrockmendel Oct 14, 2018
0a5bb77
revert move of repeat, tolist
jbrockmendel Oct 14, 2018
b38ad9b
remove tests for removed methods
jbrockmendel Oct 14, 2018
28bcd5a
Revert privatization in test
jbrockmendel Oct 14, 2018
033960a
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 15, 2018
91aaef4
add comment [ci skip]
jbrockmendel Oct 15, 2018
44c8162
Merge branch 'dlike6' of https://github.com/jbrockmendel/pandas into …
jbrockmendel Oct 15, 2018
ef5c1e6
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 17, 2018
3b31903
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 17, 2018
0ce415b
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 18, 2018
5d8bc1a
Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
jbrockmendel Oct 18, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

from pandas.errors import NullFrequencyError, PerformanceWarning
from pandas import compat
from pandas.compat.numpy import function as nv

from pandas.tseries import frequencies
from pandas.tseries.offsets import Tick, DateOffset
Expand Down Expand Up @@ -207,6 +208,24 @@ def astype(self, dtype, copy=True):
return self._box_values(self.asi8)
return super(DatetimeLikeArrayMixin, self).astype(dtype, copy)

def repeat(self, repeats, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a repeat method needed? For now, we don't have that in our EA interface, so if avoidable, I would keep it that way (and keep it on the index).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeat is also moved to PeriodArray in #22862.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we might need to discuss it there as well.

But also, in that PR, the index method still exists. So in this PR, you are removing something from the index, which will need to be added back in a next PR.

"""
Analogous to ndarray.repeat
"""
nv.validate_repeat(args, kwargs)
if is_period_dtype(self):
freq = self.freq
else:
freq = None
return self._shallow_copy(self.asi8.repeat(repeats),
freq=freq)

def tolist(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for tolist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, again a common / useful method. actually I would move these up to array.base (and if needed subclass here for example repeat)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then let's first have a general discussion about adding it to the EA interface, and what the exact expected semantics are.

But unless we have a clear use case for pandas (and have we? or is this on the Index rather for users?), I think we should a bit hesitant to add a lot.

(and also the same comment applies here: even if we add it to the Array object, it should not be removed from the Index)

"""
return a list of the underlying data
"""
return list(self.astype(object))

# ------------------------------------------------------------------
# Null Handling

Expand Down
18 changes: 18 additions & 0 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -829,6 +829,24 @@ def to_period(self, freq=None):

return PeriodArrayMixin(self.values, freq=freq)

def to_perioddelta(self, freq):
"""
Calculate TimedeltaArray of difference between index
values and index converted to PeriodArray at specified
freq. Used for vectorized offsets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a rather "obscure" method (meaning, not directly sure how it would be useful for me as a user). But I am also not a frequent user of periods/offsets, so difficult to say (and maybe a good example in the docstring might already help).

So what I want to say: in case this is indeed rather for internal use, I would not make it a public method on the array class (but can still do the move but make it for example private here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used indirectly in DatetimeIndex + DateOffset, so will be needed for DatetimeArray + DateOffset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to make this private, i dont' think we need to expose to the user

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I privatized this earlier today, but have now changed my mind. For DateOffset arithmetic to work "for free" it needs to have the same name. We can revisit this later if needbe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can you then add a comment like TODO(EA): make this private, for once we do the actual split (because I suppose then this is no longer a problem?)


Parameters
----------
freq: Period frequency

Returns
-------
TimedeltaArray/Index
"""
from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
i8delta = self.asi8 - self.to_period(freq).to_timestamp().asi8
return TimedeltaArrayMixin(i8delta)

# -----------------------------------------------------------------
# Properties - Vectorized Timestamp Properties/Methods

Expand Down
18 changes: 0 additions & 18 deletions pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -415,12 +415,6 @@ def _convert_tolerance(self, tolerance, target):
'target index size')
return tolerance

def tolist(self):
"""
return a list of the underlying data
"""
return list(self.astype(object))

def min(self, axis=None, *args, **kwargs):
"""
Return the minimum value of the Index or minimum along
Expand Down Expand Up @@ -613,18 +607,6 @@ def isin(self, values):

return algorithms.isin(self.asi8, values.asi8)

def repeat(self, repeats, *args, **kwargs):
"""
Analogous to ndarray.repeat
"""
nv.validate_repeat(args, kwargs)
if is_period_dtype(self):
freq = self.freq
else:
freq = None
return self._shallow_copy(self.asi8.repeat(repeats),
freq=freq)

@Appender(_index_shared_docs['where'] % _index_doc_kwargs)
def where(self, cond, other=None):
other = _ensure_datetimelike_to_i8(other, to_utc=True)
Expand Down
28 changes: 3 additions & 25 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@
from pandas.tseries.offsets import (
generate_range, CDay, prefix_mapping)

from pandas.core.tools.timedeltas import to_timedelta
from pandas.util._decorators import Appender, cache_readonly, Substitution
import pandas.core.common as com
import pandas.tseries.offsets as offsets
Expand Down Expand Up @@ -621,13 +620,6 @@ def to_series(self, keep_tz=False, index=None, name=None):

return Series(values, index=index, name=name)

@Appender(DatetimeArrayMixin.to_period.__doc__)
def to_period(self, freq=None):
from pandas.core.indexes.period import PeriodIndex

result = DatetimeArrayMixin.to_period(self, freq=freq)
return PeriodIndex(result, name=self.name)

def snap(self, freq='S'):
"""
Snap time stamps to nearest occurring frequency
Expand Down Expand Up @@ -699,23 +691,6 @@ def union(self, other):
result.freq = to_offset(result.inferred_freq)
return result

def to_perioddelta(self, freq):
"""
Calculate TimedeltaIndex of difference between index
values and index converted to periodIndex at specified
freq. Used for vectorized offsets

Parameters
----------
freq: Period frequency

Returns
-------
y: TimedeltaIndex
"""
return to_timedelta(self.asi8 - self.to_period(freq)
.to_timestamp().asi8)

def union_many(self, others):
"""
A bit of a hack to accelerate unioning a collection of indexes
Expand Down Expand Up @@ -1247,6 +1222,9 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end)
is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year)

to_perioddelta = wrap_array_method(DatetimeArrayMixin.to_perioddelta,
False)
Copy link
Contributor

@TomAugspurger TomAugspurger Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone know why the result's .name isn't set to self.name here?

to_period = wrap_array_method(DatetimeArrayMixin.to_period, True)
normalize = wrap_array_method(DatetimeArrayMixin.normalize, True)
to_julian_date = wrap_array_method(DatetimeArrayMixin.to_julian_date,
False)
Expand Down
6 changes: 3 additions & 3 deletions pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -316,9 +316,9 @@ def __array_wrap__(self, result, context=None):
"""
if isinstance(context, tuple) and len(context) > 0:
func = context[0]
if (func is np.add):
if func is np.add:
pass
elif (func is np.subtract):
elif func is np.subtract:
name = self.name
left = context[1][0]
right = context[1][1]
Expand All @@ -339,7 +339,7 @@ def __array_wrap__(self, result, context=None):
return result
# the result is object dtype array of Period
# cannot pass _simple_new as it is
return self._shallow_copy(result, freq=self.freq, name=self.name)
return type(self)(result, freq=self.freq, name=self.name)

@property
def size(self):
Expand Down
55 changes: 55 additions & 0 deletions pandas/tests/arrays/test_datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,40 @@ def test_astype_object(self, tz_naive_fixture):
assert asobj.dtype == 'O'
assert list(asobj) == list(dti)

# TODO: share this between Datetime/Timedelta/Period Array tests
def test_repeat(self, datetime_index):
dti = datetime_index
arr = DatetimeArrayMixin(dti)

expected = dti.repeat(3)
result = arr.repeat(3)
assert isinstance(result, DatetimeArrayMixin)

# placeholder until these become actual EA subclasses and we can use
# an EA-specific tm.assert_ function
tm.assert_index_equal(pd.Index(result), pd.Index(expected))

def test_tolist(self, datetime_index):
dti = datetime_index
arr = DatetimeArrayMixin(dti)

expected = dti.tolist()
result = arr.tolist()
assert expected == result

@pytest.mark.parametrize('freqstr', ['D', 'B', 'W', 'M', 'Q', 'Y'])
def test_to_perioddelta(self, datetime_index, freqstr):
dti = datetime_index
arr = DatetimeArrayMixin(dti)

expected = dti.to_perioddelta(freq=freqstr)
result = arr.to_perioddelta(freq=freqstr)
assert isinstance(result, TimedeltaArrayMixin)

# placeholder until these become actual EA subclasses and we can use
# an EA-specific tm.assert_ function
tm.assert_index_equal(pd.Index(result), pd.Index(expected))

@pytest.mark.parametrize('freqstr', ['D', 'B', 'W', 'M', 'Q', 'Y'])
def test_to_period(self, datetime_index, freqstr):
dti = datetime_index
Expand Down Expand Up @@ -156,6 +190,27 @@ def test_to_timestamp(self, how, period_index):
# an EA-specific tm.assert_ function
tm.assert_index_equal(pd.Index(result), pd.Index(expected))

# TODO: share this between Datetime/Timedelta/Period Array tests
def test_repeat(self, period_index):
pi = period_index
arr = PeriodArrayMixin(pi)

expected = pi.repeat(3)
result = arr.repeat(3)
assert isinstance(result, PeriodArrayMixin)

# placeholder until these become actual EA subclasses and we can use
# an EA-specific tm.assert_ function
tm.assert_index_equal(pd.Index(result), pd.Index(expected))

def test_tolist(self, period_index):
pi = period_index
arr = PeriodArrayMixin(pi)

expected = pi.tolist()
result = arr.tolist()
assert expected == result

@pytest.mark.parametrize('propname', pd.PeriodIndex._bool_ops)
def test_bool_properties(self, period_index, propname):
# in this case _bool_ops is just `is_leap_year`
Expand Down