Skip to content

Make Period - Period return DateOffset instead of int #21314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 29, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,49 @@ Current Behavior:

.. _whatsnew_0240.api.datetimelike:


.. _whatsnew_0240.api.period_subtraction:

Period Subtraction
^^^^^^^^^^^^^^^^^^

Subtraction of a ``Period`` from another ``Period`` will give a ``DateOffset``.
instead of an integer (:issue:`21314`)

.. ipython:: python

june = pd.Period('June 2018')
april = pd.Period('April 2018')
june - april

Previous Behavior:

.. code-block:: ipython

In [2]: june = pd.Period('June 2018')

In [3]: april = pd.Period('April 2018')

In [4]: june - april
Out [4]: 2

Similarly, subtraction of a ``Period`` from a ``PeriodIndex`` will now return
an ``Index`` of ``DateOffset`` objects instead of an ``Int64Index``

.. ipython:: python

pi = pd.period_range('June 2018', freq='M', periods=3)
pi - pi[0]

Previous Behavior:

.. code-block:: ipython

In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')

Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down
7 changes: 5 additions & 2 deletions pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -1123,9 +1123,12 @@ cdef class _Period(object):
if other.freq != self.freq:
msg = _DIFFERENT_FREQ.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)
return self.ordinal - other.ordinal
return (self.ordinal - other.ordinal) * self.freq
elif getattr(other, '_typ', None) == 'periodindex':
return -other.__sub__(self)
# GH#21314 PeriodIndex - Period returns an object-index
# of DateOffset objects, for which we cannot use __neg__
# directly, so we have to apply it pointwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test that hits this explicity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Without this we get test failures in tests.indexes.period.test_arithmetic.TestPeriodIndexSeriesMethods for
test_pi_ops, test_pi_sub_period, test_pi_sub_period_nat

return other.__sub__(self).map(lambda x: -x)
else: # pragma: no cover
return NotImplemented
elif is_period_object(other):
Expand Down
7 changes: 6 additions & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -899,7 +899,9 @@ def __add__(self, other):
raise TypeError("cannot add {dtype}-dtype to {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))

elif is_categorical_dtype(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this is_extension_dtype (here and below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work for now, but it isn't obvious to me how it will shake out as more EA subclasses get implemented. Let me give this some thought.

# Categorical op will raise; defer explicitly
return NotImplemented
else: # pragma: no cover
return NotImplemented

Expand Down Expand Up @@ -964,6 +966,9 @@ def __sub__(self, other):
raise TypeError("cannot subtract {dtype}-dtype from {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))
elif is_categorical_dtype(other):
# Categorical op will raise; defer explicitly
return NotImplemented
else: # pragma: no cover
return NotImplemented

Expand Down
13 changes: 8 additions & 5 deletions pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -551,13 +551,14 @@ def is_all_dates(self):
@property
def is_full(self):
"""
Returns True if there are any missing periods from start to end
Returns True if this PeriodIndex is range-like in that all Periods
between start and end are present, in order.
"""
if len(self) == 0:
return True
if not self.is_monotonic:
raise ValueError('Index is not monotonic')
values = self.values
values = self.asi8
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the docstring for is_full correct? It looks to me like 1) it is flipped True/False and 2) the word "missing" is ambiguous since it can refer to pd.NaT/np.nan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to update, I don't remember why this was added / what is purpose is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I don't remember even why we have this (can follow up later)

return ((values[1:] - values[:-1]) < 2).all()

@property
Expand Down Expand Up @@ -761,17 +762,19 @@ def _sub_datelike(self, other):
return NotImplemented

def _sub_period(self, other):
# If the operation is well-defined, we return an object-Index
# of DateOffsets. Null entries are filled with pd.NaT
if self.freq != other.freq:
msg = _DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)

asi8 = self.asi8
new_data = asi8 - other.ordinal
new_data = np.array([self.freq * x for x in new_data])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to implement DateOffset * arraylike so we don't have to right this broadcasting repeatedly.


if self.hasnans:
new_data = new_data.astype(np.float64)
new_data[self._isnan] = np.nan
# result must be Int64Index or Float64Index
new_data[self._isnan] = tslib.NaT

return Index(new_data)

def shift(self, n):
Expand Down
9 changes: 5 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -258,9 +258,10 @@ def test_ops_frame_period(self):
assert df['B'].dtype == object

p = pd.Period('2015-03', freq='M')
off = p.freq
# dtype will be object because of original dtype
exp = pd.DataFrame({'A': np.array([2, 1], dtype=object),
'B': np.array([14, 13], dtype=object)})
exp = pd.DataFrame({'A': np.array([2 * off, 1 * off], dtype=object),
'B': np.array([14 * off, 13 * off], dtype=object)})
tm.assert_frame_equal(p - df, exp)
tm.assert_frame_equal(df - p, -1 * exp)

Expand All @@ -271,7 +272,7 @@ def test_ops_frame_period(self):
assert df2['A'].dtype == object
assert df2['B'].dtype == object

exp = pd.DataFrame({'A': np.array([4, 4], dtype=object),
'B': np.array([16, 16], dtype=object)})
exp = pd.DataFrame({'A': np.array([4 * off, 4 * off], dtype=object),
'B': np.array([16 * off, 16 * off], dtype=object)})
tm.assert_frame_equal(df2 - df, exp)
tm.assert_frame_equal(df - df2, -1 * exp)
15 changes: 9 additions & 6 deletions pandas/tests/indexes/period/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -730,11 +730,12 @@ def test_pi_ops(self):

self._check(idx + 2, lambda x: x - 2, idx)
result = idx - Period('2011-01', freq='M')
exp = pd.Index([0, 1, 2, 3], name='idx')
off = idx.freq
exp = pd.Index([0 * off, 1 * off, 2 * off, 3 * off], name='idx')
tm.assert_index_equal(result, exp)

result = Period('2011-01', freq='M') - idx
exp = pd.Index([0, -1, -2, -3], name='idx')
exp = pd.Index([0 * off, -1 * off, -2 * off, -3 * off], name='idx')
tm.assert_index_equal(result, exp)

@pytest.mark.parametrize('ng', ["str", 1.5])
Expand Down Expand Up @@ -864,14 +865,15 @@ def test_pi_sub_period(self):
freq='M', name='idx')

result = idx - pd.Period('2012-01', freq='M')
exp = pd.Index([-12, -11, -10, -9], name='idx')
off = idx.freq
exp = pd.Index([-12 * off, -11 * off, -10 * off, -9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = np.subtract(idx, pd.Period('2012-01', freq='M'))
tm.assert_index_equal(result, exp)

result = pd.Period('2012-01', freq='M') - idx
exp = pd.Index([12, 11, 10, 9], name='idx')
exp = pd.Index([12 * off, 11 * off, 10 * off, 9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = np.subtract(pd.Period('2012-01', freq='M'), idx)
Expand All @@ -898,11 +900,12 @@ def test_pi_sub_period_nat(self):
freq='M', name='idx')

result = idx - pd.Period('2012-01', freq='M')
exp = pd.Index([-12, np.nan, -10, -9], name='idx')
off = idx.freq
exp = pd.Index([-12 * off, pd.NaT, -10 * off, -9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = pd.Period('2012-01', freq='M') - idx
exp = pd.Index([12, np.nan, 10, 9], name='idx')
exp = pd.Index([12 * off, pd.NaT, 10 * off, 9 * off], name='idx')
tm.assert_index_equal(result, exp)

exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx')
Expand Down
7 changes: 4 additions & 3 deletions pandas/tests/scalar/period/test_period.py
Original file line number Diff line number Diff line change
Expand Up @@ -572,7 +572,7 @@ def test_strftime(self):
def test_sub_delta(self):
left, right = Period('2011', freq='A'), Period('2007', freq='A')
result = left - right
assert result == 4
assert result == 4 * right.freq

with pytest.raises(period.IncompatibleFrequency):
left - Period('2007-01', freq='M')
Expand Down Expand Up @@ -1064,8 +1064,9 @@ def test_sub(self):
dt1 = Period('2011-01-01', freq='D')
dt2 = Period('2011-01-15', freq='D')

assert dt1 - dt2 == -14
assert dt2 - dt1 == 14
off = dt1.freq
assert dt1 - dt2 == -14 * off
assert dt2 - dt1 == 14 * off

msg = r"Input has different freq=M from Period\(freq=D\)"
with tm.assert_raises_regex(period.IncompatibleFrequency, msg):
Expand Down
5 changes: 3 additions & 2 deletions pandas/tests/series/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,16 +517,17 @@ def test_ops_series_period(self):
assert ser.dtype == object

per = pd.Period('2015-01-10', freq='D')
off = per.freq
# dtype will be object because of original dtype
expected = pd.Series([9, 8], name='xxx', dtype=object)
expected = pd.Series([9 * off, 8 * off], name='xxx', dtype=object)
tm.assert_series_equal(per - ser, expected)
tm.assert_series_equal(ser - per, -1 * expected)

s2 = pd.Series([pd.Period('2015-01-05', freq='D'),
pd.Period('2015-01-04', freq='D')], name='xxx')
assert s2.dtype == object

expected = pd.Series([4, 2], name='xxx', dtype=object)
expected = pd.Series([4 * off, 2 * off], name='xxx', dtype=object)
tm.assert_series_equal(s2 - ser, expected)
tm.assert_series_equal(ser - s2, -1 * expected)

Expand Down