ENH: PeriodIndex can accept freq with mult #7832

Merged
merged 1 commit into from
Sep 3, 2015
27 changes: 22 additions & 5 deletions doc/source/timeseries.rst
@@ -591,7 +591,7 @@ various docstrings for the classes.
These operations (``apply``, ``rollforward`` and ``rollback``) preserve time (hour, minute, etc.) information by default. To reset time, use the ``normalize=True`` keyword when creating the offset instance. If ``normalize=True``, the result is normalized after the function is applied.


.. ipython:: python
.. ipython:: python

day = Day()
day.apply(Timestamp('2014-01-01 09:00'))
@@ -1257,8 +1257,10 @@ be created with the convenience function ``period_range``.

Period
~~~~~~

A ``Period`` represents a span of time (e.g., a day, a month, a quarter, etc).
It can be created using a frequency alias:
You can specify the span via the ``freq`` keyword using a frequency alias, as shown below.
Because ``freq`` represents the span of a ``Period``, it must be positive; a value such as "-3D" is rejected.

.. ipython:: python

@@ -1268,11 +1270,10 @@ It can be created using a frequency alias:

Period('2012-1-1 19:00', freq='H')

Unlike time stamped data, pandas does not support frequencies at multiples of
DateOffsets (e.g., '3Min') for periods.
Period('2012-1-1 19:00', freq='5H')
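
The rule that a multiplied ``freq`` must be a positive span can be sketched without pandas. The helper below is hypothetical (``parse_period_freq`` and its regular expression are not part of pandas, which goes through ``to_offset`` and checks ``freq.n``), and it only handles simple alphabetic aliases such as ``H`` or ``M``, not anchored ones such as ``Q-DEC``.

```python
import re

# Hypothetical helper for illustration only; it is not pandas code.
# An optional signed integer multiplier followed by an alphabetic alias.
_FREQ_RE = re.compile(r"^(-?\d+)?([A-Za-z]+)$")


def parse_period_freq(freq):
    """Split an alias such as '5H' into (multiplier, base alias)."""
    match = _FREQ_RE.match(freq)
    if match is None:
        raise ValueError("Invalid frequency: {0}".format(freq))
    digits, base = match.groups()
    # A missing multiplier means a span of one base unit.
    mult = int(digits) if digits is not None else 1
    if mult <= 0:
        # A span cannot be zero or negative.
        raise ValueError("Frequency must be positive, because it "
                         "represents span: {0}".format(freq))
    return mult, base


print(parse_period_freq("5H"))  # (5, 'H')
print(parse_period_freq("M"))   # (1, 'M')
```

Calling ``parse_period_freq("-3D")`` or ``parse_period_freq("0D")`` raises ``ValueError`` in this sketch.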

Adding and subtracting integers from periods shifts the period by its own
frequency.
frequency. Arithmetic is not allowed between ``Period`` objects with different ``freq`` (span).

.. ipython:: python

@@ -1282,6 +1283,15 @@ frequency.

p - 3

p = Period('2012-01', freq='2M')

p + 2

p - 1

p == Period('2012-01', freq='3M')
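
As a rough illustration of the arithmetic described above, the toy class below models a monthly period whose integer shifts move it by its whole span. ``MonthSpan`` is a made-up name, not pandas code, and raising ``ValueError`` on a comparison between different spans is an assumption that mirrors the ``__richcmp__`` change in this diff.

```python
class MonthSpan(object):
    """Toy model of a monthly period with a multiplied freq (not pandas code)."""

    def __init__(self, year, month, mult=1):
        if mult <= 0:
            raise ValueError("span must be positive")
        self.ordinal = year * 12 + (month - 1)  # months counted from year 0
        self.mult = mult

    def _shifted(self, steps):
        # One step moves by the whole span, i.e. ``mult`` base months.
        year, month0 = divmod(self.ordinal + steps * self.mult, 12)
        return MonthSpan(year, month0 + 1, self.mult)

    def __add__(self, steps):
        return self._shifted(steps)

    def __sub__(self, steps):
        return self._shifted(-steps)

    def __eq__(self, other):
        # Assumption: periods with different spans are not comparable.
        if self.mult != other.mult:
            raise ValueError("Input has different freq={0}M from "
                             "MonthSpan(freq={1}M)".format(other.mult,
                                                           self.mult))
        return self.ordinal == other.ordinal

    def __repr__(self):
        year, month0 = divmod(self.ordinal, 12)
        return "MonthSpan('%04d-%02d', '%dM')" % (year, month0 + 1, self.mult)


p = MonthSpan(2012, 1, mult=2)
print(p + 2)  # MonthSpan('2012-05', '2M')
print(p - 1)  # MonthSpan('2011-11', '2M')
```

Comparing ``p`` with ``MonthSpan(2012, 1, mult=3)`` raises ``ValueError`` in this sketch.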


If the ``Period`` freq is daily or higher (``D``, ``H``, ``T``, ``S``, ``L``, ``U``, ``N``), ``offsets`` and ``timedelta``-like objects can be added if the result can have the same freq. Otherwise, a ``ValueError`` is raised.

.. ipython:: python
@@ -1335,6 +1345,13 @@ The ``PeriodIndex`` constructor can also be used directly:

PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')

Passing a multiplied frequency outputs a sequence of ``Period`` objects,
each with the multiplied span.

.. ipython:: python

PeriodIndex(start='2014-01', freq='3M', periods=4)
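
To make the "sequence of multiplied spans" concrete, here is a small pandas-free sketch that lists the starting month of consecutive spans. The function name ``month_span_starts`` is hypothetical, and the sketch only models the stepping; it does not reproduce how ``PeriodIndex`` stores its values.

```python
def month_span_starts(year, month, mult, periods):
    """Return 'YYYY-MM' labels for ``periods`` consecutive spans of ``mult`` months."""
    if mult <= 0:
        raise ValueError("span must be positive")
    first = year * 12 + (month - 1)  # months counted from year 0
    labels = []
    for i in range(periods):
        # Each element starts ``mult`` months after the previous one.
        y, m0 = divmod(first + i * mult, 12)
        labels.append("%04d-%02d" % (y, m0 + 1))
    return labels


print(month_span_starts(2014, 1, mult=3, periods=4))
# ['2014-01', '2014-04', '2014-07', '2014-10']
```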

Just like ``DatetimeIndex``, a ``PeriodIndex`` can also be used to index pandas
objects:

27 changes: 26 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
@@ -109,6 +109,32 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene

The string format follows the Python standard library; details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_

.. _whatsnew_0170.periodfreq:

Period Frequency Enhancement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``Period``, ``PeriodIndex`` and ``period_range`` can now accept a multiplied freq. Also, ``Period.freq`` and ``PeriodIndex.freq`` are now stored as a ``DateOffset`` instance, as in ``DatetimeIndex``, rather than as ``str`` (:issue:`7811`)

A multiplied freq represents a span of the corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.

.. ipython:: python

p = pd.Period('2015-08-01', freq='3D')
p
p + 1
p - 2
p.to_timestamp()
p.to_timestamp(how='E')
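
The start and end of a multi-day span can be sketched with the standard library alone. This follows the ``ordinal + mult1 - 1`` step that this diff adds to ``asfreq``; ``day_span_bounds`` is a hypothetical helper, and the sketch works at day resolution only, so it says nothing about the exact ``Timestamp`` values pandas returns.

```python
from datetime import date


def day_span_bounds(start, mult):
    """Return (first day, last day) covered by a span of ``mult`` days."""
    if mult <= 0:
        raise ValueError("span must be positive")
    first = start.toordinal()
    # The last base unit of the span sits ``mult - 1`` units after the first.
    last = first + mult - 1
    return date.fromordinal(first), date.fromordinal(last)


print(day_span_bounds(date(2015, 8, 1), 3))
# (datetime.date(2015, 8, 1), datetime.date(2015, 8, 3))
```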

You can use a multiplied freq in ``PeriodIndex`` and ``period_range``.

.. ipython:: python

idx = pd.period_range('2015-08-01', periods=4, freq='2D')
idx
idx + 1

.. _whatsnew_0170.enhancements.sas_xport:

Support for SAS XPORT files
@@ -183,7 +209,6 @@ Other enhancements
- ``pandas.tseries.offsets`` larger than the ``Day`` offset can now be used with ``Series`` for addition/subtraction (:issue:`10699`). See the :ref:`Documentation <timeseries.offsetseries>` for more details.

- ``.as_blocks`` will now take a ``copy`` optional argument to return a copy of the data, default is to copy (no change in behavior from prior versions), (:issue:`9607`)

- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
- ``pd.read_stata`` will now read Stata 118 type files. (:issue:`9882`)

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
14 changes: 13 additions & 1 deletion pandas/io/tests/test_pickle.py
@@ -17,6 +17,8 @@
from pandas.compat import u
from pandas.util.misc import is_little_endian
import pandas
from pandas.tseries.offsets import Day, MonthEnd


class TestPickle():
"""
@@ -90,6 +92,10 @@ def read_pickles(self, version):
if 'ts' in data['series']:
self._validate_timeseries(data['series']['ts'], self.data['series']['ts'])
self._validate_frequency(data['series']['ts'])
if 'index' in data:
if 'period' in data['index']:
self._validate_periodindex(data['index']['period'],
self.data['index']['period'])
n += 1
assert n > 0, 'Pickle files are not tested'

@@ -162,7 +168,6 @@ def _validate_timeseries(self, pickled, current):

def _validate_frequency(self, pickled):
# GH 9291
from pandas.tseries.offsets import Day
freq = pickled.index.freq
result = freq + Day(1)
tm.assert_equal(result, Day(2))
@@ -175,6 +180,13 @@ def _validate_frequency(self, pickled):
tm.assert_equal(isinstance(result, pandas.Timedelta), True)
tm.assert_equal(result, pandas.Timedelta(days=1, nanoseconds=1))

def _validate_periodindex(self, pickled, current):
tm.assert_index_equal(pickled, current)
tm.assertIsInstance(pickled.freq, MonthEnd)
tm.assert_equal(pickled.freq, MonthEnd())
tm.assert_equal(pickled.freqstr, 'M')
tm.assert_index_equal(pickled.shift(2), current.shift(2))


if __name__ == '__main__':
import nose
97 changes: 61 additions & 36 deletions pandas/src/period.pyx
@@ -615,6 +615,9 @@ cdef ndarray[int64_t] localize_dt64arr_to_period(ndarray[int64_t] stamps,
return result


_DIFFERENT_FREQ_ERROR = "Input has different freq={1} from Period(freq={0})"


cdef class Period(object):
"""
Represents a period of time
@@ -624,8 +627,7 @@ cdef class Period(object):
value : Period or compat.string_types, default None
The time period represented (e.g., '4Q2005')
freq : str, default None
e.g., 'B' for businessday. Must be a singular rule-code (e.g. 5T is not
allowed).
One of pandas period strings or corresponding objects
year : int, default None
month : int, default 1
quarter : int, default None
@@ -641,12 +643,33 @@ cdef class Period(object):
_comparables = ['name','freqstr']
_typ = 'period'

@classmethod
def _maybe_convert_freq(cls, object freq):

if isinstance(freq, compat.string_types):
from pandas.tseries.frequencies import _period_alias_dict
freq = _period_alias_dict.get(freq, freq)
elif isinstance(freq, (int, tuple)):
from pandas.tseries.frequencies import get_freq_code as _gfc
from pandas.tseries.frequencies import _get_freq_str
code, stride = _gfc(freq)
freq = _get_freq_str(code, stride)

from pandas.tseries.frequencies import to_offset
freq = to_offset(freq)

if freq.n <= 0:
raise ValueError('Frequency must be positive, because it'
' represents span: {0}'.format(freq.freqstr))

return freq

@classmethod
def _from_ordinal(cls, ordinal, freq):
""" fast creation from an ordinal and freq that are already validated! """
self = Period.__new__(cls)
self.ordinal = ordinal
self.freq = freq
self.freq = cls._maybe_convert_freq(freq)
return self

def __init__(self, value=None, freq=None, ordinal=None,
@@ -659,8 +682,6 @@ cdef class Period(object):
# periods such as A, Q, etc. Every five minutes would be, e.g.,
# ('T', 5) but may be passed in as a string like '5T'

self.freq = None

# ordinal is the period offset from the gregorian proleptic epoch

if ordinal is not None and value is not None:
@@ -675,9 +696,8 @@ cdef class Period(object):
elif value is None:
if freq is None:
raise ValueError("If value is None, freq cannot be None")

ordinal = _ordinal_from_fields(year, month, quarter, day,
hour, minute, second, freq)
hour, minute, second, freq)

elif isinstance(value, Period):
other = value
@@ -698,8 +718,8 @@ cdef class Period(object):
if lib.is_integer(value):
value = str(value)
value = value.upper()

dt, _, reso = parse_time_string(value, freq)

if freq is None:
try:
freq = frequencies.Resolution.get_freq(reso)
@@ -723,24 +743,22 @@ cdef class Period(object):
raise ValueError(msg)

base, mult = _gfc(freq)
if mult != 1:
# TODO: Better error message - this is slightly confusing
raise ValueError('Only mult == 1 supported')

if ordinal is None:
self.ordinal = get_period_ordinal(dt.year, dt.month, dt.day,
dt.hour, dt.minute, dt.second, dt.microsecond, 0,
base)
dt.hour, dt.minute, dt.second,
dt.microsecond, 0, base)
else:
self.ordinal = ordinal

self.freq = frequencies._get_freq_str(base)
self.freq = self._maybe_convert_freq(freq)

def __richcmp__(self, other, op):
if isinstance(other, Period):
from pandas.tseries.frequencies import get_freq_code as _gfc
if other.freq != self.freq:
raise ValueError("Cannot compare non-conforming periods")
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
raise ValueError(msg)
if self.ordinal == tslib.iNaT or other.ordinal == tslib.iNaT:
return _nat_scalar_rules[op]
return PyObject_RichCompareBool(self.ordinal, other.ordinal, op)
@@ -758,7 +776,7 @@ cdef class Period(object):
def _add_delta(self, other):
from pandas.tseries import frequencies
if isinstance(other, (timedelta, np.timedelta64, offsets.Tick, Timedelta)):
offset = frequencies.to_offset(self.freq)
offset = frequencies.to_offset(self.freq.rule_code)
if isinstance(offset, offsets.Tick):
nanos = tslib._delta_to_nanoseconds(other)
offset_nanos = tslib._delta_to_nanoseconds(offset)
@@ -769,18 +787,21 @@ cdef class Period(object):
else:
ordinal = self.ordinal + (nanos // offset_nanos)
return Period(ordinal=ordinal, freq=self.freq)
msg = 'Input cannot be converted to Period(freq={0})'
raise ValueError(msg.format(self.freqstr))
elif isinstance(other, offsets.DateOffset):
freqstr = frequencies.get_standard_freq(other)
base = frequencies.get_base_alias(freqstr)

if base == self.freq:
if base == self.freq.rule_code:
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal + other.n
return Period(ordinal=ordinal, freq=self.freq)

raise ValueError("Input has different freq from Period(freq={0})".format(self.freq))
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
raise ValueError(msg)
else: # pragma no cover
return NotImplemented

def __add__(self, other):
if isinstance(other, (timedelta, np.timedelta64,
@@ -790,7 +811,7 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal + other
ordinal = self.ordinal + other * self.freq.n
return Period(ordinal=ordinal, freq=self.freq)
else: # pragma: no cover
return NotImplemented
@@ -804,7 +825,7 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal - other
ordinal = self.ordinal - other * self.freq.n
return Period(ordinal=ordinal, freq=self.freq)
elif isinstance(other, Period):
if other.freq != self.freq:
@@ -836,13 +857,18 @@ cdef class Period(object):
base1, mult1 = _gfc(self.freq)
base2, mult2 = _gfc(freq)

if mult2 != 1:
raise ValueError('Only mult == 1 supported')

end = how == 'E'
new_ordinal = period_asfreq(self.ordinal, base1, base2, end)
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
# mult1 can't be negative or 0
end = how == 'E'
if end:
ordinal = self.ordinal + mult1 - 1
else:
ordinal = self.ordinal
ordinal = period_asfreq(ordinal, base1, base2, end)

return Period(ordinal=new_ordinal, freq=base2)
return Period(ordinal=ordinal, freq=freq)

@property
def start_time(self):
@@ -853,7 +879,8 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = (self + 1).start_time.value - 1
# freq.n can't be negative or 0
ordinal = (self + self.freq.n).start_time.value - 1
return Timestamp(ordinal)

def to_timestamp(self, freq=None, how='start', tz=None):
@@ -947,14 +974,15 @@ cdef class Period(object):
def __str__(self):
return self.__unicode__()

@property
def freqstr(self):
return self.freq.freqstr

def __repr__(self):
from pandas.tseries import frequencies
from pandas.tseries.frequencies import get_freq_code as _gfc
base, mult = _gfc(self.freq)
formatted = period_format(self.ordinal, base)
freqstr = frequencies._reverse_period_code_map[base]

return "Period('%s', '%s')" % (formatted, freqstr)
return "Period('%s', '%s')" % (formatted, self.freqstr)

def __unicode__(self):
"""
@@ -1123,9 +1151,6 @@ def _ordinal_from_fields(year, month, quarter, day, hour, minute,
second, freq):
from pandas.tseries.frequencies import get_freq_code as _gfc
base, mult = _gfc(freq)
if mult != 1:
raise ValueError('Only mult == 1 supported')

if quarter is not None:
year, month = _quarter_to_myear(year, quarter, freq)
