Commit 8aeaf02

Merge pull request #7832 from sinhrks/period_mult
ENH: PeriodIndex can accept freq with mult
2 parents 9aafd6d + 2d870f9 commit 8aeaf02

18 files changed: +1063 -360 lines changed

doc/source/timeseries.rst

+22-5
Original file line number | Diff line number | Diff line change
@@ -591,7 +591,7 @@ various docstrings for the classes.
591591
These operations (``apply``, ``rollforward`` and ``rollback``) preserve time (hour, minute, etc.) information by default. To reset time, use the ``normalize=True`` keyword when creating the offset instance. If ``normalize=True``, the result is normalized after the function is applied.
592592

593593

594-
.. ipython:: python
594+
.. ipython:: python
595595
596596
day = Day()
597597
day.apply(Timestamp('2014-01-01 09:00'))
@@ -1257,8 +1257,10 @@ be created with the convenience function ``period_range``.
12571257

12581258
Period
12591259
~~~~~~
1260+
12601261
A ``Period`` represents a span of time (e.g., a day, a month, a quarter, etc).
1261-
It can be created using a frequency alias:
1262+
You can specify the span via the ``freq`` keyword using a frequency alias, as shown below.
1263+
Because ``freq`` represents the span of a ``Period``, it cannot be negative (e.g., "-3D").
12621264

12631265
.. ipython:: python
12641266
@@ -1268,11 +1270,10 @@ It can be created using a frequency alias:
12681270
12691271
Period('2012-1-1 19:00', freq='H')
12701272
1271-
Unlike time stamped data, pandas does not support frequencies at multiples of
1272-
DateOffsets (e.g., '3Min') for periods.
1273+
Period('2012-1-1 19:00', freq='5H')
12731274
12741275
Adding and subtracting integers from periods shifts the period by its own
1275-
frequency.
1276+
frequency. Arithmetic is not allowed between ``Period`` objects with different ``freq`` (span).
12761277

12771278
.. ipython:: python
12781279
@@ -1282,6 +1283,15 @@ frequency.
12821283
12831284
p - 3
12841285
1286+
p = Period('2012-01', freq='2M')
1287+
1288+
p + 2
1289+
1290+
p - 1
1291+
1292+
p == Period('2012-01', freq='3M')
1293+
1294+
12851295
If the ``Period`` freq is daily or higher (``D``, ``H``, ``T``, ``S``, ``L``, ``U``, ``N``), ``offsets`` and ``timedelta``-like objects can be added if the result can have the same freq. Otherwise, a ``ValueError`` is raised.
12861296

12871297
.. ipython:: python
@@ -1335,6 +1345,13 @@ The ``PeriodIndex`` constructor can also be used directly:
13351345
13361346
PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
13371347
1348+
Passing a multiplied frequency outputs a sequence of ``Period``, each of which
1349+
has the multiplied span.
1350+
1351+
.. ipython:: python
1352+
1353+
PeriodIndex(start='2014-01', freq='3M', periods=4)
1354+
13381355
Just like ``DatetimeIndex``, a ``PeriodIndex`` can also be used to index pandas
13391356
objects:
13401357
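
The shifting rule documented in this file (adding or subtracting an integer moves the period by whole spans) can be sketched without pandas. This is a minimal illustration, not the pandas implementation: `MonthSpan`, `from_ym` and `label` are hypothetical names, and the ordinal is assumed to be a simple month count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthSpan:
    """Hypothetical stand-in for a monthly Period with a multiplied freq."""

    ordinal: int  # months counted as year * 12 + (month - 1)
    n: int        # span length in months, e.g. 2 for '2M'

    def __post_init__(self):
        # A span cannot be zero or negative, mirroring the documented rule.
        if self.n <= 0:
            raise ValueError("Frequency must be positive, because it represents span")

    @classmethod
    def from_ym(cls, year, month, n=1):
        return cls(year * 12 + (month - 1), n)

    def __add__(self, k):
        # Shift by k whole spans, i.e. k * n base units.
        return MonthSpan(self.ordinal + k * self.n, self.n)

    def __sub__(self, k):
        return MonthSpan(self.ordinal - k * self.n, self.n)

    def label(self):
        year, month0 = divmod(self.ordinal, 12)
        return "%04d-%02d" % (year, month0 + 1)


p = MonthSpan.from_ym(2012, 1, n=2)
print((p + 2).label())  # 2012-05
print((p - 1).label())  # 2011-11
```

The multiplication by `n` corresponds to the `other * self.freq.n` step that this commit adds to `Period.__add__` and `Period.__sub__`.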

doc/source/whatsnew/v0.17.0.txt

+26-1
@@ -120,6 +120,32 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene
120120

121121
The string format follows the Python standard library; details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_
122122

123+
.. _whatsnew_0170.periodfreq:
124+
125+
Period Frequency Enhancement
126+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
127+
128+
``Period``, ``PeriodIndex`` and ``period_range`` can now accept a multiplied freq. Also, ``Period.freq`` and ``PeriodIndex.freq`` are now stored as a ``DateOffset`` instance, as in ``DatetimeIndex``, rather than as ``str`` (:issue:`7811`)
129+
130+
A multiplied freq represents a span of the corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.
131+
132+
.. ipython:: python
133+
134+
p = pd.Period('2015-08-01', freq='3D')
135+
p
136+
p + 1
137+
p - 2
138+
p.to_timestamp()
139+
p.to_timestamp(how='E')
140+
141+
You can use a multiplied freq in ``PeriodIndex`` and ``period_range``.
142+
143+
.. ipython:: python
144+
145+
idx = pd.period_range('2015-08-01', periods=4, freq='2D')
146+
idx
147+
idx + 1
148+
123149
.. _whatsnew_0170.enhancements.sas_xport:
124150

125151
Support for SAS XPORT files
@@ -198,7 +224,6 @@ Other enhancements
198224
- ``pd.Timedelta.total_seconds()`` now returns Timedelta duration to ns precision (previously microsecond precision) (:issue:`10939`)
199225

200226
- ``.as_blocks`` will now take a ``copy`` optional argument to return a copy of the data, default is to copy (no change in behavior from prior versions), (:issue:`9607`)
201-
202227
- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
203228
- ``pd.read_stata`` will now read Stata 118 type files. (:issue:`9882`)
204229
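
The whatsnew entry above describes a 3-day period and a 2-day range. The date arithmetic behind those examples can be sketched with the standard library alone. The helper names (`shift`, `bounds`, `span_range`) are hypothetical. The sketch assumes that the end of a span is the last base-frequency unit it covers, which is what the `ordinal + mult1 - 1` step in this commit's `asfreq` change suggests.

```python
from datetime import date, timedelta


def shift(start, n_days, k):
    """Shift a period starting on `start` and spanning `n_days` days by k spans."""
    return start + timedelta(days=k * n_days)


def bounds(start, n_days):
    """Return the first and last calendar day covered by the span."""
    if n_days <= 0:
        raise ValueError("span must be positive")
    return start, start + timedelta(days=n_days - 1)


def span_range(start, periods, n_days):
    """Start days of `periods` consecutive spans of `n_days` days each."""
    return [shift(start, n_days, i) for i in range(periods)]


p_start = date(2015, 8, 1)
print(shift(p_start, 3, 1))    # 2015-08-04
print(shift(p_start, 3, -2))   # 2015-07-26
first, last = bounds(p_start, 3)
print(first, last)             # 2015-08-01 2015-08-03
print([d.isoformat() for d in span_range(p_start, 4, 2)])
```

The last call mirrors `period_range('2015-08-01', periods=4, freq='2D')`: each element starts two days after the previous one.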

4 binary files not shown.

pandas/io/tests/test_pickle.py

+13-1
@@ -17,6 +17,8 @@
1717
from pandas.compat import u
1818
from pandas.util.misc import is_little_endian
1919
import pandas
20+
from pandas.tseries.offsets import Day, MonthEnd
21+
2022

2123
class TestPickle():
2224
"""
@@ -90,6 +92,10 @@ def read_pickles(self, version):
9092
if 'ts' in data['series']:
9193
self._validate_timeseries(data['series']['ts'], self.data['series']['ts'])
9294
self._validate_frequency(data['series']['ts'])
95+
if 'index' in data:
96+
if 'period' in data['index']:
97+
self._validate_periodindex(data['index']['period'],
98+
self.data['index']['period'])
9399
n += 1
94100
assert n > 0, 'Pickle files are not tested'
95101

@@ -162,7 +168,6 @@ def _validate_timeseries(self, pickled, current):
162168

163169
def _validate_frequency(self, pickled):
164170
# GH 9291
165-
from pandas.tseries.offsets import Day
166171
freq = pickled.index.freq
167172
result = freq + Day(1)
168173
tm.assert_equal(result, Day(2))
@@ -175,6 +180,13 @@ def _validate_frequency(self, pickled):
175180
tm.assert_equal(isinstance(result, pandas.Timedelta), True)
176181
tm.assert_equal(result, pandas.Timedelta(days=1, nanoseconds=1))
177182

183+
def _validate_periodindex(self, pickled, current):
184+
tm.assert_index_equal(pickled, current)
185+
tm.assertIsInstance(pickled.freq, MonthEnd)
186+
tm.assert_equal(pickled.freq, MonthEnd())
187+
tm.assert_equal(pickled.freqstr, 'M')
188+
tm.assert_index_equal(pickled.shift(2), current.shift(2))
189+
178190

179191
if __name__ == '__main__':
180192
import nose

pandas/src/period.pyx

+61-36
@@ -615,6 +615,9 @@ cdef ndarray[int64_t] localize_dt64arr_to_period(ndarray[int64_t] stamps,
615615
return result
616616

617617

618+
_DIFFERENT_FREQ_ERROR = "Input has different freq={1} from Period(freq={0})"
619+
620+
618621
cdef class Period(object):
619622
"""
620623
Represents a period of time
@@ -624,8 +627,7 @@ cdef class Period(object):
624627
value : Period or compat.string_types, default None
625628
The time period represented (e.g., '4Q2005')
626629
freq : str, default None
627-
e.g., 'B' for businessday. Must be a singular rule-code (e.g. 5T is not
628-
allowed).
630+
One of pandas period strings or corresponding objects
629631
year : int, default None
630632
month : int, default 1
631633
quarter : int, default None
@@ -641,12 +643,33 @@ cdef class Period(object):
641643
_comparables = ['name','freqstr']
642644
_typ = 'period'
643645

646+
@classmethod
647+
def _maybe_convert_freq(cls, object freq):
648+
649+
if isinstance(freq, compat.string_types):
650+
from pandas.tseries.frequencies import _period_alias_dict
651+
freq = _period_alias_dict.get(freq, freq)
652+
elif isinstance(freq, (int, tuple)):
653+
from pandas.tseries.frequencies import get_freq_code as _gfc
654+
from pandas.tseries.frequencies import _get_freq_str
655+
code, stride = _gfc(freq)
656+
freq = _get_freq_str(code, stride)
657+
658+
from pandas.tseries.frequencies import to_offset
659+
freq = to_offset(freq)
660+
661+
if freq.n <= 0:
662+
raise ValueError('Frequency must be positive, because it'
663+
' represents span: {0}'.format(freq.freqstr))
664+
665+
return freq
666+
644667
@classmethod
645668
def _from_ordinal(cls, ordinal, freq):
646669
""" fast creation from an ordinal and freq that are already validated! """
647670
self = Period.__new__(cls)
648671
self.ordinal = ordinal
649-
self.freq = freq
672+
self.freq = cls._maybe_convert_freq(freq)
650673
return self
651674

652675
def __init__(self, value=None, freq=None, ordinal=None,
@@ -659,8 +682,6 @@ cdef class Period(object):
659682
# periods such as A, Q, etc. Every five minutes would be, e.g.,
660683
# ('T', 5) but may be passed in as a string like '5T'
661684

662-
self.freq = None
663-
664685
# ordinal is the period offset from the gregorian proleptic epoch
665686

666687
if ordinal is not None and value is not None:
@@ -675,9 +696,8 @@ cdef class Period(object):
675696
elif value is None:
676697
if freq is None:
677698
raise ValueError("If value is None, freq cannot be None")
678-
679699
ordinal = _ordinal_from_fields(year, month, quarter, day,
680-
hour, minute, second, freq)
700+
hour, minute, second, freq)
681701

682702
elif isinstance(value, Period):
683703
other = value
@@ -698,8 +718,8 @@ cdef class Period(object):
698718
if lib.is_integer(value):
699719
value = str(value)
700720
value = value.upper()
701-
702721
dt, _, reso = parse_time_string(value, freq)
722+
703723
if freq is None:
704724
try:
705725
freq = frequencies.Resolution.get_freq(reso)
@@ -723,24 +743,22 @@ cdef class Period(object):
723743
raise ValueError(msg)
724744

725745
base, mult = _gfc(freq)
726-
if mult != 1:
727-
# TODO: Better error message - this is slightly confusing
728-
raise ValueError('Only mult == 1 supported')
729746

730747
if ordinal is None:
731748
self.ordinal = get_period_ordinal(dt.year, dt.month, dt.day,
732-
dt.hour, dt.minute, dt.second, dt.microsecond, 0,
733-
base)
749+
dt.hour, dt.minute, dt.second,
750+
dt.microsecond, 0, base)
734751
else:
735752
self.ordinal = ordinal
736753

737-
self.freq = frequencies._get_freq_str(base)
754+
self.freq = self._maybe_convert_freq(freq)
738755

739756
def __richcmp__(self, other, op):
740757
if isinstance(other, Period):
741758
from pandas.tseries.frequencies import get_freq_code as _gfc
742759
if other.freq != self.freq:
743-
raise ValueError("Cannot compare non-conforming periods")
760+
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
761+
raise ValueError(msg)
744762
if self.ordinal == tslib.iNaT or other.ordinal == tslib.iNaT:
745763
return _nat_scalar_rules[op]
746764
return PyObject_RichCompareBool(self.ordinal, other.ordinal, op)
@@ -758,7 +776,7 @@ cdef class Period(object):
758776
def _add_delta(self, other):
759777
from pandas.tseries import frequencies
760778
if isinstance(other, (timedelta, np.timedelta64, offsets.Tick, Timedelta)):
761-
offset = frequencies.to_offset(self.freq)
779+
offset = frequencies.to_offset(self.freq.rule_code)
762780
if isinstance(offset, offsets.Tick):
763781
nanos = tslib._delta_to_nanoseconds(other)
764782
offset_nanos = tslib._delta_to_nanoseconds(offset)
@@ -769,18 +787,21 @@ cdef class Period(object):
769787
else:
770788
ordinal = self.ordinal + (nanos // offset_nanos)
771789
return Period(ordinal=ordinal, freq=self.freq)
790+
msg = 'Input cannot be converted to Period(freq={0})'
791+
raise ValueError(msg.format(self.freqstr))
772792
elif isinstance(other, offsets.DateOffset):
773793
freqstr = frequencies.get_standard_freq(other)
774794
base = frequencies.get_base_alias(freqstr)
775-
776-
if base == self.freq:
795+
if base == self.freq.rule_code:
777796
if self.ordinal == tslib.iNaT:
778797
ordinal = self.ordinal
779798
else:
780799
ordinal = self.ordinal + other.n
781800
return Period(ordinal=ordinal, freq=self.freq)
782-
783-
raise ValueError("Input has different freq from Period(freq={0})".format(self.freq))
801+
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
802+
raise ValueError(msg)
803+
else:  # pragma: no cover
804+
return NotImplemented
784805

785806
def __add__(self, other):
786807
if isinstance(other, (timedelta, np.timedelta64,
@@ -790,7 +811,7 @@ cdef class Period(object):
790811
if self.ordinal == tslib.iNaT:
791812
ordinal = self.ordinal
792813
else:
793-
ordinal = self.ordinal + other
814+
ordinal = self.ordinal + other * self.freq.n
794815
return Period(ordinal=ordinal, freq=self.freq)
795816
else: # pragma: no cover
796817
return NotImplemented
@@ -804,7 +825,7 @@ cdef class Period(object):
804825
if self.ordinal == tslib.iNaT:
805826
ordinal = self.ordinal
806827
else:
807-
ordinal = self.ordinal - other
828+
ordinal = self.ordinal - other * self.freq.n
808829
return Period(ordinal=ordinal, freq=self.freq)
809830
elif isinstance(other, Period):
810831
if other.freq != self.freq:
@@ -836,13 +857,18 @@ cdef class Period(object):
836857
base1, mult1 = _gfc(self.freq)
837858
base2, mult2 = _gfc(freq)
838859

839-
if mult2 != 1:
840-
raise ValueError('Only mult == 1 supported')
841-
842-
end = how == 'E'
843-
new_ordinal = period_asfreq(self.ordinal, base1, base2, end)
860+
if self.ordinal == tslib.iNaT:
861+
ordinal = self.ordinal
862+
else:
863+
# mult1 can't be negative or 0
864+
end = how == 'E'
865+
if end:
866+
ordinal = self.ordinal + mult1 - 1
867+
else:
868+
ordinal = self.ordinal
869+
ordinal = period_asfreq(ordinal, base1, base2, end)
844870

845-
return Period(ordinal=new_ordinal, freq=base2)
871+
return Period(ordinal=ordinal, freq=freq)
846872

847873
@property
848874
def start_time(self):
@@ -853,7 +879,8 @@ cdef class Period(object):
853879
if self.ordinal == tslib.iNaT:
854880
ordinal = self.ordinal
855881
else:
856-
ordinal = (self + 1).start_time.value - 1
882+
# freq.n can't be negative or 0
883+
ordinal = (self + self.freq.n).start_time.value - 1
857884
return Timestamp(ordinal)
858885

859886
def to_timestamp(self, freq=None, how='start', tz=None):
@@ -947,14 +974,15 @@ cdef class Period(object):
947974
def __str__(self):
948975
return self.__unicode__()
949976

977+
@property
978+
def freqstr(self):
979+
return self.freq.freqstr
980+
950981
def __repr__(self):
951-
from pandas.tseries import frequencies
952982
from pandas.tseries.frequencies import get_freq_code as _gfc
953983
base, mult = _gfc(self.freq)
954984
formatted = period_format(self.ordinal, base)
955-
freqstr = frequencies._reverse_period_code_map[base]
956-
957-
return "Period('%s', '%s')" % (formatted, freqstr)
985+
return "Period('%s', '%s')" % (formatted, self.freqstr)
958986

959987
def __unicode__(self):
960988
"""
@@ -1123,9 +1151,6 @@ def _ordinal_from_fields(year, month, quarter, day, hour, minute,
11231151
second, freq):
11241152
from pandas.tseries.frequencies import get_freq_code as _gfc
11251153
base, mult = _gfc(freq)
1126-
if mult != 1:
1127-
raise ValueError('Only mult == 1 supported')
1128-
11291154
if quarter is not None:
11301155
year, month = _quarter_to_myear(year, quarter, freq)
11311156
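
Two checks introduced in `period.pyx` are the rejection of non-positive frequencies and the shared error message for mismatched frequencies. The following is a rough pure-Python sketch of both. It does not use pandas: `parse_freq` and `check_same_freq` are hypothetical helpers, and the regular expression is a simplifying assumption about what a frequency string looks like. Only the two message strings are taken from the diff.

```python
import re

# Message template copied from the diff; {0} is self's freq, {1} the other's.
_DIFFERENT_FREQ_ERROR = "Input has different freq={1} from Period(freq={0})"

# Assumed shape of a frequency string: optional signed integer, then an alias.
_FREQ_PATTERN = re.compile(r"([+-]?\d+)?([A-Za-z][A-Za-z-]*)")


def parse_freq(freq):
    """Split a string such as '5H' into (alias, multiple), rejecting n <= 0."""
    match = _FREQ_PATTERN.fullmatch(freq)
    if match is None:
        raise ValueError("Could not parse frequency: {0}".format(freq))
    n = int(match.group(1)) if match.group(1) else 1
    if n <= 0:
        raise ValueError('Frequency must be positive, because it'
                         ' represents span: {0}'.format(freq))
    return match.group(2), n


def check_same_freq(self_freq, other_freq):
    """Raise ValueError when two frequency strings do not describe the same span."""
    if parse_freq(self_freq) != parse_freq(other_freq):
        raise ValueError(_DIFFERENT_FREQ_ERROR.format(self_freq, other_freq))


print(parse_freq('5H'))  # ('H', 5)
print(parse_freq('M'))   # ('M', 1)
try:
    check_same_freq('2M', '3M')
except ValueError as err:
    print(err)  # Input has different freq=3M from Period(freq=2M)
```

In the commit itself the conversion goes through `to_offset` and the check is `freq.n <= 0`. The sketch only reproduces the observable behaviour for simple strings.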
