BUG/ENH - base argument no longer ignored in period resample #23941

Merged
merged 41 commits into from
Dec 14, 2018
Changes from 36 commits
4fbb93c
ENH - account for base argument in period resample
sds9995 Nov 27, 2018
ac3f301
BUG - closer on perfectly emulating previous behavior
sds9995 Nov 27, 2018
2a72bac
BUG - all original resample tests now pass
sds9995 Nov 28, 2018
cb03d4e
TST - add preliminary tests for pandas period resample with base
sds9995 Nov 28, 2018
4833c97
CLN - pep8 adherence
sds9995 Nov 28, 2018
5a69414
TST - add a couple more tests
sds9995 Nov 28, 2018
5987a2e
TST - parameterize tests
sds9995 Nov 29, 2018
8f3c976
DOC- add whatsnew entry
sds9995 Dec 2, 2018
58a59a6
DOC - add comments and modify whatsnew
sds9995 Dec 4, 2018
6b6d1a9
CLN - minor refactor of label creation and whatsnew
sds9995 Dec 4, 2018
ba10dcf
CLN - pep8 adherence
sds9995 Dec 4, 2018
fb1465e
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 4, 2018
b51841a
BUG - fix cases where period doesnt start at the 0 base of the freq
sds9995 Dec 4, 2018
5afea5c
CLN - remove unneccesary code
sds9995 Dec 4, 2018
5ea4d2c
CLN - pep8 adherence
sds9995 Dec 4, 2018
9df9179
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 5, 2018
ed975ee
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 6, 2018
99e32a7
BUG - reset resample file
sds9995 Dec 6, 2018
126ae7a
BUG - add original changes back in after master reset
sds9995 Dec 6, 2018
bf076de
BUG - add case back in where not start is not onOffset and add tests …
sds9995 Dec 7, 2018
47d7f7b
ENH - allow for edge compensation in _get_range_edges
sds9995 Dec 7, 2018
eb05501
BUG/CLN - offsets.Day(n>2) not properly anchoring dates, and make cod…
sds9995 Dec 7, 2018
f2b0661
TST - update test to reflect more up-to-date assumption of what expec…
sds9995 Dec 8, 2018
0b9a8fd
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 8, 2018
8020acb
CLN - rever changes for other resample fix, to be fixed in another PR
sds9995 Dec 8, 2018
334eb0b
CLN - split up range edge functions for timestamp and period
sds9995 Dec 8, 2018
c14dbce
CLN - pep8 adherence
sds9995 Dec 8, 2018
9bb348a
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 8, 2018
8e07a95
TST/DOC - add tests for get range edges and update whatsnew
sds9995 Dec 10, 2018
1264f36
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 10, 2018
bca4196
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 11, 2018
0a9c7dc
BUG - account for new period diff behavior
sds9995 Dec 11, 2018
cfefa36
TST - add test for bad input to get range edges
sds9995 Dec 12, 2018
93eaab7
TST - add one more test case to bad get_range_edges
sds9995 Dec 12, 2018
e5286f8
DOC - add docstrings
sds9995 Dec 13, 2018
bef9118
TST - remove unneccesary test
sds9995 Dec 13, 2018
9b7d261
DOC - update docstrings
sds9995 Dec 13, 2018
5034455
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 13, 2018
a49d129
CLN - rename offset to bin_shift, so as not to confuse with pd.offsets
sds9995 Dec 13, 2018
8f1e290
DOC - add params and returns sections
sds9995 Dec 13, 2018
c234cac
Merge branch 'master' into bug/period_resample_base
sds9995 Dec 13, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -362,6 +362,7 @@ Other Enhancements
 - :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
 - :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
 - :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
+- :meth:`DataFrame.resample` and :meth:`Series.resample` with a :class:`PeriodIndex` will now respect the ``base`` argument in the same fashion as with a :class:`DatetimeIndex`. (:issue:`23882`)
 - :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
   all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
 - :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
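
For readers unfamiliar with ``base``, here is a rough sketch of what the whatsnew entry for :issue:`23882` is describing. It is plain Python rather than pandas code: `bin_start`, `origin`, `n` and `unit` are hypothetical names, the bins are assumed to be left-closed, and wrapping `base` modulo the frequency multiple is an assumption (it is consistent with the `24H` test rows in this PR that use `base=10` and `base=34`).

```python
from datetime import datetime, timedelta


def bin_start(ts, origin, n, unit, base=0):
    """Left edge of the left-closed bin that contains ``ts``.

    Bins are ``n * unit`` wide and anchored at ``origin`` shifted by
    ``base`` units. Hypothetical helper, not a pandas function.
    """
    width = n * unit
    shift = (base % n) * unit               # assumed: base wraps at the bin width
    steps = (ts - origin - shift) // width  # floor division, may be negative
    return origin + shift + steps * width


hour = timedelta(hours=1)
midnight = datetime(1991, 9, 5)

# 24H bins with base=10 start at 10:00 instead of midnight.
noon_bin = bin_start(datetime(1991, 9, 5, 12), midnight, 24, hour, base=10)
# An observation before 10:00 falls in the bin opened on the previous day.
early_bin = bin_start(datetime(1991, 9, 5, 3), midnight, 24, hour, base=10)
```

The second call is the case the PR has to handle for periods: when the data starts before the first base-shifted edge, the first bin label lies before the first observation.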
60 changes: 55 additions & 5 deletions pandas/core/resample.py
@@ -1389,9 +1389,10 @@ def _get_time_bins(self, ax):
                 data=[], freq=self.freq, name=ax.name)
             return binner, [], labels

-        first, last = _get_range_edges(ax.min(), ax.max(), self.freq,
-                                       closed=self.closed,
-                                       base=self.base)
+        first, last = _get_timestamp_range_edges(ax.min(), ax.max(),
+                                                 self.freq,
+                                                 closed=self.closed,
+                                                 base=self.base)
         tz = ax.tz
         # GH #12037
         # use first/last directly instead of call replace() on them
@@ -1540,20 +1541,39 @@ def _get_period_bins(self, ax):
                 data=[], freq=self.freq, name=ax.name)
             return binner, [], labels

+        freq_mult = self.freq.n
+
         start = ax.min().asfreq(self.freq, how=self.convention)
         end = ax.max().asfreq(self.freq, how='end')
+        offset = 0
+
+        # GH 23882
+        if self.base:
+            # get base adjusted bin edge labels
+            p_start, end = _get_period_range_edges(start,
+                                                   end,
+                                                   self.freq,
+                                                   closed=self.closed,
+                                                   base=self.base)
+
+            # Get offset for bin edge (not label edge) adjustment
+            start_offset = (pd.Period(start, self.freq)
+                            - pd.Period(p_start, self.freq))
+            offset = start_offset.n % freq_mult
+            start = p_start

         labels = binner = PeriodIndex(start=start, end=end,
                                       freq=self.freq, name=ax.name)

         i8 = memb.asi8
-        freq_mult = self.freq.n

         # when upsampling to subperiods, we need to generate enough bins
         expected_bins_count = len(binner) * freq_mult
         i8_extend = expected_bins_count - (i8[-1] - i8[0])
         rng = np.arange(i8[0], i8[-1] + i8_extend, freq_mult)
         rng += freq_mult
+        # adjust bin edge indexes to account for base
+        rng -= offset
         bins = memb.searchsorted(rng, side='left')

         if nat_count > 0:
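
The bin-edge arithmetic in the hunk above can be followed with plain integers. The sketch below mirrors the `rng` computation using lists and `bisect` in place of NumPy and `PeriodIndex.searchsorted`. `bin_edges`, `ordinals`, `n_labels` and `bin_shift` are hypothetical names, and the label count is passed in rather than derived from a `PeriodIndex`, so treat it as an illustration rather than the code in `resample.py`.

```python
from bisect import bisect_left


def bin_edges(ordinals, n_labels, freq_mult, bin_shift):
    """Positions in ``ordinals`` where each resample bin ends.

    ``ordinals`` is a sorted list of integer period ordinals, ``n_labels``
    the number of bin labels, ``freq_mult`` the bin width in periods and
    ``bin_shift`` the shift caused by ``base``. All names are hypothetical.
    """
    first, last = ordinals[0], ordinals[-1]
    extend = n_labels * freq_mult - (last - first)
    # Right edge of every bin before accounting for base.
    edges = [e + freq_mult for e in range(first, last + extend, freq_mult)]
    # Shift the edges back so they line up with the base-adjusted labels.
    edges = [e - bin_shift for e in edges]
    return [bisect_left(ordinals, e) for e in edges]


hours = list(range(48))  # two days of hourly periods starting at midnight
# Assumed scenario: 24H bins with base=10, so the first label sits 14 hours
# before the first observation and three labels cover the data.
bin_shift = 14 % 24
bins = bin_edges(hours, n_labels=3, freq_mult=24, bin_shift=bin_shift)
```

Under these assumptions the rows split at positions 10 and 34: the first ten hours go to the bin opened the previous day, and the remaining rows go to the two bins that start at 10:00.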
@@ -1582,7 +1602,15 @@ def _take_new_index(obj, indexer, new_index, axis=0):
         raise ValueError("'obj' should be either a Series or a DataFrame")


-def _get_range_edges(first, last, offset, closed='left', base=0):
+def _get_timestamp_range_edges(first, last, offset, closed='left', base=0):
+    """
+    Adjust the provided Timestamp range edge values to the appropriate edge
+    values for the given offset parameters.
+    """
+    if not all(isinstance(obj, pd.Timestamp) for obj in [first, last]):
+        raise TypeError("'first' and 'last' must be instances of type "
+                        "Timestamp")
+
     if isinstance(offset, Tick):
         is_day = isinstance(offset, Day)
         day_nanos = delta_to_nanoseconds(timedelta(1))
@@ -1606,6 +1634,28 @@ def _get_range_edges(first, last, offset, closed='left', base=0):
     return first, last


+def _get_period_range_edges(first, last, offset, closed='left', base=0):
+    """
+    Adjust the provided Period range edge values to the appropriate edge
+    values for the given offset parameters.
+    """
+    if not all(isinstance(obj, pd.Period) for obj in [first, last]):
+        raise TypeError("'first' and 'last' must be instances of type Period")
+
+    # GH 23882
+    first = first.to_timestamp()
+    last = last.to_timestamp()
+    adjust_first = not offset.onOffset(first)
+    adjust_last = offset.onOffset(last)
+
+    first, last = _get_timestamp_range_edges(first, last, offset,
+                                             closed=closed, base=base)
+
+    first = (first + adjust_first * offset).to_period(offset)
+    last = (last - adjust_last * offset).to_period(offset)
+    return first, last
+
+
 def _adjust_dates_anchored(first, last, offset, closed='right', base=0):
     # First and last offsets should be calculated from the start day to fix an
     # error cause by resampling across multiple days when a one day period is
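
To see why `_get_period_range_edges` applies `adjust_first` and `adjust_last`, it helps to trace a month-end frequency by hand. The following is a standard-library sketch, not the pandas implementation. All helper names are hypothetical, and the behaviour of `timestamp_edges` is an assumption chosen to reproduce the `'M'` rows of the parametrized tests in this PR, since the non-tick branch of `_get_timestamp_range_edges` is not shown in the diff.

```python
import calendar
from datetime import date


def month_end(year, month):
    return date(year, month, calendar.monthrange(year, month)[1])


def is_month_end(d):
    return d == month_end(d.year, d.month)


def next_month_end(d):
    """Smallest month end strictly after ``d``."""
    if not is_month_end(d):
        return month_end(d.year, d.month)
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return month_end(year, month)


def prev_month_end(d):
    """Largest month end strictly before ``d``."""
    year, month = (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)
    return month_end(year, month)


def timestamp_edges(first, last):
    # Assumed closed='left' behaviour: roll first back, push last forward.
    first = first if is_month_end(first) else prev_month_end(first)
    return first, next_month_end(last)


def period_edges(first, last):
    """Return (year, month) labels for the first and last monthly period."""
    adjust_first = not is_month_end(first)
    adjust_last = is_month_end(last)
    first_ts, last_ts = timestamp_edges(first, last)
    if adjust_first:   # undo the roll-back so the label is first's own month
        first_ts = next_month_end(first_ts)
    if adjust_last:    # undo the push-forward so the label is last's own month
        last_ts = prev_month_end(last_ts)
    return (first_ts.year, first_ts.month), (last_ts.year, last_ts.month)


mid_month = period_edges(date(1991, 9, 6), date(1992, 4, 6))
on_month_end = period_edges(date(1991, 8, 31), date(1992, 4, 30))
```

In both cases the timestamp edges overshoot the data by design, and the two adjustments bring the period labels back to the months that actually contain the first and last observations.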
27 changes: 26 additions & 1 deletion pandas/tests/resample/test_datetime_index.py
@@ -17,7 +17,8 @@
 from pandas.core.indexes.datetimes import date_range
 from pandas.core.indexes.period import Period, period_range
 from pandas.core.indexes.timedeltas import timedelta_range
-from pandas.core.resample import DatetimeIndex, TimeGrouper
+from pandas.core.resample import (
+    DatetimeIndex, TimeGrouper, _get_timestamp_range_edges)
 from pandas.tests.resample.test_base import (
     Base, business_day_offset, downsample_methods, simple_date_range_series,
     simple_period_range_series)
@@ -1463,3 +1464,27 @@ def f(data, add_arg):
         result = df.groupby("A").resample("D").agg(f, multiplier)
         expected = df.groupby("A").resample('D').mean().multiply(multiplier)
         assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize('first,last,offset,exp_first,exp_last', [
+        ('19910905', '19920406', 'D', '19910905', '19920407'),
+        ('19910905 00:00', '19920406 06:00', 'D', '19910905', '19920407'),
+        ('19910905 06:00', '19920406 06:00', 'H', '19910905 06:00',
+         '19920406 07:00'),
+        ('19910906', '19920406', 'M', '19910831', '19920430'),
+        ('19910831', '19920430', 'M', '19910831', '19920531'),
+        ('1991-08', '1992-04', 'M', '19910831', '19920531'),
+    ])
+    def test_get_timestamp_range_edges(self, first, last, offset,
+                                       exp_first, exp_last):
+        first = pd.Period(first)
+        first = first.to_timestamp(first.freq)
+        last = pd.Period(last)
+        last = last.to_timestamp(last.freq)
+
+        exp_first = pd.Timestamp(exp_first, freq=offset)
+        exp_last = pd.Timestamp(exp_last, freq=offset)
+
+        offset = pd.tseries.frequencies.to_offset(offset)
+        result = _get_timestamp_range_edges(first, last, offset)
+        expected = (exp_first, exp_last)
+        assert result == expected
53 changes: 53 additions & 0 deletions pandas/tests/resample/test_period_index.py
@@ -15,6 +15,7 @@
 from pandas import DataFrame, Series, Timestamp
 from pandas.core.indexes.datetimes import date_range
 from pandas.core.indexes.period import Period, PeriodIndex, period_range
+from pandas.core.resample import _get_period_range_edges
 from pandas.tests.resample.test_base import (
     Base, resample_methods, simple_period_range_series)
 import pandas.util.testing as tm
@@ -739,3 +740,55 @@ def test_resample_with_only_nat(self):
         expected = DataFrame([], index=expected_index)
         result = frame.resample('1s').mean()
         assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize('start,end,start_freq,end_freq,base', [
+        ('19910905', '19910909 03:00', 'H', '24H', 10),
+        ('19910905', '19910909 12:00', 'H', '24H', 10),
+        ('19910905', '19910909 23:00', 'H', '24H', 10),
+        ('19910905 10:00', '19910909', 'H', '24H', 10),
+        ('19910905 10:00', '19910909 10:00', 'H', '24H', 10),
+        ('19910905', '19910909 10:00', 'H', '24H', 10),
+        ('19910905 12:00', '19910909', 'H', '24H', 10),
+        ('19910905 12:00', '19910909 03:00', 'H', '24H', 10),
+        ('19910905 12:00', '19910909 12:00', 'H', '24H', 10),
+        ('19910905 12:00', '19910909 12:00', 'H', '24H', 34),
+        ('19910905 12:00', '19910909 12:00', 'H', '17H', 10),
+        ('19910905 12:00', '19910909 12:00', 'H', '17H', 3),
+        ('19910905 12:00', '19910909 1:00', 'H', 'M', 3),
+        ('19910905', '19910913 06:00', '2H', '24H', 10),
+        ('19910905', '19910905 01:39', 'Min', '5Min', 3),
+        ('19910905', '19910905 03:18', '2Min', '5Min', 3),
+    ])
+    def test_resample_with_non_zero_base(self, start, end, start_freq,
+                                         end_freq, base):
+        # GH 23882
+        s = pd.Series(0, index=pd.period_range(start, end, freq=start_freq))
+        s = s + np.arange(len(s))
+        result = s.resample(end_freq, base=base).mean()
+        result = result.to_timestamp(end_freq)
+        # to_timestamp casts 24H -> D
+        result = result.asfreq(end_freq) if end_freq == '24H' else result
+        expected = s.to_timestamp().resample(end_freq, base=base).mean()
+        assert_series_equal(result, expected)
+
+    @pytest.mark.parametrize('first,last,offset,exp_first,exp_last', [
+        ('19910905', '19920406', 'D', '19910905', '19920406'),
+        ('19910905 00:00', '19920406 06:00', 'D', '19910905', '19920406'),
+        ('19910905 06:00', '19920406 06:00', 'H', '19910905 06:00',
+         '19920406 06:00'),
+        ('19910906', '19920406', 'M', '1991-09', '1992-04'),
+        ('19910831', '19920430', 'M', '1991-08', '1992-04'),
+        ('1991-08', '1992-04', 'M', '1991-08', '1992-04'),
+    ])
+    def test_get_period_range_edges(self, first, last, offset,
+                                    exp_first, exp_last):
+        first = pd.Period(first)
+        last = pd.Period(last)
+
+        exp_first = pd.Period(exp_first, freq=offset)
+        exp_last = pd.Period(exp_last, freq=offset)
+
+        offset = pd.tseries.frequencies.to_offset(offset)
+        result = _get_period_range_edges(first, last, offset)
+        expected = (exp_first, exp_last)
+        assert result == expected