ENH: Add end and end_day options for origin from resample #37805

Closed
wants to merge 44 commits into from
Changes from 20 commits
Commits
39f1e8f
ENH: Add 'end' option in resample's origin
GYHHAHA Nov 13, 2020
cd5aa64
Update resample.py
GYHHAHA Nov 13, 2020
0184b1d
Update resample.py
GYHHAHA Nov 13, 2020
ff35b6f
Update test_resample_api.py
GYHHAHA Nov 13, 2020
8c4549e
Update resample.py
GYHHAHA Nov 13, 2020
b835d1a
Update test_resample_api.py
GYHHAHA Nov 13, 2020
bf15c67
Update test_resample_api.py
GYHHAHA Nov 13, 2020
e4b01d8
Update test_datetime_index.py
GYHHAHA Nov 13, 2020
d096ccd
add backward para and end_day option
GYHHAHA Nov 27, 2020
222ef8d
add doc-string
GYHHAHA Nov 27, 2020
90c9c5f
add test cases
GYHHAHA Nov 27, 2020
eae898c
fix format
GYHHAHA Nov 27, 2020
2ee1000
Update test_resample_api.py
GYHHAHA Nov 27, 2020
3442e00
Update test_resample_api.py
GYHHAHA Nov 27, 2020
a33acac
Update test_resample_api.py
GYHHAHA Nov 27, 2020
7c54839
Update test_resample_api.py
GYHHAHA Nov 27, 2020
a4e0a39
flake8 fix
GYHHAHA Nov 27, 2020
0e2e390
break lines
GYHHAHA Nov 27, 2020
9f4844a
Update resample.py
GYHHAHA Nov 27, 2020
5b7f396
fix docstring
GYHHAHA Nov 27, 2020
115c92a
split tests
GYHHAHA Nov 28, 2020
7d8d67a
Update generic.py
GYHHAHA Nov 28, 2020
77fc4a3
doc added & tests fix
GYHHAHA Nov 28, 2020
0cff41e
Merge branch 'master' into master
GYHHAHA Nov 28, 2020
b492293
fix doc
GYHHAHA Nov 28, 2020
561096c
Merge remote-tracking branch 'upstream/master'
GYHHAHA Dec 11, 2020
76a015a
Revert "Merge remote-tracking branch 'upstream/master'"
GYHHAHA Dec 11, 2020
a0262ab
Revert "fix doc"
GYHHAHA Dec 11, 2020
b990c5f
Revert "Merge branch 'master' into master"
GYHHAHA Dec 11, 2020
8e8c1e6
Revert "doc added & tests fix"
GYHHAHA Dec 11, 2020
cc9f2e0
Revert "Update generic.py"
GYHHAHA Dec 11, 2020
629773a
Revert "split tests"
GYHHAHA Dec 11, 2020
c79155b
Revert "fix docstring"
GYHHAHA Dec 11, 2020
f46c924
Revert "Update resample.py"
GYHHAHA Dec 11, 2020
af99a33
Revert "break lines"
GYHHAHA Dec 11, 2020
d7db83b
Revert "flake8 fix"
GYHHAHA Dec 11, 2020
69183f6
Revert "Update test_resample_api.py"
GYHHAHA Dec 11, 2020
5b9afee
Revert "Update test_resample_api.py"
GYHHAHA Dec 11, 2020
216bff3
Revert "Update test_resample_api.py"
GYHHAHA Dec 11, 2020
5409a75
Revert "Update test_resample_api.py"
GYHHAHA Dec 11, 2020
90ddc36
Revert "fix format"
GYHHAHA Dec 11, 2020
7b3cffb
Revert "add test cases"
GYHHAHA Dec 11, 2020
2d51a8a
Revert "add doc-string"
GYHHAHA Dec 11, 2020
c24d8f9
Revert "add backward para and end_day option"
GYHHAHA Dec 11, 2020
47 changes: 44 additions & 3 deletions pandas/core/generic.py
@@ -7952,6 +7952,7 @@ def resample(
on=None,
level=None,
origin: Union[str, TimestampConvertibleTypes] = "start_day",
backward: Optional[bool_t] = None,
offset: Optional[TimedeltaConvertibleTypes] = None,
) -> Resampler:
"""
@@ -7972,8 +7973,9 @@ def resample(
`DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
closed : {'right', 'left'}, default None
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
for all frequency offsets with forward resampling except for 'M',
'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of
'right'. When `backward` is True, the default is 'right'.
label : {'right', 'left'}, default None
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
@@ -8006,7 +8008,8 @@ def resample(
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. `level` must be datetime-like.
origin : {'epoch', 'start', 'start_day'}, Timestamp or str, default 'start_day'
origin : {'epoch', 'start', 'start_day', 'end', 'end_day'}, Timestamp \
or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If a timestamp is not used, these values are also supported:
@@ -8017,6 +8020,21 @@ def resample(

.. versionadded:: 1.1.0

- 'end': `origin` is the last value of the timeseries
- 'end_day': `origin` is the ceiling midnight of the last day

.. versionadded:: 1.2.0

backward : bool, default None
Contributor

If anything, this should be a separate issue / PR discussion. I am actually -1 on adding this as yet another keyword.

Please revert everything else in this PR and focus only on adding 'end' / 'end_day' (though do these then become useless?).

Contributor Author

Sorry for adding this without discussion. I will revert it in this PR. Maybe backward resampling on an arbitrary Timestamp needs a new strategy instead of a new keyword. (`origin=('20201129', 'B')` seems to be a possible solution, but taking a tuple as input is a little strange?)

Resample backward from the given `origin`. Defaults to True when
`origin` is 'end' or 'end_day' and to False when `origin` is 'start'
or 'start_day'. Optional when `origin` is a datetime, in which case
it defaults to False. With backward resampling, the result labelled
with a given timestamp covers the interval from that timestamp minus
`freq` up to that timestamp, and `closed` defaults to 'right'.

.. versionadded:: 1.2.0

offset : Timedelta or str, default is None
An offset timedelta added to the origin.

@@ -8296,6 +8314,28 @@ def resample(
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64

If you want to take the last timestamp as `origin` with a backward resample:

>>> ts.index.max()
Timestamp('2000-10-02 00:26:00', freq='7T')
>>> ts.groupby(pd.Grouper(freq='17min', origin='end')).sum()
2000-10-01 23:35:00 0
2000-10-01 23:52:00 18
2000-10-02 00:09:00 27
2000-10-02 00:26:00 63
Freq: 17T, dtype: int64

You can also specify the backward origin:

>>> ts.groupby(pd.Grouper(freq='17min',
...                       origin='2000-10-02 00:30:00',
...                       backward=True)).sum()
2000-10-01 23:39:00 3
2000-10-01 23:56:00 15
2000-10-02 00:13:00 45
2000-10-02 00:30:00 45
Freq: 17T, dtype: int64

To replace the use of the deprecated `base` argument, you can now use `offset`,
in this example it is equivalent to have `base=2`:

@@ -8337,6 +8377,7 @@ def resample(
key=on,
level=level,
origin=origin,
backward=backward,
offset=offset,
)
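
The docstring example with an explicit backward origin can be checked by hand. The following is a minimal standard-library sketch, not pandas code: `backward_bin_sums` is a hypothetical helper that assumes right-closed, right-labelled bins counted back from `origin`, which is what the docstring describes for `backward=True`. Unlike pandas, it omits empty bins.

```python
from datetime import datetime, timedelta


def backward_bin_sums(points, origin, freq):
    """Sum values into right-closed bins counted back from `origin`.

    Each bin is (label - freq, label] and is keyed by its right edge.
    Hypothetical helper for illustration; not part of pandas.
    """
    sums = {}
    for stamp, value in points:
        # Whole `freq` steps between the stamp and the origin (floor division).
        steps_back = (origin - stamp) // freq
        label = origin - steps_back * freq
        sums[label] = sums.get(label, 0) + value
    return dict(sorted(sums.items()))


# Same data as the docstring: 9 points, 7 minutes apart, values 0, 3, ..., 24.
start = datetime(2000, 10, 1, 23, 30)
points = [(start + timedelta(minutes=7 * i), 3 * i) for i in range(9)]

result = backward_bin_sums(
    points, origin=datetime(2000, 10, 2, 0, 30), freq=timedelta(minutes=17)
)
for label, total in result.items():
    print(label, total)
```

The labels and sums agree with the docstring output for `origin='2000-10-02 00:30:00'`: 23:39, 23:56, 00:13 and 00:30 with totals 3, 15, 45 and 45.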

31 changes: 28 additions & 3 deletions pandas/core/groupby/grouper.py
@@ -83,9 +83,10 @@ class Grouper:
However, loffset is also deprecated for ``.resample(...)``
See: :class:`DataFrame.resample`

origin : {'epoch', 'start', 'start_day'}, Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin must
match the timezone of the index.
origin : {'epoch', 'start', 'start_day', 'end', 'end_day'}, Timestamp \
or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If a timestamp is not used, these values are also supported:

- 'epoch': `origin` is 1970-01-01
@@ -94,6 +95,21 @@ class Grouper:

.. versionadded:: 1.1.0

- 'end': `origin` is the last value of the timeseries
- 'end_day': `origin` is the ceiling midnight of the last day

.. versionadded:: 1.2.0

backward : bool, default None
Resample backward from the given `origin`. Defaults to True when
`origin` is 'end' or 'end_day' and to False when `origin` is 'start'
or 'start_day'. Optional when `origin` is a datetime, in which case
it defaults to False. With backward resampling, the result labelled
with a given timestamp covers the interval from that timestamp minus
`freq` up to that timestamp, and `closed` defaults to 'right'.

.. versionadded:: 1.2.0

offset : Timedelta or str, default is None
An offset timedelta added to the origin.

@@ -200,6 +216,15 @@ class Grouper:
2000-10-02 00:15:00 45
Freq: 17T, dtype: int64

If you want to take the last timestamp as `origin` with a backward resample:

>>> ts.groupby(pd.Grouper(freq='17min', origin='end')).sum()
2000-10-01 23:35:00 0
2000-10-01 23:52:00 18
2000-10-02 00:09:00 27
2000-10-02 00:26:00 63
Freq: 17T, dtype: int64
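
To make the difference between 'end' and 'end_day' concrete, here is a small standard-library sketch of how the two origins could be resolved and which right bin edges follow from them. The helpers `resolve_end_origin` and `right_edges` are hypothetical names used only for illustration; the sketch assumes right-closed bins and is not the pandas implementation.

```python
from datetime import datetime, timedelta


def resolve_end_origin(last, origin):
    """Turn 'end' / 'end_day' into a concrete timestamp (illustrative only)."""
    if origin == "end":
        return last
    if origin == "end_day":
        # Ceiling midnight: unchanged if `last` is already at midnight.
        midnight = last.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight if midnight == last else midnight + timedelta(days=1)
    raise ValueError(f"unsupported origin: {origin!r}")


def right_edges(first, last, origin, freq):
    """Right edges of the bins (edge - freq, edge] that cover first..last."""
    top = origin - ((origin - last) // freq) * freq
    bottom = origin - ((origin - first) // freq) * freq
    count = (top - bottom) // freq + 1
    return [bottom + i * freq for i in range(count)]


first = datetime(2000, 10, 1, 23, 30)
last = datetime(2000, 10, 2, 0, 26)
freq = timedelta(minutes=17)

end_edges = right_edges(first, last, resolve_end_origin(last, "end"), freq)
end_day_edges = right_edges(first, last, resolve_end_origin(last, "end_day"), freq)
print([e.strftime("%H:%M") for e in end_edges])
print([e.strftime("%H:%M") for e in end_day_edges])
```

With 'end' the edges are anchored on the last timestamp (23:35, 23:52, 00:09, 00:26, as in the example above); with 'end_day' they are anchored on midnight of 2000-10-03 and therefore shift by a few minutes.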

If you want to adjust the start of the bins with an `offset` Timedelta, the two
following lines are equivalent:

76 changes: 64 additions & 12 deletions pandas/core/resample.py
@@ -1366,6 +1366,7 @@ def __init__(
convention: Optional[str] = None,
base: Optional[int] = None,
origin: Union[str, TimestampConvertibleTypes] = "start_day",
backward: Optional[bool] = None,
offset: Optional[TimedeltaConvertibleTypes] = None,
**kwargs,
):
@@ -1389,9 +1390,15 @@ def __init__(
label = "right"
else:
if closed is None:
closed = "left"
if origin in ["end", "end_day"] or backward:
closed = "right"
else:
closed = "left"
if label is None:
label = "left"
if origin in ["end", "end_day"] or backward:
label = "right"
else:
label = "left"

self.closed = closed
self.label = label
@@ -1404,17 +1411,37 @@ def __init__(
self.fill_method = fill_method
self.limit = limit

Contributor

Can you edit the docstring and add a versionupdate 1.2 tag (mention that 'end' is added)?

Contributor Author

Sure.

if origin in ("epoch", "start", "start_day"):
if origin in ("epoch", "start", "start_day", "end", "end_day"):
self.origin = origin
else:
try:
self.origin = Timestamp(origin)
except Exception as e:
raise ValueError(
"'origin' should be equal to 'epoch', 'start', 'start_day' or "
f"should be a Timestamp convertible type. Got '{origin}' instead."
"'origin' should be equal to 'epoch', 'start', 'start_day', "
"'end', 'end_day' or should be a Timestamp convertible "
f"type. Got '{origin}' instead."
) from e

if backward is None:
if self.origin in ("end", "end_day"):
self.backward = True
else:
self.backward = False
elif backward:
if origin in ("start", "start_day"):
raise ValueError(
"`start` or `start_day` origin isn't allowed when "
"`backward` is True"
)
self.backward = backward
else:
if origin in ("end", "end_day"):
raise ValueError(
"`end` or `end_day` origin isn't allowed when `backward` is False"
)
self.backward = backward

try:
self.offset = Timedelta(offset) if offset is not None else None
except Exception as e:
@@ -1502,6 +1529,7 @@ def _get_time_bins(self, ax):
self.freq,
closed=self.closed,
origin=self.origin,
backward=self.backward,
offset=self.offset,
)
# GH #12037
@@ -1655,6 +1683,7 @@ def _get_period_bins(self, ax: PeriodIndex):
self.freq,
closed=self.closed,
origin=self.origin,
backward=self.backward,
offset=self.offset,
)

@@ -1708,7 +1737,7 @@ def _take_new_index(obj, indexer, new_index, axis=0):


def _get_timestamp_range_edges(
first, last, freq, closed="left", origin="start_day", offset=None
first, last, freq, closed="left", origin="start_day", backward=False, offset=None
):
"""
Adjust the `first` Timestamp to the preceding Timestamp that resides on
@@ -1761,7 +1790,13 @@ def _get_timestamp_range_edges(
origin = origin.tz_localize(None)

first, last = _adjust_dates_anchored(
first, last, freq, closed=closed, origin=origin, offset=offset
first,
last,
freq,
closed=closed,
origin=origin,
backward=backward,
offset=offset,
)
if isinstance(freq, Day):
first = first.tz_localize(index_tz)
@@ -1781,7 +1816,7 @@ def _get_timestamp_range_edges(


def _get_period_range_edges(
first, last, freq, closed="left", origin="start_day", offset=None
first, last, freq, closed="left", origin="start_day", backward=False, offset=None
):
"""
Adjust the provided `first` and `last` Periods to the respective Period of
@@ -1823,7 +1858,13 @@ def _get_period_range_edges(
adjust_last = freq.is_on_offset(last)

first, last = _get_timestamp_range_edges(
first, last, freq, closed=closed, origin=origin, offset=offset
first,
last,
freq,
closed=closed,
origin=origin,
backward=backward,
offset=offset,
)

first = (first + int(adjust_first) * freq).to_period(freq)
@@ -1832,7 +1873,7 @@ def _get_period_range_edges(


def _adjust_dates_anchored(
first, last, freq, closed="right", origin="start_day", offset=None
first, last, freq, closed="right", origin="start_day", backward=False, offset=None
):
# First and last offsets should be calculated from the start day to fix an
# error cause by resampling across multiple days when a one day period is
@@ -1844,8 +1885,19 @@ def _adjust_dates_anchored(
origin_nanos = first.normalize().value
elif origin == "start":
origin_nanos = first.value
elif isinstance(origin, Timestamp):
origin_nanos = origin.value
elif isinstance(origin, Timestamp) or origin in ("end", "end_day"):
if backward:
if origin == "end":
origin = last
elif origin == "end_day":
origin = last.ceil("D")
sub_freq_times = (origin.value - first.value) // freq.nanos
if closed == "left":
sub_freq_times += 1
first = origin - sub_freq_times * freq
origin_nanos = first.value
else:
origin_nanos = origin.value
origin_nanos += offset.value if offset else 0

# GH 10117 & GH 19375. If first and last contain timezone information,
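
The backward branch added to `_adjust_dates_anchored` above boils down to one floor division. The sketch below restates that arithmetic with `datetime` and `timedelta` instead of nanosecond integers. `backward_first_edge` is a hypothetical name, and how the rest of the function consumes the returned value lies outside the hunk shown, so treat this as an illustration of the single step only.

```python
from datetime import datetime, timedelta


def backward_first_edge(first, origin, freq, closed="right"):
    """Mirror of the `sub_freq_times` arithmetic in the backward branch.

    Hypothetical helper: counts whole `freq` steps from `first` up to
    `origin`, adds one extra step for left-closed bins, and steps back
    from `origin` by that many multiples of `freq`.
    """
    steps = (origin - first) // freq
    if closed == "left":
        steps += 1
    return origin - steps * freq


first = datetime(2000, 10, 1, 23, 30)
origin = datetime(2000, 10, 2, 0, 30)
freq = timedelta(minutes=17)

print(backward_first_edge(first, origin, freq, closed="right"))
print(backward_first_edge(first, origin, freq, closed="left"))
```

For the docstring data this gives 23:39 with `closed="right"` and 23:22 with `closed="left"`; both values lie on the grid of 17-minute steps ending at the origin.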
4 changes: 2 additions & 2 deletions pandas/tests/resample/test_datetime_index.py
@@ -770,8 +770,8 @@ def test_resample_bad_origin(origin):
rng = date_range("2000-01-01 00:00:00", "2000-01-01 02:00", freq="s")
ts = Series(np.random.randn(len(rng)), index=rng)
msg = (
"'origin' should be equal to 'epoch', 'start', 'start_day' or "
f"should be a Timestamp convertible type. Got '{origin}' instead."
"'origin' should be equal to 'epoch', 'start', 'start_day', 'end', 'end_day' "
f"or should be a Timestamp convertible type. Got '{origin}' instead."
)
with pytest.raises(ValueError, match=msg):
ts.resample("5min", origin=origin)
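
The test hunk above only updates the error message for a bad `origin`. The `origin` / `backward` compatibility rules added to `__init__` in pandas/core/resample.py could be exercised in a similar way. As a rough, pandas-free sketch of those rules (the function name `resolve_backward` is an assumption made for illustration, and the messages are copied from the diff):

```python
def resolve_backward(origin, backward=None):
    """Return the effective `backward` flag, or raise on a conflicting pair.

    Illustrative restatement of the validation in the diff, not pandas code.
    """
    if backward is None:
        # Default: backward only for the end-anchored origins.
        return origin in ("end", "end_day")
    if backward and origin in ("start", "start_day"):
        raise ValueError(
            "`start` or `start_day` origin isn't allowed when `backward` is True"
        )
    if not backward and origin in ("end", "end_day"):
        raise ValueError(
            "`end` or `end_day` origin isn't allowed when `backward` is False"
        )
    return backward


def raises_value_error(origin, backward):
    """Small helper so the conflicting combinations can be checked inline."""
    try:
        resolve_backward(origin, backward)
    except ValueError:
        return True
    return False


print(resolve_backward("end"))            # default for end-anchored origins
print(resolve_backward("start_day"))      # default for start-anchored origins
print(raises_value_error("start", True))  # conflicting combination
```

Only the four start/end string origins are constrained; 'epoch' or a concrete timestamp accepts either value of `backward`.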