
Commit 3cbeac4

ENH: add 'origin' and 'offset' arguments to 'resample' and 'pd.Grouper'

1 parent: a9d2450

13 files changed, +394 -95 lines changed

doc/source/user_guide/timeseries.rst

+3-6
@@ -1563,19 +1563,16 @@ end of the interval is closed:
 
    ts.resample('5Min', closed='left').mean()
 
-Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
-labels. ``label`` specifies whether the result is labeled with the beginning or
-the end of the interval. ``loffset`` performs a time adjustment on the output
-labels.
+Parameters like ``label`` are used to manipulate the resulting labels.
+``label`` specifies whether the result is labeled with the beginning or
+the end of the interval.
 
 .. ipython:: python
 
    ts.resample('5Min').mean() # by default label='left'
 
    ts.resample('5Min', label='left').mean()
 
-   ts.resample('5Min', label='left', loffset='1s').mean()
-
 .. warning::
 
    The default values for ``label`` and ``closed`` is '**left**' for all
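A minimal sketch of the ``label`` behavior described in this hunk. It is not part of the commit. The data is made up, and the sketch assumes a recent pandas. With the default ``closed='left'``, the same 5-minute bins are named after their start (``label='left'``) or their end (``label='right'``):

```python
import numpy as np
import pandas as pd

# Ten one-minute observations, values 0..9, starting at midnight.
idx = pd.date_range("2000-01-01", periods=10, freq="1min")
ts = pd.Series(np.arange(10), index=idx)

# Default label="left": each 5-minute bin is named after its start.
left = ts.resample("5min").mean()

# label="right": the same bins are named after their end instead.
right = ts.resample("5min", label="right").mean()

print(left)
print(right)
```

The aggregated values are identical in both cases. Only the index labels move by one bin width.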

doc/source/whatsnew/v1.0.0.rst

File mode changed from 100755 to 100644.

doc/source/whatsnew/v1.1.0.rst

+24
@@ -36,6 +36,30 @@ For example:
    ser["2014"]
    ser.loc["May 2015"]
 
+.. _whatsnew_110.grouper_origin:
+
+Grouper now supports the argument origin
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`Grouper` and :meth:`DataFrame.resample` now support the argument ``origin``, the timestamp on which to adjust the grouping. (:issue:`31809`)
+
+By default, the bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like ``30D``) or that divide a day (like ``90s`` or ``1min``), but it can create inconsistencies with frequencies that do not meet this criterion. To change this behavior, you can now specify a fixed timestamp with ``origin``.
+
+For example:
+
+.. ipython:: python
+
+   start, end = "1/1/2000 00:00:00", "1/31/2000 00:00"
+   rng = pd.date_range(start, end, freq="1231min")
+   ts = pd.Series(np.arange(len(rng)), index=rng)
+   ts.groupby(pd.Grouper(freq="1399min")).agg("count")
+   ts.groupby(pd.Grouper(
+       freq="1399min",
+       origin=pd.Timestamp("1970-01-01"))
+   ).agg("count")
+
+..
+
 .. _whatsnew_110.enhancements.other:
 
 Other enhancements
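The whatsnew example uses odd frequencies, which makes the bins hard to check by hand. The smaller sketch below is not from the commit; the data is invented and it assumes pandas 1.1 or later, where ``origin`` exists. It shows the default midnight anchoring next to an explicit ``origin`` timestamp:

```python
import numpy as np
import pandas as pd

# Eight one-minute observations, values 0..7, from 00:03 to 00:10.
idx = pd.date_range("2000-01-01 00:03", periods=8, freq="1min")
ts = pd.Series(np.arange(8), index=idx)

# Default: 5-minute bins are anchored at midnight of the first day,
# so the bin edges fall on 00:00, 00:05, 00:10, ...
by_day = ts.groupby(pd.Grouper(freq="5min")).sum()

# With origin, the bin edges are anchored at the given timestamp instead:
# 00:03, 00:08, ...
by_origin = ts.groupby(
    pd.Grouper(freq="5min", origin=pd.Timestamp("2000-01-01 00:03"))
).sum()

print(by_day)
print(by_origin)
```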

pandas/core/generic.py

+31-2
@@ -7650,9 +7650,11 @@ def resample(
         convention: str = "start",
         kind: Optional[str] = None,
         loffset=None,
-        base: int = 0,
+        base: Optional[int] = None,
         on=None,
         level=None,
+        origin=None,
+        offset=None,
     ) -> "Resampler":
         """
         Resample time-series data.
@@ -7687,17 +7689,35 @@ def resample(
             By default the input representation is retained.
         loffset : timedelta, default None
             Adjust the resampled time labels.
+
+            .. deprecated:: 1.1.0
+                You should add the loffset to ``df.index`` after the resample,
+                like this:
+                ``df.index = df.index + to_offset(loffset)``
+                (a more complete example is given below).
+
         base : int, default 0
             For frequencies that evenly subdivide 1 day, the "origin" of the
             aggregated intervals. For example, for '5min' frequency, base could
             range from 0 through 4. Defaults to 0.
+
+            .. deprecated:: 1.1.0
+                The new arguments that you should use are 'offset' or 'origin'.
+                ``df.resample("3s", base=2)``
+                becomes
+                ``df.resample("3s", offset="2s")``
+
         on : str, optional
             For a DataFrame, column to use instead of index for resampling.
             Column must be datetime-like.
-
         level : str or int, optional
             For a MultiIndex, level (name or number) to use for
             resampling. `level` must be datetime-like.
+        origin : pd.Timestamp, default None
+            The timestamp on which to adjust the grouping. If None is passed,
+            the first day of the time series at midnight is used.
+        offset : pd.Timedelta, default None
+            An offset timedelta added to the origin.
 
         Returns
         -------
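The ``base`` deprecation note maps ``base=2`` on a 3-second rule to ``offset="2s"``. The sketch below is not part of the commit and uses invented data. It shows what ``offset`` does to the bin edges, assuming pandas 1.1 or later. It does not call ``base`` itself, because that argument was later removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Six one-second observations, values 0..5, from 00:00:02 to 00:00:07.
idx = pd.date_range("2000-01-01 00:00:02", periods=6, freq="1s")
ts = pd.Series(np.arange(6), index=idx)

# Without offset, 3-second bins start at midnight: 00:00, 00:03, 00:06, ...
plain = ts.resample("3s").sum()

# offset="2s" shifts the origin (midnight here) by two seconds,
# so the bins start at 00:02, 00:05, ...
shifted = ts.resample("3s", offset="2s").sum()

print(plain)
print(shifted)
```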
@@ -7916,6 +7936,13 @@ def resample(
         2000-01-02 22 140
         2000-01-03 32 150
         2000-01-04 36 90
+
+        To replace the use of the deprecated loffset argument:
+        >>> from pandas.tseries.frequencies import to_offset
+        >>> rng = pd.date_range("2000-01-01", "2000-01-01", freq="1s")
+        >>> ts = pd.Series(np.arange(len(rng)), index=rng)
+        >>> ts_out = ts.resample("3s").mean()
+        >>> ts_out.index = ts_out.index + to_offset("8H")
         """
 
         from pandas.core.resample import get_resampler
@@ -7933,6 +7960,8 @@ def resample(
             base=base,
             key=on,
             level=level,
+            origin=origin,
+            offset=offset,
         )
 
     def first(self: FrameOrSeries, offset) -> FrameOrSeries:
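The docstring recommends replacing ``loffset`` by shifting the labels after resampling. The sketch below is not from the commit and the data is illustrative. It applies that replacement to a series with more than one bin, so the effect on the labels is visible. It assumes ``to_offset`` is importable from ``pandas.tseries.frequencies``, as the docstring example does.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Six one-second observations, values 0..5, starting at midnight.
idx = pd.date_range("2000-01-01", periods=6, freq="1s")
ts = pd.Series(np.arange(6), index=idx)

# Resample first, without loffset ...
ts_out = ts.resample("3s").mean()

# ... then shift the resulting labels, which is the adjustment
# loffset="1s" used to apply to the output labels.
ts_out.index = ts_out.index + to_offset("1s")

print(ts_out)
```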

pandas/core/groupby/grouper.py

+23
@@ -68,9 +68,32 @@ class Grouper:
         If grouper is PeriodIndex and `freq` parameter is passed.
     base : int, default 0
         Only when `freq` parameter is passed.
+        For frequencies that evenly subdivide 1 day, the "origin" of the
+        aggregated intervals. For example, for '5min' frequency, base could
+        range from 0 through 4. Defaults to 0.
+
+        .. deprecated:: 1.1.0
+            The new arguments that you should use are 'offset' or 'origin'.
+            ``df.resample("3s", base=2)``
+            becomes
+            ``df.resample("3s", offset="2s")``
+
     loffset : str, DateOffset, timedelta object
         Only when `freq` parameter is passed.
 
+        .. deprecated:: 1.1.0
+            loffset only works for ``.resample(...)`` and not for
+            Grouper (:issue:`28302`).
+            However, loffset is also deprecated for ``.resample(...)``.
+            See :meth:`DataFrame.resample`.
+
+    origin : Timestamp, default None
+        Only when `freq` parameter is passed.
+        The timestamp on which to adjust the grouping. If None is passed, the
+        first day of the time series at midnight is used.
+    offset : pd.Timedelta, default None
+        An offset timedelta added to the origin.
+
     Returns
     -------
     A specification for a groupby instruction
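The Grouper docstring describes ``offset`` as a timedelta added to the origin. In this sketch, an explicit ``origin`` and the default origin shifted by ``offset`` yield the same bins. It is not part of the commit, and the column names ``time`` and ``value`` are hypothetical. It assumes pandas 1.1 or later and groups a DataFrame column via ``key``.

```python
import pandas as pd

# Hypothetical frame: eight one-minute timestamps from 00:03, values 0..7.
df = pd.DataFrame(
    {
        "time": pd.date_range("2000-01-01 00:03", periods=8, freq="1min"),
        "value": range(8),
    }
)

# Anchor the 5-minute bins at an explicit timestamp ...
by_origin = df.groupby(
    pd.Grouper(key="time", freq="5min", origin=pd.Timestamp("2000-01-01 00:03"))
)["value"].sum()

# ... or keep the default origin (midnight of the first day) and shift it
# by three minutes with offset; both give bins starting at 00:03 and 00:08.
by_offset = df.groupby(
    pd.Grouper(key="time", freq="5min", offset="3min")
)["value"].sum()

print(by_origin)
```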
