Commit 2d2beba

ENH: add 'origin' and 'offset' arguments to 'resample' and 'pd.Grouper'
a more work
1 parent a9d2450 commit 2d2beba

12 files changed, +331 -87 lines changed

doc/source/whatsnew/v1.0.0.rst

File mode changed from 100755 to 100644.

doc/source/whatsnew/v1.1.0.rst

+24
@@ -36,6 +36,30 @@ For example:
    ser["2014"]
    ser.loc["May 2015"]
 
+.. _whatsnew_110.grouper_origin:
+
+Grouper now supports the argument origin
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`Grouper` and :meth:`DataFrame.resample` now support the argument `origin`, the timestamp on which to adjust the grouping. (:issue:`31809`)
+
+By default, the bins of the grouping are anchored to the beginning of the day (midnight) of the first timestamp in the time series. This works well with frequencies that are multiples of a day (like `30D`) or that divide a day evenly (like `90s` or `1min`), but it can create inconsistencies with frequencies that do not meet this criterion. To change this behavior, you can now specify a fixed timestamp with `origin`.
+
+For example:
+
+.. ipython:: python
+
+   start, end = "1/1/2000 00:00:00", "1/31/2000 00:00"
+   rng = pd.date_range(start, end, freq="1231min")
+   ts = pd.Series(np.arange(len(rng)), index=rng)
+   ts.groupby(pd.Grouper(freq="1399min")).agg("count")
+   ts.groupby(pd.Grouper(
+       freq="1399min",
+       origin=pd.Timestamp("1970-01-01"))
+   ).agg("count")
+
+..
+
 .. _whatsnew_110.enhancements.other:
 
 Other enhancements
pandas/core/generic.py

+10 -2
@@ -7650,9 +7650,11 @@ def resample(
         convention: str = "start",
         kind: Optional[str] = None,
         loffset=None,
-        base: int = 0,
+        base: Optional[int] = None,
         on=None,
         level=None,
+        origin=None,
+        offset=None,
     ) -> "Resampler":
         """
         Resample time-series data.
@@ -7694,10 +7696,14 @@ def resample(
         on : str, optional
             For a DataFrame, column to use instead of index for resampling.
             Column must be datetime-like.
-
         level : str or int, optional
             For a MultiIndex, level (name or number) to use for
             resampling. `level` must be datetime-like.
+        origin : pd.Timestamp, default None
+            The timestamp on which to adjust the grouping. If None is passed,
+            the first day of the time series at midnight is used.
+        offset : pd.Timedelta, default None
+            An offset timedelta added to the origin.
 
         Returns
         -------
@@ -7933,6 +7939,8 @@ def resample(
             base=base,
             key=on,
             level=level,
+            origin=origin,
+            offset=offset,
         )
 
     def first(self: FrameOrSeries, offset) -> FrameOrSeries:

pandas/core/groupby/grouper.py

+7
@@ -68,6 +68,13 @@ class Grouper:
         If grouper is PeriodIndex and `freq` parameter is passed.
     base : int, default 0
         Only when `freq` parameter is passed.
+        For frequencies that evenly subdivide 1 day, the "origin" of the
+        aggregated intervals. For example, for '5min' frequency, base could
+        range from 0 through 4. Defaults to 0.
+    origin : Timestamp, default None
+        Only when `freq` parameter is passed.
+        The timestamp on which to adjust the grouping. If None is passed, the
+        first day of the time series at midnight is used.
     loffset : str, DateOffset, timedelta object
         Only when `freq` parameter is passed.
 
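
Reviewer note: a short usage sketch of the two entry points touched by this commit. It assumes the arguments as they exist in released pandas (1.1 and later), which may differ in detail from this intermediate commit; the series mirrors the whatsnew example, and the one-hour `offset` is an arbitrary value chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-01-01 00:00:00", "2000-01-31 00:00:00", freq="1231min")
ts = pd.Series(np.arange(len(rng)), index=rng)

epoch = pd.Timestamp("1970-01-01")
freq = pd.Timedelta(minutes=1399)

# Same grouping expressed through pd.Grouper and through resample.
via_grouper = ts.groupby(pd.Grouper(freq="1399min", origin=epoch)).count()
via_resample = ts.resample("1399min", origin=epoch).count()

# offset is added to origin, shifting the whole bin grid by one hour.
shifted = ts.resample("1399min", origin=epoch, offset=pd.Timedelta(hours=1)).count()

print(via_resample.head())
print(shifted.head())
```

With a fixed `origin`, every bin label sits a whole number of 1399-minute steps away from the origin (plus the offset, when one is given), regardless of where the series starts.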
