Commit 5b59fc0

ENH: .resample API to groupby-like class, pandas-dev#11732
1 parent 7f062da commit 5b59fc0

13 files changed, +1268 -695 lines changed

doc/source/release.rst

+6-1
@@ -48,7 +48,12 @@ users upgrade to this version.
4848

4949
Highlights include:
5050

51-
See the :ref:`v0.17.0 Whatsnew <whatsnew_0180>` overview for an extensive list
51+
Highlights include:
52+
53+
- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
54+
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.enhancements.resample>`.
55+
56+
See the :ref:`v0.18.0 Whatsnew <whatsnew_0180>` overview for an extensive list
5257
of all enhancements and bugs that have been fixed in 0.17.1.
5358

5459
Thanks

doc/source/whatsnew/v0.18.0.txt

+129
@@ -14,6 +14,7 @@ users upgrade to this version.
1414
Highlights include:
1515

1616
- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
17+
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.enhancements.resample>`.
1718

1819
Check the :ref:`API Changes <whatsnew_0180.api>` and :ref:`deprecations <whatsnew_0180.deprecations>` before updating.
1920

@@ -202,6 +203,134 @@ other anchored offsets like ``MonthBegin`` and ``YearBegin``.
202203
d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
203204

204205

206+
.. _whatsnew_0180.enhancements.resample:
207+
208+
Resample API
209+
^^^^^^^^^^^^
210+
211+
Like the change in the window functions API :ref:`above <whatsnew_0180.enhancements.moments>`, ``.resample(...)`` is changing to have
212+
a more groupby-like API. (:issue:`11732`).
213+
214+
.. ipython:: python
215+
216+
np.random.seed(1234)
217+
df = pd.DataFrame(np.random.rand(10,4),
218+
columns=list('ABCD'),
219+
index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
220+
df
221+
222+
Previously you would write a resampling operation as a single call:
223+
224+
This defaults to ``how='mean'``
225+
226+
.. code-block:: python
227+
228+
In [6]: df.resample('2s')
229+
Out[6]:
230+
A B C D
231+
2010-01-01 09:00:00 0.485748 0.447351 0.357096 0.793615
232+
2010-01-01 09:00:02 0.820801 0.794317 0.364034 0.531096
233+
2010-01-01 09:00:04 0.433985 0.314582 0.424104 0.625733
234+
2010-01-01 09:00:06 0.624988 0.609738 0.633165 0.612452
235+
2010-01-01 09:00:08 0.510470 0.534317 0.573201 0.806949
236+
237+
You can also specify a ``how`` directly
238+
239+
.. code-block:: python
240+
241+
In [7]: df.resample('2s',how='sum')
242+
Out[7]:
243+
A B C D
244+
2010-01-01 09:00:00 0.971495 0.894701 0.714192 1.587231
245+
2010-01-01 09:00:02 1.641602 1.588635 0.728068 1.062191
246+
2010-01-01 09:00:04 0.867969 0.629165 0.848208 1.251465
247+
2010-01-01 09:00:06 1.249976 1.219477 1.266330 1.224904
248+
2010-01-01 09:00:08 1.020940 1.068634 1.146402 1.613897
249+
250+
Now, you write ``.resample`` as a 2-stage operation like groupby, which
251+
yields a ``Resampler``.
252+
253+
.. ipython:: python
254+
255+
r = df.resample('2s')
256+
r
257+
258+
Downsampling
259+
''''''''''''
260+
261+
You can then use this object to perform similar operations.
262+
These are downsampling operations (going from a higher frequency to a lower one).
263+
264+
.. ipython:: python
265+
266+
r.mean()
267+
268+
.. ipython:: python
269+
270+
r.sum()
271+
272+
Furthermore, resample now supports ``getitem`` operations to selectively perform the resample.
273+
274+
.. ipython:: python
275+
276+
r[['A','C']].mean()
277+
278+
and ``.aggregate`` type of operations.
279+
280+
.. ipython:: python
281+
282+
r.agg({'A' : 'mean', 'B' : 'sum'})
283+
284+
These accessors can, of course, be combined
285+
286+
.. ipython:: python
287+
288+
r[['A','B']].agg(['mean','sum'])
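As a rough, self-contained sketch of the two-stage pattern described in this section (assuming a pandas version that already ships the ``Resampler`` API; the names ``means``, ``sums`` and ``mixed`` are illustrative and do not come from the commit):

```python
import numpy as np
import pandas as pd

# Ten one-second observations, similar in shape to the frame used in this section.
rng = np.random.RandomState(1234)
df = pd.DataFrame(rng.rand(10, 4),
                  columns=list('ABCD'),
                  index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))

# Stage 1: describe the binning. No aggregation has happened yet.
r = df.resample('2s')

# Stage 2: choose the reduction, optionally after selecting columns.
means = r.mean()                           # replaces df.resample('2s', how='mean')
sums = r[['A', 'C']].sum()                 # getitem first, then reduce
mixed = r.agg({'A': 'mean', 'B': 'sum'})   # a different reduction per column
```

Because the binning and the reduction are separate steps, the same ``r`` object can be reused for several reductions without repeating the frequency argument.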
289+
290+
Upsampling
291+
''''''''''
292+
293+
Upsampling operations take you from a lower frequency to a higher frequency. These are now
293+
performed on the ``Resampler`` object using its pad/fill/upsample methods.
295+
296+
.. ipython:: python
297+
298+
s = pd.Series(np.arange(5,dtype='int64'),
298+
index=pd.date_range('2010-01-01', periods=5, freq='Q'))
300+
s
301+
302+
Previously
303+
304+
.. code-block:: python
305+
306+
In [6]: s.resample('M', fill_method='ffill')
307+
Out[6]:
308+
2010-03-31 0
309+
2010-04-30 0
310+
2010-05-31 0
311+
2010-06-30 1
312+
2010-07-31 1
313+
2010-08-31 1
314+
2010-09-30 2
315+
2010-10-31 2
316+
2010-11-30 2
317+
2010-12-31 3
318+
2011-01-31 3
319+
2011-02-28 3
320+
2011-03-31 4
321+
Freq: M, dtype: int64
322+
323+
New API
324+
325+
.. ipython:: python
326+
327+
s.resample('M').ffill()
328+
329+
.. note::
330+
331+
In the new API, you can either downsample OR upsample. The prior implementation would allow you to pass an aggregator function (like ``mean``) even though you were upsampling, which caused some confusion.
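A minimal sketch of upsampling with the two-stage API, under the assumption that the installed pandas exposes ``Resampler.ffill`` and ``Resampler.asfreq``. To my knowledge ``.asfreq()`` is the spelling in released pandas versions, whereas the docstring in this commit uses ``.upsample()``. The minute-to-30-second frequencies are chosen only for illustration:

```python
import pandas as pd

# Three observations, one per minute.
s = pd.Series(range(3),
              index=pd.date_range('2010-01-01', periods=3, freq='min'))

# Stage 1: target a higher frequency (two slots per minute).
up = s.resample('30s')

# Stage 2: decide how the newly created slots are filled.
filled = up.ffill()   # carry the last observation forward
raw = up.asfreq()     # leave the new slots as NaN
```

Here ``filled`` holds the values 0, 0, 1, 1, 2, while ``raw`` keeps the original three values and has ``NaN`` in the two inserted slots.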
332+
333+
205334
Other API Changes
206335
^^^^^^^^^^^^^^^^^
207336

pandas/core/generic.py

+42-22
@@ -3556,22 +3556,14 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
35563556
----------
35573557
rule : string
35583558
the offset string or object representing target conversion
3559-
how : string
3560-
method for down- or re-sampling, default to 'mean' for
3561-
downsampling
35623559
axis : int, optional, default 0
3563-
fill_method : string, default None
3564-
fill_method for upsampling
35653560
closed : {'right', 'left'}
35663561
Which side of bin interval is closed
35673562
label : {'right', 'left'}
35683563
Which bin edge label to label bucket with
35693564
convention : {'start', 'end', 's', 'e'}
3570-
kind : "period"/"timestamp"
35713565
loffset : timedelta
35723566
Adjust the resampled time labels
3573-
limit : int, default None
3574-
Maximum size gap to when reindexing with fill_method
35753567
base : int, default 0
35763568
For frequencies that evenly subdivide 1 day, the "origin" of the
35773569
aggregated intervals. For example, for '5min' frequency, base could
@@ -3600,7 +3592,7 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36003592
Downsample the series into 3 minute bins and sum the values
36013593
of the timestamps falling into a bin.
36023594
3603-
>>> series.resample('3T', how='sum')
3595+
>>> series.resample('3T').sum()
36043596
2000-01-01 00:00:00 3
36053597
2000-01-01 00:03:00 12
36063598
2000-01-01 00:06:00 21
@@ -3616,7 +3608,7 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36163608
To include this value close the right side of the bin interval as
36173609
illustrated in the example below this one.
36183610
3619-
>>> series.resample('3T', how='sum', label='right')
3611+
>>> series.resample('3T', label='right').sum()
36203612
2000-01-01 00:03:00 3
36213613
2000-01-01 00:06:00 12
36223614
2000-01-01 00:09:00 21
@@ -3625,7 +3617,7 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36253617
Downsample the series into 3 minute bins as above, but close the right
36263618
side of the bin interval.
36273619
3628-
>>> series.resample('3T', how='sum', label='right', closed='right')
3620+
>>> series.resample('3T', label='right', closed='right').sum()
36293621
2000-01-01 00:00:00 0
36303622
2000-01-01 00:03:00 6
36313623
2000-01-01 00:06:00 15
@@ -3634,7 +3626,7 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36343626
36353627
Upsample the series into 30 second bins.
36363628
3637-
>>> series.resample('30S')[0:5] #select first 5 rows
3629+
>>> series.resample('30S').upsample()[0:5] #select first 5 rows
36383630
2000-01-01 00:00:00 0
36393631
2000-01-01 00:00:30 NaN
36403632
2000-01-01 00:01:00 1
@@ -3645,7 +3637,7 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36453637
Upsample the series into 30 second bins and fill the ``NaN``
36463638
values using the ``pad`` method.
36473639
3648-
>>> series.resample('30S', fill_method='pad')[0:5]
3640+
>>> series.resample('30S').pad()[0:5]
36493641
2000-01-01 00:00:00 0
36503642
2000-01-01 00:00:30 0
36513643
2000-01-01 00:01:00 1
@@ -3656,34 +3648,62 @@ def resample(self, rule, how=None, axis=0, fill_method=None,
36563648
Upsample the series into 30 second bins and fill the
36573649
``NaN`` values using the ``bfill`` method.
36583650
3659-
>>> series.resample('30S', fill_method='bfill')[0:5]
3651+
>>> series.resample('30S').bfill()[0:5]
36603652
2000-01-01 00:00:00 0
36613653
2000-01-01 00:00:30 1
36623654
2000-01-01 00:01:00 1
36633655
2000-01-01 00:01:30 2
36643656
2000-01-01 00:02:00 2
36653657
Freq: 30S, dtype: int64
36663658
3667-
Pass a custom function to ``how``.
3659+
Pass a custom function via ``apply``
36683660
36693661
>>> def custom_resampler(array_like):
36703662
...     return np.sum(array_like)+5
36713663
3672-
>>> series.resample('3T', how=custom_resampler)
3664+
>>> series.resample('3T').apply(custom_resampler)
36733665
2000-01-01 00:00:00 8
36743666
2000-01-01 00:03:00 17
36753667
2000-01-01 00:06:00 26
36763668
Freq: 3T, dtype: int64
36773669
36783670
"""
3671+
        from pandas.tseries.resample import resample
36793672

3680-
        from pandas.tseries.resample import TimeGrouper
36813673
        axis = self._get_axis_number(axis)
3682-
        sampler = TimeGrouper(rule, label=label, closed=closed, how=how,
3683-
                              axis=axis, kind=kind, loffset=loffset,
3684-
                              fill_method=fill_method, convention=convention,
3685-
                              limit=limit, base=base)
3686-
        return sampler.resample(self).__finalize__(self)
3674+
        r = resample(self, freq=rule, label=label, closed=closed,
3675+
                     axis=axis, kind=kind, loffset=loffset,
3676+
                     fill_method=fill_method, convention=convention,
3677+
                     limit=limit, base=base)
3678+
3679+
        # deprecation warning
3680+
        # but call the method anyhow
3681+
        if fill_method is not None:
3682+
            args = "limit={0}".format(limit) if limit is not None else ""
3683+
            warnings.warn("fill_method in .resample() is deprecated\nthe new syntax is "
3684+
                          ".resample(...).{fill_method}({args})".format(fill_method=fill_method,
3685+
                                                                        args=args),
3686+
                          FutureWarning, stacklevel=2)
3687+
            return r.aggregate(fill_method, limit=limit)
3688+
3689+
        # deprecation warning
3690+
        # but call the method anyhow
3691+
        if how is not None:
3692+
3693+
            # .resample(..., how='sum')
3694+
            if isinstance(how, compat.string_types):
3695+
                method = "{0}()".format(how)
3696+
3697+
            # .resample(..., how=lambda x: ....)
3698+
            else:
3699+
                method = "apply(<func>)"
3700+
3701+
            warnings.warn("how in .resample() is deprecated\nthe new syntax is "
3702+
                          ".resample(...).{method}".format(method=method),
3703+
                          FutureWarning, stacklevel=2)
3704+
            return r.aggregate(how)
3705+
3706+
        return r
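The deprecation shim above follows a common pattern: build the new-style object, and if a legacy keyword was passed, emit a ``FutureWarning`` that names the replacement call, then dispatch to it anyway. The following is only a toy sketch of that pattern using plain lists. ``ToyResampler`` and this stand-alone ``resample`` function are hypothetical stand-ins and are not the pandas implementation:

```python
import warnings


class ToyResampler:
    """Hypothetical stand-in for the object returned by .resample()."""

    def __init__(self, values, size):
        # Split the values into consecutive fixed-size bins.
        self.bins = [values[i:i + size] for i in range(0, len(values), size)]

    def sum(self):
        return [sum(b) for b in self.bins]

    def aggregate(self, how):
        # A string names a method on the resampler; a callable is applied per bin.
        if isinstance(how, str):
            return getattr(self, how)()
        return [how(b) for b in self.bins]


def resample(values, size, how=None):
    r = ToyResampler(values, size)
    if how is not None:
        # Legacy call style: warn, but still return the aggregated result.
        method = "{0}()".format(how) if isinstance(how, str) else "apply(<func>)"
        warnings.warn("how in .resample() is deprecated\nthe new syntax is "
                      ".resample(...).{method}".format(method=method),
                      FutureWarning, stacklevel=2)
        return r.aggregate(how)
    # New call style: return the deferred object and let the caller reduce it.
    return r
```

With this sketch, ``resample(list(range(9)), 3, how='sum')`` warns and returns ``[3, 12, 21]``, while ``resample(list(range(9)), 3).sum()`` returns the same list without a warning.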
36873707

36883708
def first(self, offset):
36893709
"""
