Commit e570570

ENH: .resample API to groupby-like class, pandas-dev#11732
original API detection & warning
support for isinstance / numeric ops
support for comparison ops
DOC: documentation updates w.r.t. aggregation
1 parent 6a32f10 commit e570570

22 files changed

+2037
-804
lines changed

doc/source/api.rst

+58
@@ -1729,6 +1729,64 @@ The following methods are available only for ``DataFrameGroupBy`` objects.
    DataFrameGroupBy.corrwith
    DataFrameGroupBy.boxplot
 
+Resampling
+----------
+.. currentmodule:: pandas.tseries.resample
+
+Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resample`, :func:`pandas.Series.resample`.
+
+Indexing, iteration
+~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.__iter__
+   Resampler.groups
+   Resampler.indices
+   Resampler.get_group
+
+Function application
+~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.apply
+   Resampler.aggregate
+   Resampler.transform
+
+Upsampling
+~~~~~~~~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.ffill
+   Resampler.backfill
+   Resampler.bfill
+   Resampler.pad
+   Resampler.fillna
+   Resampler.asfreq
+
+Computations / Descriptive Stats
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.count
+   Resampler.first
+   Resampler.last
+   Resampler.max
+   Resampler.mean
+   Resampler.median
+   Resampler.min
+   Resampler.ohlc
+   Resampler.prod
+   Resampler.size
+   Resampler.sem
+   Resampler.std
+   Resampler.sum
+   Resampler.var
+
 Style
 -----
 .. currentmodule:: pandas.core.style
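
Note on the hunk above: a minimal sketch of how a few of the listed ``Resampler`` members might be exercised. The sample series and the variable names are illustrative assumptions and are not part of the commit.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten one-minute observations holding 0..9.
idx = pd.date_range("2012-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(10), index=idx)

# .resample() returns a deferred Resampler; nothing is aggregated yet.
r = ts.resample("5min")

# Iterating yields (bin label, sub-Series) pairs, much like a GroupBy.
bin_sizes = {label: len(chunk) for label, chunk in r}

# The computation methods reduce each bin to a single value.
totals = r.sum()
counts = r.count()
```

Each five-minute bin here holds five observations, so ``totals`` contains the sums of 0..4 and 5..9.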

doc/source/cookbook.rst

+1-1
@@ -567,7 +567,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
       return pd.NaT
 
    mhc = {'Mean' : np.mean, 'Max' : np.max, 'Custom' : MyCust}
-   ts.resample("5min",how = mhc)
+   ts.resample("5min").apply(mhc)
    ts
 
 `Create a value counts column and reassign back to the DataFrame

doc/source/release.rst

+6-1
@@ -48,7 +48,12 @@ users upgrade to this version.
 
 Highlights include:
 
-See the :ref:`v0.17.0 Whatsnew <whatsnew_0180>` overview for an extensive list
+Highlights include:
+
+- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
+- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.resample>`.
+
+See the :ref:`v0.18.0 Whatsnew <whatsnew_0180>` overview for an extensive list
 of all enhancements and bugs that have been fixed in 0.17.1.
 
 Thanks

doc/source/timedeltas.rst

+1-1
@@ -401,4 +401,4 @@ Similar to :ref:`timeseries resampling <timeseries.resampling>`, we can resample
 
 .. ipython:: python
 
-   s.resample('D')
+   s.resample('D').mean()
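
Note on the hunk above: a minimal sketch of why the trailing ``.mean()`` is needed under the new API. The timedelta-indexed series ``s`` is not shown in the hunk, so the one built here is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical timedelta-indexed series: 48 hourly points starting at 0 days.
hours = pd.to_timedelta(np.arange(48), unit="h")
s = pd.Series(np.arange(48.0), index=hours)

# The bare call now returns a deferred Resampler rather than a Series ...
deferred = s.resample("D")

# ... so the reduction has to be requested explicitly.
daily = deferred.mean()
```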

doc/source/timeseries.rst

+88-16
@@ -68,7 +68,7 @@ Resample:
 .. ipython:: python
 
    # Daily means
-   ts.resample('D', how='mean')
+   ts.resample('D').mean()
 
 
 .. _timeseries.overview:
@@ -1211,6 +1211,11 @@ Converting to Python datetimes
 Resampling
 ----------
 
+.. warning::
+
+   The interface to ``.resample`` has changed in 0.18.0 to be more groupby-like and hence more flexible.
+   See the :ref:`whatsnew docs <whatsnew_0180.breaking.resample>` for a comparison with prior versions.
+
 Pandas has a simple, powerful, and efficient functionality for
 performing resampling operations during frequency conversion (e.g., converting
 secondly data into 5-minutely data). This is extremely common in, but not
@@ -1226,7 +1231,7 @@ See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategi
 
    ts = Series(randint(0, 500, len(rng)), index=rng)
 
-   ts.resample('5Min', how='sum')
+   ts.resample('5Min').sum()
 
 The ``resample`` function is very flexible and allows you to specify many
 different parameters to control the frequency conversion and resampling
@@ -1237,11 +1242,11 @@ an array and produces aggregated values:
 
 .. ipython:: python
 
-   ts.resample('5Min') # default is mean
+   ts.resample('5Min').mean()
 
-   ts.resample('5Min', how='ohlc')
+   ts.resample('5Min').ohlc()
 
-   ts.resample('5Min', how=np.max)
+   ts.resample('5Min').max()
 
 Any function available via :ref:`dispatching <groupby.dispatch>` can be given to
 the ``how`` parameter by name, including ``sum``, ``mean``, ``std``, ``sem``,
@@ -1252,9 +1257,9 @@ end of the interval is closed:
 
 .. ipython:: python
 
-   ts.resample('5Min', closed='right')
+   ts.resample('5Min', closed='right').mean()
 
-   ts.resample('5Min', closed='left')
+   ts.resample('5Min', closed='left').mean()
 
 Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
 labels. ``label`` specifies whether the result is labeled with the beginning or
@@ -1263,11 +1268,11 @@ labels.
 
 .. ipython:: python
 
-   ts.resample('5Min') # by default label='right'
+   ts.resample('5Min').mean() # by default label='right'
 
-   ts.resample('5Min', label='left')
+   ts.resample('5Min', label='left').mean()
 
-   ts.resample('5Min', label='left', loffset='1s')
+   ts.resample('5Min', label='left', loffset='1s').mean()
 
 The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
 specified axis for a DataFrame.
@@ -1284,18 +1289,17 @@ frequency periods.
 Up Sampling
 ~~~~~~~~~~~
 
-For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
-to interpolate over the gaps that are created:
+For upsampling, you can specify a way to upsample and the ``limit`` parameter to interpolate over the gaps that are created:
 
 .. ipython:: python
 
    # from secondly to every 250 milliseconds
 
-   ts[:2].resample('250L')
+   ts[:2].resample('250L').reindex()
 
-   ts[:2].resample('250L', fill_method='pad')
+   ts[:2].resample('250L').ffill()
 
-   ts[:2].resample('250L', fill_method='pad', limit=2)
+   ts[:2].resample('250L').ffill(limit=2)
 
 Sparse Resampling
 ~~~~~~~~~~~~~~~~~
@@ -1317,7 +1321,7 @@ If we want to resample to the full range of the series
 
 .. ipython:: python
 
-   ts.resample('3T',how='sum')
+   ts.resample('3T').sum()
 
 We can instead only resample those groups where we have points as follows:
 
@@ -1333,6 +1337,74 @@ We can instead only resample those groups where we have points as follows:
 
    ts.groupby(partial(round, freq='3T')).sum()
 
+Aggregation
+~~~~~~~~~~~
+
+Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
+resampled.
+
+Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(1000, 3),
+                     index=pd.date_range('1/1/2012', freq='S', periods=1000),
+                     columns=['A', 'B', 'C'])
+   r = df.resample('3T')
+   r.mean()
+
+We can select a specific column or columns using standard getitem.
+
+.. ipython:: python
+
+   r['A'].mean()
+
+   r[['A','B']].mean()
+
+You can pass a list or dict of functions to do aggregation with, outputting a DataFrame:
+
+.. ipython:: python
+
+   r['A'].agg([np.sum, np.mean, np.std])
+
+If a dict is passed, the keys will be used to name the columns. Otherwise the
+function's name (stored in the function object) will be used.
+
+.. ipython:: python
+
+   r['A'].agg({'result1' : np.sum,
+               'result2' : np.mean})
+
+On a resampled DataFrame, you can pass a list of functions to apply to each
+column, which produces an aggregated result with a hierarchical index:
+
+.. ipython:: python
+
+   r.agg([np.sum, np.mean])
+
+By passing a dict to ``aggregate`` you can apply a different aggregation to the
+columns of a DataFrame:
+
+.. ipython:: python
+   :okexcept:
+
+   r.agg({'A' : np.sum,
+          'B' : lambda x: np.std(x, ddof=1)})
+
+The function names can also be strings. In order for a string to be valid it
+must be implemented on the Resampled object
+
+.. ipython:: python
+
+   r.agg({'A' : 'sum', 'B' : 'std'})
+
+Furthermore you can pass a nested dict to indicate different aggregations on different columns.
+
+.. ipython:: python
+
+   r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
+
+
 .. _timeseries.periods:
 
 Time Span Representation
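
Note on the ``timeseries.rst`` hunks above: a minimal sketch of the deferred ``Resampler`` workflow they document (column selection, ``agg`` with a list or dict, and upsampling with ``ffill``). The small frame, the ``'3min'`` and ``'15s'`` frequencies, and the use of string function names in place of ``np.sum`` / ``np.mean`` are assumptions chosen so the results can be checked by hand; they are not taken from the commit.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: six one-minute rows, A = 0..5 and B = 0, 10, ..., 50.
idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame({"A": np.arange(6.0), "B": np.arange(6.0) * 10}, index=idx)

# One Resampler can be reused for several reductions.
r = df.resample("3min")

# Select a column with getitem, then reduce it.
col_mean = r["A"].mean()

# A dict maps each column to its own aggregation.
per_col = r.agg({"A": "sum", "B": "mean"})

# A list of functions on one column yields one output column per function.
multi = r["A"].agg(["sum", "mean"])

# Upsampling: two one-minute points expanded to 15-second steps,
# forward-filling at most two new slots after each original value.
up = df["A"].iloc[:2].resample("15s").ffill(limit=2)
```

With ``limit=2`` the slot 45 seconds after the first observation stays ``NaN``, because it would be the third consecutive fill.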

doc/source/whatsnew/v0.10.0.txt

+53-11
@@ -70,16 +70,59 @@ nfrequencies are unaffected. The prior defaults were causing a great deal of
 confusion for users, especially resampling data to daily frequency (which
 labeled the aggregated group with the end of the interval: the next day).
 
-Note:
-
-.. ipython:: python
-
-   dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
-   series = Series(np.arange(len(dates)), index=dates)
-   series
-   series.resample('D', how='sum')
-   # old behavior
-   series.resample('D', how='sum', closed='right', label='right')
+.. code-block:: python
+
+   In [1]: dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
+
+   In [2]: series = Series(np.arange(len(dates)), index=dates)
+
+   In [3]: series
+   Out[3]:
+   2000-01-01 00:00:00     0
+   2000-01-01 04:00:00     1
+   2000-01-01 08:00:00     2
+   2000-01-01 12:00:00     3
+   2000-01-01 16:00:00     4
+   2000-01-01 20:00:00     5
+   2000-01-02 00:00:00     6
+   2000-01-02 04:00:00     7
+   2000-01-02 08:00:00     8
+   2000-01-02 12:00:00     9
+   2000-01-02 16:00:00    10
+   2000-01-02 20:00:00    11
+   2000-01-03 00:00:00    12
+   2000-01-03 04:00:00    13
+   2000-01-03 08:00:00    14
+   2000-01-03 12:00:00    15
+   2000-01-03 16:00:00    16
+   2000-01-03 20:00:00    17
+   2000-01-04 00:00:00    18
+   2000-01-04 04:00:00    19
+   2000-01-04 08:00:00    20
+   2000-01-04 12:00:00    21
+   2000-01-04 16:00:00    22
+   2000-01-04 20:00:00    23
+   2000-01-05 00:00:00    24
+   Freq: 4H, dtype: int64
+
+   In [4]: series.resample('D', how='sum')
+   Out[4]:
+   2000-01-01     15
+   2000-01-02     51
+   2000-01-03     87
+   2000-01-04    123
+   2000-01-05     24
+   Freq: D, dtype: int64
+
+   In [5]: # old behavior
+   In [6]: series.resample('D', how='sum', closed='right', label='right')
+   Out[6]:
+   2000-01-01      0
+   2000-01-02     21
+   2000-01-03     57
+   2000-01-04     93
+   2000-01-05    129
+   Freq: D, dtype: int64
 
 - Infinity and negative infinity are no longer treated as NA by ``isnull`` and
   ``notnull``. That they ever were was a relic of early pandas. This behavior
@@ -354,4 +397,3 @@ Adding experimental support for Panel4D and factory functions to create n-dimens
 See the :ref:`full release notes
 <release>` or issue tracker
 on GitHub for a complete list.
-
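
Note on the ``v0.10.0.txt`` hunk above: the frozen output uses the old ``how='sum'`` keyword. A minimal sketch of the same comparison written against the groupby-like API introduced by this commit follows; the variable names ``new_default`` and ``old_default`` are illustrative, and the totals asserted are the ones shown in ``Out[4]`` and ``Out[6]``.

```python
import numpy as np
import pandas as pd

# Same data as the whatsnew example: 4-hourly points over five days.
dates = pd.date_range("1/1/2000", "1/5/2000", freq="4h")
series = pd.Series(np.arange(len(dates)), index=dates)

# Current default for daily bins: closed and labeled on the left edge.
new_default = series.resample("D").sum()

# The pre-0.10.0 behavior: closed and labeled on the right edge.
old_default = series.resample("D", closed="right", label="right").sum()
```

The right-closed variant assigns each midnight observation to the bin that ends at that midnight, which is why its totals differ from the left-closed ones.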
