Commit fe48704

DOC: update resample docs
1 parent f0c7c41 commit fe48704

2 files changed, +48 -34 lines changed

doc/source/release.rst

+1
@@ -117,6 +117,7 @@ Thanks
 - Joris Van den Bossche
 - Joris Vankerschaver
 - Josh Levy-Kramer
+- Julien Danjou
 - Ka Wo Chen
 - Karrie Kehoe
 - Kelsey Jordahl

doc/source/timeseries.rst

+47 -34
@@ -1157,14 +1157,16 @@ Converting to Python datetimes

 .. _timeseries.resampling:

-Up- and downsampling
---------------------
+Resampling
+----------

-With 0.8, pandas introduces simple, powerful, and efficient functionality for
+Pandas has a simple, powerful, and efficient functionality for
 performing resampling operations during frequency conversion (e.g., converting
 secondly data into 5-minutely data). This is extremely common in, but not
 limited to, financial applications.

+``resample`` is a time-based groupby, followed by a reduction method on each of its groups.
+
 See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategies

 .. ipython:: python
@@ -1203,19 +1205,6 @@ end of the interval is closed:

    ts.resample('5Min', closed='left')

-For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
-to interpolate over the gaps that are created:
-
-.. ipython:: python
-
-   # from secondly to every 250 milliseconds
-
-   ts[:2].resample('250L')
-
-   ts[:2].resample('250L', fill_method='pad')
-
-   ts[:2].resample('250L', fill_method='pad', limit=2)
-
 Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
 labels. ``label`` specifies whether the result is labeled with the beginning or
 the end of the interval. ``loffset`` performs a time adjustment on the output
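
Note on the upsampling example relocated by this commit: the sketch below is a pandas-free illustration of forward filling with a cap when two secondly observations are spread onto a 250 ms grid. The helper ``upsample_pad`` and its arguments are hypothetical names, not pandas API, and the behaviour shown is an assumption about what ``fill_method='pad'`` with ``limit=2`` is meant to do, not a copy of the library's implementation.

```python
from datetime import datetime, timedelta


def upsample_pad(points, step, limit=None):
    """Upsample sorted (timestamp, value) pairs onto a regular grid.

    Grid slots without an observation are forward-filled from the last
    observation for at most `limit` consecutive slots (no cap if None);
    any remaining slots are left as None.
    """
    observed = dict(points)
    start, end = points[0][0], points[-1][0]
    out = []
    last_value = None
    filled = 0  # consecutive slots filled since the last real observation
    t = start
    while t <= end:
        if t in observed:
            last_value = observed[t]
            filled = 0
            out.append((t, last_value))
        elif limit is None or filled < limit:
            filled += 1
            out.append((t, last_value))
        else:
            out.append((t, None))
        t += step
    return out


# Two observations one second apart, upsampled to every 250 milliseconds.
t0 = datetime(2012, 1, 1)
points = [(t0, 10), (t0 + timedelta(seconds=1), 20)]
step = timedelta(milliseconds=250)

capped = [v for _, v in upsample_pad(points, step, limit=2)]
uncapped = [v for _, v in upsample_pad(points, step)]
unfilled = [v for _, v in upsample_pad(points, step, limit=0)]
```

Under these assumptions, ``limit=2`` leaves the third gap between the two observations empty, an unset ``limit`` fills every gap, and ``limit=0`` fills none of them.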
@@ -1240,34 +1229,58 @@ retains the input representation.
 (detail below). It specifies how low frequency periods are converted to higher
 frequency periods.

-Note that 0.8 marks a watershed in the timeseries functionality in pandas. In
-previous versions, resampling had to be done using a combination of
-``date_range``, ``groupby`` with ``asof``, and then calling an aggregation
-function on the grouped object. This was not nearly as convenient or performant
-as the new pandas timeseries API.

-Sparse timeseries
+Up Sampling
+~~~~~~~~~~~
+
+For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
+to interpolate over the gaps that are created:
+
+.. ipython:: python
+
+   # from secondly to every 250 milliseconds
+
+   ts[:2].resample('250L')
+
+   ts[:2].resample('250L', fill_method='pad')
+
+   ts[:2].resample('250L', fill_method='pad', limit=2)
+
+Sparse Resampling
 ~~~~~~~~~~~~~~~~~

-If your timeseries are sparse, be aware that upsampling will generate a lot of
-intermediate points filled with whatever passed as ``fill_method``. What
-``resample`` does is basically a group by and then applying an aggregation
-method on each of its groups, which can also be achieve with something like the
-following.
+Sparse timeseries are ones where you have a lot fewer points relative
+to the amount of time you are looking to resample. Naively upsampling a sparse series can potentially
+generate lots of intermediate values. When you don't want to use a method to fill these values, e.g. ``fill_method`` is ``None``,
+then intermediate values will be filled with ``NaN``.
+
+Since ``resample`` is a time-based groupby, the following is a method to efficiently
+resample only the groups that are not all ``NaN``

 .. ipython:: python

-   def round(t, freq):
-       # round a Timestamp to a specified freq
-       return Timestamp((t.value // freq.delta.value) * freq.delta.value)
+   rng = date_range('2014-1-1', periods=100, freq='D') + Timedelta('1s')
+   ts = Series(range(100), index=rng)

-   from functools import partial
+If we want to resample to the full range of the series

-   rng = date_range('1/1/2012', periods=100, freq='S')
+.. ipython:: python
+
+   ts.resample('3T',how='sum')
+
+We can instead only resample those groups where we have points as follows:
+
+.. ipython:: python

-   ts = Series(randint(0, 500, len(rng)), index=rng)
+   from functools import partial
+   from pandas.tseries.frequencies import to_offset
+
+   def round(t, freq):
+       # round a Timestamp to a specified freq
+       freq = to_offset(freq)
+       return Timestamp((t.value // freq.delta.value) * freq.delta.value)

-   ts.groupby(partial(round, freq=offsets.Minute(3))).sum()
+   ts.groupby(partial(round, freq='3T')).sum()


 .. _timeseries.periods:
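
Note on the sparse resampling example added by this commit: the idea of "a time-based groupby followed by a reduction" can be sketched without pandas, as a rough illustration only. The names ``floor_to`` and ``sparse_sum`` are hypothetical helpers, and the data mirrors the docs example (100 daily points offset by one second, grouped into 3-minute bins) as an assumption rather than a reproduction of pandas output.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def floor_to(t, width):
    """Round a naive datetime down to a multiple of `width` since EPOCH."""
    return t - (t - EPOCH) % width


def sparse_sum(points, width):
    """Sum values per time bin, creating bins only where data exists."""
    bins = defaultdict(int)
    for t, value in points:
        bins[floor_to(t, width)] += value
    return dict(sorted(bins.items()))


# 100 daily observations, each one second past midnight.
start = datetime(2014, 1, 1, 0, 0, 1)
points = [(start + timedelta(days=i), i) for i in range(100)]
width = timedelta(minutes=3)

result = sparse_sum(points, width)

# Number of 3-minute bins a dense grid over the same span would need.
first_bin = floor_to(points[0][0], width)
last_bin = floor_to(points[-1][0], width)
dense_bins = (last_bin - first_bin) // width + 1
```

In this sketch the grouped result holds one bin per observation (100 bins), whereas a dense 3-minute grid over the same span would need 47,521 bins, nearly all of them empty.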
