@@ -1157,14 +1157,16 @@ Converting to Python datetimes

.. _timeseries.resampling:

- Up- and downsampling
- --------------------
+ Resampling
+ ----------

- With 0.8, pandas introduces simple, powerful, and efficient functionality for
+ Pandas has simple, powerful, and efficient functionality for
performing resampling operations during frequency conversion (e.g., converting
secondly data into 5-minutely data). This is extremely common in, but not
limited to, financial applications.

+ ``resample`` is a time-based groupby, followed by a reduction method on each of its groups.
+
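The groupby analogy can be checked directly. The sketch below is a minimal illustration, assuming a pandas release with the method-chaining ``Resampler`` API (``ts.resample(...).sum()``) and ``pd.Grouper``. The secondly data are made up for the example. It bins ten minutes of data into 5-minute groups both ways, and the two approaches should produce the same sums.

```python
import numpy as np
import pandas as pd

# Ten minutes of secondly data with values 0, 1, ..., 599 (made-up example data).
idx = pd.date_range("2012-01-01", periods=600, freq=pd.offsets.Second())
ts = pd.Series(np.arange(600), index=idx)

# resample: bin the index into 5-minute intervals, then reduce each bin with sum.
via_resample = ts.resample("5min").sum()

# The same computation expressed as an explicit time-based groupby.
via_groupby = ts.groupby(pd.Grouper(freq="5min")).sum()

print(via_resample)
```

Both versions should yield two bins, one labeled ``00:00`` and one labeled ``00:05``.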
See the :ref:`cookbook examples <cookbook.resample>` for some advanced strategies.

.. ipython:: python
@@ -1203,19 +1205,6 @@ end of the interval is closed:

   ts.resample('5Min', closed='left')

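The following small sketch shows concretely how ``closed`` and ``label`` affect the result. It uses made-up minutely data and assumes a pandas release where both keywords are passed to ``resample`` and the reduction is chained as ``.sum()``. When the bins are closed on the right, the observation at exactly midnight ends up in its own bin. The last step shifts the output labels by adding a ``Timedelta`` to the index. This is one way to get an adjustment similar to ``loffset`` without relying on that keyword.

```python
import numpy as np
import pandas as pd

# Ten minutely observations starting at midnight, values 0..9 (made-up example data).
idx = pd.date_range("2012-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(10), index=idx)

# Bins [00:00, 00:05) and [00:05, 00:10), each labeled by its left edge.
left = ts.resample("5min", closed="left", label="left").sum()

# Bins (23:55, 00:00], (00:00, 00:05], (00:05, 00:10], each labeled by its right edge.
right = ts.resample("5min", closed="right", label="right").sum()

# Shift the output labels by hand, here by one second.
shifted = left.copy()
shifted.index = shifted.index + pd.Timedelta("1s")
```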
- For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
- to interpolate over the gaps that are created:
-
- .. ipython:: python
-
-    # from secondly to every 250 milliseconds
-
-    ts[:2].resample('250L')
-
-    ts[:2].resample('250L', fill_method='pad')
-
-    ts[:2].resample('250L', fill_method='pad', limit=2)
-
Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
labels. ``label`` specifies whether the result is labeled with the beginning or
the end of the interval. ``loffset`` performs a time adjustment on the output
@@ -1240,34 +1229,58 @@ retains the input representation.
(detail below). It specifies how low-frequency periods are converted to
higher-frequency periods.

1243
- Note that 0.8 marks a watershed in the timeseries functionality in pandas. In
1244
- previous versions, resampling had to be done using a combination of
1245
- ``date_range ``, ``groupby `` with ``asof ``, and then calling an aggregation
1246
- function on the grouped object. This was not nearly as convenient or performant
1247
- as the new pandas timeseries API.
1248
1232
1249
- Sparse timeseries
+ Upsampling
+ ~~~~~~~~~~
+
+ For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
+ to fill the gaps that are created. ``limit`` caps how many consecutive new points are filled:
+
+ .. ipython:: python
+
+    # from secondly to every 250 milliseconds
+
+    ts[:2].resample('250L')
+
+    ts[:2].resample('250L', fill_method='pad')
+
+    ts[:2].resample('250L', fill_method='pad', limit=2)
+
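Newer pandas releases usually express the same upsampling by chaining a method onto the ``Resampler`` object rather than passing ``fill_method``. The sketch below assumes that API (``asfreq``, and ``ffill`` with its ``limit`` argument). It uses a made-up two-point series in place of ``ts[:2]`` and ``'250ms'`` as the millisecond alias.

```python
import pandas as pd

# Two secondly observations standing in for ts[:2] (values are made up).
idx = pd.date_range("2012-01-01", periods=2, freq=pd.offsets.Second())
ts2 = pd.Series([1.0, 2.0], index=idx)

# Upsample from secondly to every 250 milliseconds.
r = ts2.resample("250ms")

# No filling: the three new points between the originals are NaN.
gaps = r.asfreq()

# Forward fill every new point, comparable to fill_method='pad'.
padded = r.ffill()

# Forward fill at most two consecutive new points, comparable to limit=2.
limited = r.ffill(limit=2)
```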
1249
+ Sparse Resampling
1250
1250
~~~~~~~~~~~~~~~~~
1251
1251
- If your timeseries are sparse, be aware that upsampling will generate a lot of
- intermediate points filled with whatever passed as ``fill_method``. What
- ``resample`` does is basically a group by and then applying an aggregation
- method on each of its groups, which can also be achieve with something like the
- following.
+ Sparse timeseries are ones where you have far fewer points relative
+ to the span of time you want to resample. Naively upsampling a sparse series can
+ generate many intermediate values. If you do not fill these values (i.e. ``fill_method`` is ``None``),
+ they will be ``NaN``.
+
+ Since ``resample`` is a time-based groupby, the following is a method to efficiently
+ resample only the groups that are not all ``NaN``:

.. ipython:: python

-    def round(t, freq):
-        # round a Timestamp to a specified freq
-        return Timestamp((t.value // freq.delta.value) * freq.delta.value)
+    rng = date_range('2014-1-1', periods=100, freq='D') + Timedelta('1s')
+    ts = Series(range(100), index=rng)

-    from functools import partial
+ If we want to resample to the full range of the series:

-    rng = date_range('1/1/2012', periods=100, freq='S')
+ .. ipython:: python
+
+    ts.resample('3T', how='sum')
+
+ We can instead resample only those groups that contain points, as follows:
+
+ .. ipython:: python

-    ts = Series(randint(0, 500, len(rng)), index=rng)
+    from functools import partial
+    from pandas.tseries.frequencies import to_offset
+
+    def round(t, freq):
+        # round a Timestamp down to a multiple of the specified freq
+        freq = to_offset(freq)
+        return Timestamp((t.value // freq.delta.value) * freq.delta.value)

-    ts.groupby(partial(round, freq=offsets.Minute(3))).sum()
+    ts.groupby(partial(round, freq='3T')).sum()

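For comparison, here is a sketch of the same sparse strategy written for newer pandas releases. It assumes ``DatetimeIndex.floor`` is available. ``floor`` truncates each timestamp to the start of its 3-minute bin, which is the same job the ``round`` helper does in the example above. The full-range resample materializes every 3-minute bin across the 99 days spanned by the data. The groupby keeps only the 100 bins that actually contain a point, and the sums match.

```python
import pandas as pd

# A sparse series: one point per day, each one second after midnight.
rng = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(100), index=rng)

# Resampling over the full range creates every 3-minute bin, almost all of them empty.
full = ts.resample("3min").sum()

# Grouping on the floored timestamps keeps only the bins that contain data.
sparse = ts.groupby(ts.index.floor("3min")).sum()

print(len(full), len(sparse))
```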
.. _timeseries.periods: