Commit c7eb7ff

Note on look-ahead bias with resampling
1 parent c7748ca commit c7eb7ff

1 file changed: +42 -37 lines changed


doc/source/user_guide/timeseries.rst

@@ -761,34 +761,6 @@ regularity will result in a ``DatetimeIndex``, although frequency is lost:
 
    ts2[[0, 2, 6]].index
 
-.. _timeseries.iterating-label:
-
-Iterating through groups
-------------------------
-
-With the ``Resampler`` object in hand, iterating through the grouped data is very
-natural and functions similarly to :py:func:`itertools.groupby`:
-
-.. ipython:: python
-
-   small = pd.Series(
-       range(6),
-       index=pd.to_datetime(['2017-01-01T00:00:00',
-                             '2017-01-01T00:30:00',
-                             '2017-01-01T00:31:00',
-                             '2017-01-01T01:00:00',
-                             '2017-01-01T03:00:00',
-                             '2017-01-01T03:05:00'])
-   )
-   resampled = small.resample('H')
-
-   for name, group in resampled:
-       print("Group: ", name)
-       print("-" * 27)
-       print(group, end="\n\n")
-
-See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
-
 .. _timeseries.components:
 
 Time/Date Components
@@ -1628,24 +1600,29 @@ labels.
 
    ts.resample('5Min', label='left', loffset='1s').mean()
 
-.. note::
+.. warning::
 
-    The default values for ``label`` and ``closed`` is 'left' for all
+    The default values for ``label`` and ``closed`` is '**left**' for all
     frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'
     which all have a default of 'right'.
 
+    This might lead to unintended look-ahead bias as in the following example:
+
     .. ipython:: python
 
-       rng2 = pd.date_range('1/1/2012', end='3/31/2012', freq='D')
-       ts2 = pd.Series(range(len(rng2)), index=rng2)
+       s = pd.date_range('2000-01-01', '2000-01-05').to_series()
+       s.iloc[2] = pd.NaT
+       s.dt.weekday_name
 
-       # default: label='right', closed='right'
-       ts2.resample('M').max()
+       # default: label='left', closed='left'
+       s.resample('B').last().dt.weekday_name
 
-       # default: label='left', closed='left'
-       ts2.resample('SM').max()
+    Notice how the value for Sunday got pulled back to the previous Friday.
+    To prevent any look-ahead bias, use instead
 
-       ts2.resample('SM', label='right', closed='right').max()
+    .. ipython:: python
+
+       s.resample('B', label='right', closed='right').last().dt.weekday_name
 
 The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
 specified axis for a ``DataFrame``.
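
The look-ahead effect described in the new warning can be reproduced without pandas. The following is a minimal standard-library sketch, not the pandas implementation: the helper names (`roll_back`, `roll_forward`, `last_per_bin`) are hypothetical, and the sketch assumes daily observations binned onto business days, with "left" standing for `label='left', closed='left'` and "right" for `label='right', closed='right'`.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def roll_back(d):
    """Return the business day (Mon-Fri) on or before d."""
    while d.weekday() >= 5:
        d -= timedelta(days=1)
    return d


def roll_forward(d):
    """Return the business day (Mon-Fri) on or after d."""
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d


def last_per_bin(observations, side):
    """Keep the last non-missing value in each business-day bin.

    side='left' stamps each observation with the bin's left edge,
    side='right' stamps it with the bin's right edge.
    """
    roll = roll_back if side == "left" else roll_forward
    result = {}
    for day, value in sorted(observations.items()):
        label = roll(day)
        if value is not None:
            result[label] = value
        else:
            result.setdefault(label, None)
    return result


# Same data as the example in the warning: 2000-01-01 (a Saturday)
# through 2000-01-05, with the Monday value missing.
days = [date(2000, 1, 1) + timedelta(days=i) for i in range(5)]
observations = {d: DAY_NAMES[d.weekday()] for d in days}
observations[date(2000, 1, 3)] = None

left = last_per_bin(observations, "left")
right = last_per_bin(observations, "right")

for label, value in left.items():
    print("left ", label, value)
for label, value in right.items():
    print("right", label, value)
```

With left labels, Sunday's value is stamped Friday 1999-12-31, a date on which it was not yet known. With right labels, it is stamped Monday 2000-01-03, so no value appears under a date earlier than its own.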
@@ -1796,6 +1773,34 @@ level of ``MultiIndex``, its name or location can be passed to the
 
    df.resample('M', level='d').sum()
 
+.. _timeseries.iterating-label:
+
+Iterating through groups
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the ``Resampler`` object in hand, iterating through the grouped data is very
+natural and functions similarly to :py:func:`itertools.groupby`:
+
+.. ipython:: python
+
+   small = pd.Series(
+       range(6),
+       index=pd.to_datetime(['2017-01-01T00:00:00',
+                             '2017-01-01T00:30:00',
+                             '2017-01-01T00:31:00',
+                             '2017-01-01T01:00:00',
+                             '2017-01-01T03:00:00',
+                             '2017-01-01T03:05:00'])
+   )
+   resampled = small.resample('H')
+
+   for name, group in resampled:
+       print("Group: ", name)
+       print("-" * 27)
+       print(group, end="\n\n")
+
+See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
+
 
 .. _timeseries.periods:
 
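
The relocated "Iterating through groups" section compares iterating over a ``Resampler`` to :py:func:`itertools.groupby`. As a rough illustration of that analogy, here is a standard-library sketch that groups the same six timestamps by hour. The `hour_floor` key function and the `groups` dictionary are hypothetical names introduced for this sketch, and the code only approximates what hourly resampling does.

```python
from datetime import datetime
from itertools import groupby

timestamps = [
    "2017-01-01T00:00:00",
    "2017-01-01T00:30:00",
    "2017-01-01T00:31:00",
    "2017-01-01T01:00:00",
    "2017-01-01T03:00:00",
    "2017-01-01T03:05:00",
]
# (timestamp, value) pairs, already sorted by time as groupby requires.
small = list(zip((datetime.fromisoformat(t) for t in timestamps), range(6)))


def hour_floor(item):
    """Group key: the observation's timestamp truncated to the hour."""
    ts, _ = item
    return ts.replace(minute=0, second=0, microsecond=0)


groups = {
    name: [value for _, value in members]
    for name, members in groupby(small, key=hour_floor)
}

for name, values in groups.items():
    print("Group: ", name)
    print("-" * 27)
    print(values, end="\n\n")
```

One difference worth keeping in mind: `itertools.groupby` only produces keys that occur in the data, so the 02:00 hour is absent here, whereas an aggregated hourly resample covers every hour in the range, including hours with no observations.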
