Commit 634577e

0x0L and jreback authored and committed
DOC: warning on look-ahead bias with resampling (#26754)
1 parent a137a9c commit 634577e

1 file changed, +45 -37 lines changed

doc/source/user_guide/timeseries.rst

@@ -761,34 +761,6 @@ regularity will result in a ``DatetimeIndex``, although frequency is lost:

    ts2[[0, 2, 6]].index

-.. _timeseries.iterating-label:
-
-Iterating through groups
-------------------------
-
-With the ``Resampler`` object in hand, iterating through the grouped data is very
-natural and functions similarly to :py:func:`itertools.groupby`:
-
-.. ipython:: python
-
-   small = pd.Series(
-       range(6),
-       index=pd.to_datetime(['2017-01-01T00:00:00',
-                             '2017-01-01T00:30:00',
-                             '2017-01-01T00:31:00',
-                             '2017-01-01T01:00:00',
-                             '2017-01-01T03:00:00',
-                             '2017-01-01T03:05:00'])
-   )
-   resampled = small.resample('H')
-
-   for name, group in resampled:
-       print("Group: ", name)
-       print("-" * 27)
-       print(group, end="\n\n")
-
-See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
-
 .. _timeseries.components:

 Time/Date Components
@@ -1628,24 +1600,32 @@ labels.

    ts.resample('5Min', label='left', loffset='1s').mean()

-.. note::
+.. warning::

-    The default values for ``label`` and ``closed`` is 'left' for all
+    The default values for ``label`` and ``closed`` is '**left**' for all
     frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'
     which all have a default of 'right'.

+    This might unintendedly lead to looking ahead, where the value for a later
+    time is pulled back to a previous time as in the following example with
+    the :class:`~pandas.tseries.offsets.BusinessDay` frequency:
+
     .. ipython:: python

-       rng2 = pd.date_range('1/1/2012', end='3/31/2012', freq='D')
-       ts2 = pd.Series(range(len(rng2)), index=rng2)
+       s = pd.date_range('2000-01-01', '2000-01-05').to_series()
+       s.iloc[2] = pd.NaT
+       s.dt.weekday_name

-       # default: label='right', closed='right'
-       ts2.resample('M').max()
+       # default: label='left', closed='left'
+       s.resample('B').last().dt.weekday_name

-       # default: label='left', closed='left'
-       ts2.resample('SM').max()
+    Notice how the value for Sunday got pulled back to the previous Friday.
+    To get the behavior where the value for Sunday is pushed to Monday, use
+    instead

-       ts2.resample('SM', label='right', closed='right').max()
+    .. ipython:: python
+
+       s.resample('B', label='right', closed='right').last().dt.weekday_name

 The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
 specified axis for a ``DataFrame``.
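The look-ahead described in the new warning can be checked outside the docs build. The sketch below reruns the commit's ``BusinessDay`` example as a plain script, assuming a recent pandas. It uses ``Series.dt.day_name()`` because ``weekday_name`` was removed in pandas 1.0. The helper ``has_lookahead`` is hypothetical and written only for this illustration: it flags any bin whose value is stamped later than the bin's label.

```python
import pandas as pd

# Daily timestamps from Saturday 2000-01-01 to Wednesday 2000-01-05; each value
# is its own timestamp, so we can see which bin it ends up in.
s = pd.date_range("2000-01-01", "2000-01-05").to_series()
s.iloc[2] = pd.NaT  # blank out Monday 2000-01-03, as in the commit's example

# 'B' defaults to label='left', closed='left': the weekend lands in the bin
# labelled with the preceding Friday.
left = s.resample("B").last()

# Right-closed, right-labelled bins push the weekend forward to Monday instead.
right = s.resample("B", label="right", closed="right").last()


def has_lookahead(resampled: pd.Series) -> bool:
    """Hypothetical helper: True if any bin holds a value stamped after its label."""
    observed = resampled.dropna()
    return bool((observed.to_numpy() > observed.index.to_numpy()).any())


print(left.dt.day_name())
print(right.dt.day_name())
print("left/left look-ahead:", has_lookahead(left))
print("right/right look-ahead:", has_lookahead(right))
```

With the defaults, Sunday's value sits in the bin labelled 1999-12-31, a date before that value was observed. With ``label='right', closed='right'`` it is labelled Monday 2000-01-03 instead.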
@@ -1796,6 +1776,34 @@ level of ``MultiIndex``, its name or location can be passed to the

    df.resample('M', level='d').sum()

+.. _timeseries.iterating-label:
+
+Iterating through groups
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the ``Resampler`` object in hand, iterating through the grouped data is very
+natural and functions similarly to :py:func:`itertools.groupby`:
+
+.. ipython:: python
+
+   small = pd.Series(
+       range(6),
+       index=pd.to_datetime(['2017-01-01T00:00:00',
+                             '2017-01-01T00:30:00',
+                             '2017-01-01T00:31:00',
+                             '2017-01-01T01:00:00',
+                             '2017-01-01T03:00:00',
+                             '2017-01-01T03:05:00'])
+   )
+   resampled = small.resample('H')
+
+   for name, group in resampled:
+       print("Group: ", name)
+       print("-" * 27)
+       print(group, end="\n\n")
+
+See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
+

 .. _timeseries.periods:

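The relocated "Iterating through groups" example can also be run as a plain script. The sketch below is a minimal variant that assumes a recent pandas. It passes ``pd.offsets.Hour()`` instead of the ``'H'`` string, because that alias is deprecated in pandas 2.2 in favour of ``'h'``. It also collects each bin's values into a dict (``groups`` is a name chosen for this illustration) rather than only printing them.

```python
import pandas as pd

small = pd.Series(
    range(6),
    index=pd.to_datetime([
        "2017-01-01T00:00:00",
        "2017-01-01T00:30:00",
        "2017-01-01T00:31:00",
        "2017-01-01T01:00:00",
        "2017-01-01T03:00:00",
        "2017-01-01T03:05:00",
    ]),
)

# Same bins as small.resample('H') in the doc; an offset object sidesteps the
# alias spelling change between pandas versions.
resampled = small.resample(pd.offsets.Hour())

groups = {}
for name, group in resampled:
    # `name` is the bin label (a Timestamp); `group` is the slice of `small`
    # falling in the left-closed hourly bin that starts at that label.
    groups[name] = group.tolist()
    print("Group:", name, group.tolist())
```

As with ``itertools.groupby``, each iteration yields a key and the matching slice. Here the key is the hourly bin label and the slice keeps the original timestamps as its index.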