
Commit 8d3357f

DOC warn user about potential information loss in Resampler.interpolate (#52198)
* docs: improve resample interpolate docs
* Improve docs
* Fix comments after pr review
* fix black linter outputs
* Remove DataFrame.interpolate docs from resample docs
* fix typo
* Add returns section
* Add parameters to core.resample.Resampler.interpolate
* revert gitignore
1 parent 7fe71f4 commit 8d3357f

1 file changed: +154 -2 lines changed

pandas/core/resample.py

@@ -825,7 +825,6 @@ def fillna(self, method, limit: int | None = None):
         """
         return self._upsample(method, limit=limit)
 
-    @doc(NDFrame.interpolate, **_shared_docs_kwargs)
     def interpolate(
         self,
         method: QuantileInterpolation = "linear",
@@ -839,7 +838,160 @@ def interpolate(
         **kwargs,
     ):
         """
-        Interpolate values according to different methods.
+        Interpolate values between target timestamps according to different methods.
+
+        The original index is first reindexed to target timestamps
+        (see :meth:`core.resample.Resampler.asfreq`),
+        then the interpolation of ``NaN`` values via :meth:`DataFrame.interpolate`
+        happens.
+
+        Parameters
+        ----------
+        method : str, default 'linear'
+            Interpolation technique to use. One of:
+
+            * 'linear': Ignore the index and treat the values as equally
+              spaced. This is the only method supported on MultiIndexes.
+            * 'time': Works on daily and higher resolution data to interpolate
+              given length of interval.
+            * 'index', 'values': use the actual numerical values of the index.
+            * 'pad': Fill in NaNs using existing values.
+            * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+              'barycentric', 'polynomial': Passed to
+              `scipy.interpolate.interp1d`, whereas 'spline' is passed to
+              `scipy.interpolate.UnivariateSpline`. These methods use the numerical
+              values of the index. Both 'polynomial' and 'spline' require that
+              you also specify an `order` (int), e.g.
+              ``df.interpolate(method='polynomial', order=5)``. Note that,
+              `slinear` method in Pandas refers to the Scipy first order `spline`
+              instead of Pandas first order `spline`.
+            * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima',
+              'cubicspline': Wrappers around the SciPy interpolation methods of
+              similar names. See `Notes`.
+            * 'from_derivatives': Refers to
+              `scipy.interpolate.BPoly.from_derivatives` which
+              replaces 'piecewise_polynomial' interpolation method in
+              scipy 0.18.
+
+        axis : {{0 or 'index', 1 or 'columns', None}}, default None
+            Axis to interpolate along. For `Series` this parameter is unused
+            and defaults to 0.
+        limit : int, optional
+            Maximum number of consecutive NaNs to fill. Must be greater than
+            0.
+        inplace : bool, default False
+            Update the data in place if possible.
+        limit_direction : {{'forward', 'backward', 'both'}}, optional
+            Consecutive NaNs will be filled in this direction.
+
+            If limit is specified:
+                * If 'method' is 'pad' or 'ffill', 'limit_direction' must be 'forward'.
+                * If 'method' is 'backfill' or 'bfill', 'limit_direction' must be
+                  'backward'.
+
+            If 'limit' is not specified:
+                * If 'method' is 'backfill' or 'bfill', the default is 'backward'
+                * else the default is 'forward'
+
+            .. versionchanged:: 1.1.0
+                raises ValueError if `limit_direction` is 'forward' or 'both' and
+                    method is 'backfill' or 'bfill'.
+                raises ValueError if `limit_direction` is 'backward' or 'both' and
+                    method is 'pad' or 'ffill'.
+
+        limit_area : {{`None`, 'inside', 'outside'}}, default None
+            If limit is specified, consecutive NaNs will be filled with this
+            restriction.
+
+            * ``None``: No fill restriction.
+            * 'inside': Only fill NaNs surrounded by valid values
+              (interpolate).
+            * 'outside': Only fill NaNs outside valid values (extrapolate).
+
+        downcast : optional, 'infer' or None, defaults to None
+            Downcast dtypes if possible.
+        ``**kwargs`` : optional
+            Keyword arguments to pass on to the interpolating function.
+
Returns
917+
-------
918+
DataFrame or Series
919+
Interpolated values at the specified freq.
920+
921+
See Also
922+
--------
923+
core.resample.Resampler.asfreq: Return the values at the new freq,
924+
essentially a reindex.
925+
DataFrame.interpolate: Fill NaN values using an interpolation method.
926+
927+
Notes
928+
-----
929+
For high-frequent or non-equidistant time-series with timestamps
930+
the reindexing followed by interpolation may lead to information loss
931+
as shown in the last example.
932+
+        Examples
+        --------
+
+        >>> import datetime as dt
+        >>> timesteps = [
+        ...    dt.datetime(2023, 3, 1, 7, 0, 0),
+        ...    dt.datetime(2023, 3, 1, 7, 0, 1),
+        ...    dt.datetime(2023, 3, 1, 7, 0, 2),
+        ...    dt.datetime(2023, 3, 1, 7, 0, 3),
+        ...    dt.datetime(2023, 3, 1, 7, 0, 4)]
+        >>> series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)
+        >>> series
+        2023-03-01 07:00:00    1
+        2023-03-01 07:00:01   -1
+        2023-03-01 07:00:02    2
+        2023-03-01 07:00:03    1
+        2023-03-01 07:00:04    3
+        dtype: int64
+
+        Downsample the series to 0.5Hz by providing the period time of 2s.
+
+        >>> series.resample("2s").interpolate("linear")
+        2023-03-01 07:00:00    1
+        2023-03-01 07:00:02    2
+        2023-03-01 07:00:04    3
+        Freq: 2S, dtype: int64
+
+        Upsample the series to 2Hz by providing the period time of 500ms.
+
+        >>> series.resample("500ms").interpolate("linear")
+        2023-03-01 07:00:00.000    1.0
+        2023-03-01 07:00:00.500    0.0
+        2023-03-01 07:00:01.000   -1.0
+        2023-03-01 07:00:01.500    0.5
+        2023-03-01 07:00:02.000    2.0
+        2023-03-01 07:00:02.500    1.5
+        2023-03-01 07:00:03.000    1.0
+        2023-03-01 07:00:03.500    2.0
+        2023-03-01 07:00:04.000    3.0
+        Freq: 500L, dtype: float64
+
+        Internal reindexing with ``asfreq()`` prior to interpolation leads to
+        an interpolated timeseries on the basis of the reindexed timestamps (anchors).
+        Since not all datapoints from the original series become anchors,
+        it can lead to misleading interpolation results as in the following example:
+
+        >>> series.resample("400ms").interpolate("linear")
+        2023-03-01 07:00:00.000    1.0
+        2023-03-01 07:00:00.400    1.2
+        2023-03-01 07:00:00.800    1.4
+        2023-03-01 07:00:01.200    1.6
+        2023-03-01 07:00:01.600    1.8
+        2023-03-01 07:00:02.000    2.0
+        2023-03-01 07:00:02.400    2.2
+        2023-03-01 07:00:02.800    2.4
+        2023-03-01 07:00:03.200    2.6
+        2023-03-01 07:00:03.600    2.8
+        2023-03-01 07:00:04.000    3.0
+        Freq: 400L, dtype: float64
+
+        Note that the series erroneously increases between the two anchors
+        ``07:00:00`` and ``07:00:02``.
         """
         result = self._upsample("asfreq")
         return result.interpolate(
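
The information loss described in the docstring can be reproduced by running the two documented steps by hand: reindex to the target grid with `asfreq`, then interpolate the `NaN` values. The sketch below does that with the docstring's own data, then compares it with one possible way to keep the original samples: interpolating over the union of the original and target timestamps. That union-index workaround is an assumption for illustration and is not part of this commit; the names `anchors`, `naive`, `preserving` and `comparison` are likewise illustrative.

```python
import datetime as dt

import pandas as pd

# Same data as the docstring example: one sample per second.
timesteps = [dt.datetime(2023, 3, 1, 7, 0, second) for second in range(5)]
series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)

# Step 1: reindex to the 400 ms grid. Only 07:00:00, 07:00:02 and 07:00:04
# coincide with original timestamps, so only those three values survive.
anchors = series.resample("400ms").asfreq()

# Step 2: fill the NaN values linearly between the surviving anchors.
naive = anchors.interpolate("linear")

# Illustrative workaround (an assumption, not pandas' implementation):
# interpolate on the union of original and target timestamps, weighting by
# elapsed time, then keep only the target timestamps.
target_index = anchors.index
union_index = series.index.union(target_index)
preserving = (
    series.reindex(union_index)
    .interpolate(method="time")
    .reindex(target_index)
)

comparison = pd.DataFrame({"naive": naive, "preserving": preserving})
print(comparison)
```

In the `naive` column the dip to -1 at `07:00:01` has no effect, because that sample is discarded before interpolation. In the `preserving` column the values between `07:00:00` and `07:00:02` follow the original samples.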
