Skip to content

DOC: update the pandas.Series/DataFrame.interpolate docstring #20270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
202 changes: 152 additions & 50 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -6081,89 +6081,191 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,

_shared_docs['interpolate'] = """
Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.
DataFrame/Series with a MultiIndex.

Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial',
'from_derivatives', 'pchip', 'akima'}
method : str, default 'linear'
Interpolation technique to use. One of:

* 'linear': ignore the index and treat the values as equally
* 'linear': Ignore the index and treat the values as equally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally shouldn't need periods at the end of bullet points

spaced. This is the only method supported on MultiIndexes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you added these, but generally do not put punctuation at the end of bullet points. If you get an error as a result OK to ignore

default
* 'time': interpolation works on daily and higher resolution
data to interpolate given length of interval
* 'index', 'values': use the actual numerical values of the index
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
'barycentric', 'polynomial' is passed to
:class:`scipy.interpolate.interp1d`. Both 'polynomial' and
'spline' require that you also specify an `order` (int),
* 'time': Works on daily and higher resolution data to interpolate
given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have spicy.interpolate.interp1d in See Also

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do the same thing here you did for 'krogh' and move some of the implementation details down to the Notes section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be single backticks no?

require that you also specify an `order` (int),
e.g. ``df.interpolate(method='polynomial', order=4)``.
These use the actual numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
are all wrappers around the scipy interpolation methods of
similar names. These use the actual numerical values of the
index. For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__
* 'from_derivatives' refers to
:meth:`scipy.interpolate.BPoly.from_derivatives` which
These use the numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
Wrappers around the SciPy interpolation methods of similar
names. See `Notes`.
* 'from_derivatives': Refers to
``scipy.interpolate.BPoly.from_derivatives`` which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single backtick too?

replaces 'piecewise_polynomial' interpolation method in
scipy 0.18
scipy 0.18.

.. versionadded:: 0.18.1

Added support for the 'akima' method.
Added interpolate method 'from_derivatives' which replaces
'piecewise_polynomial' in scipy 0.18; backwards-compatible with
scipy < 0.18

axis : {0, 1}, default 0
* 0: fill column-by-column
* 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
limit_area : {'inside', 'outside'}, default None
* None: (default) no fill restriction
* 'inside' Only fill NaNs surrounded by valid values (interpolate).
* 'outside' Only fill NaNs outside valid values (extrapolate).
'piecewise_polynomial' in SciPy 0.18; backwards-compatible with
SciPy < 0.18

axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put back ticks around `NaN`

direction.
limit_area : {`None`, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.

* None: No fill restriction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe double backticks here to render literal value?

* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add an example for 'outside'


.. versionadded:: 0.21.0
inplace : bool, default False
Update the NDFrame in place if possible.

downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
kwargs : keyword arguments to pass on to the interpolating function.
**kwargs
Keyword arguments to pass on to the interpolating function.

Returns
-------
Series or DataFrame of same shape interpolated at the NaNs
Series or DataFrame
Returns the same object type as the caller, interpolated at
some or all `NaN` values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double ticks


See Also
--------
reindex, replace, fillna
fillna : Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials
(Akima interpolator).
scipy.interpolate.BPoly.from_derivatives : Piecewise polynomial in the
Bernstein basis.
scipy.interpolate.interp1d : Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh
interpolator).
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation.
scipy.interpolate.CubicSpline : Cubic spline data interpolator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method is referenced here, I expected it to be available just like any of the other ones, but I have combed the source code and I am unable to find the place where this method is used. Was it added because it was planned to add future support for CubicSpline (with a wrapper such as Akima's)?


Notes
-----
The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
`SciPy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `SciPy tutorial
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.

Examples
--------

Filling in NaNs
Filling in `NaN` in a :class:`~pandas.Series` via linear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double

interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0 0.0
1 1.0
2 NaN
3 3.0
dtype: float64
>>> s.interpolate()
0 0
1 1
2 2
3 3
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64

Filling in `NaN` in a Series by padding, but filling at most two
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double ticks (here and next line)

consecutive `NaN` at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object

Filling in `NaN` in a Series via polynomial interpolation or splines:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double backtick

Both `polynomial` and `spline` methods require that you also specify
an `order` (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64

Fill the DataFrame forward (that is, going down) along each column
using linear interpolation.

Note how the last entry in column `a` is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column `b` remains `NaN`, because there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double backticks

is no entry befofe it to use for interpolation.

>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
... (np.nan, 2.0, np.nan, np.nan),
... (2.0, 3.0, np.nan, 9.0),
... (np.nan, 4.0, -4.0, 16.0)],
... columns=list('abcd'))
>>> df
a b c d
0 0.0 NaN -1.0 1.0
1 NaN 2.0 NaN NaN
2 2.0 3.0 NaN 9.0
3 NaN 4.0 -4.0 16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d
0 0.0 NaN -1.0 1.0
1 1.0 2.0 -2.0 5.0
2 2.0 3.0 -3.0 9.0
3 2.0 4.0 -4.0 16.0

Using polynomial interpolation.

>>> df['d'].interpolate(method='polynomial', order=2)
0 1.0
1 4.0
2 9.0
3 16.0
Name: d, dtype: float64
"""

@Appender(_shared_docs['interpolate'] % _shared_doc_kwargs)
Expand Down