DOC: update the pandas.Series/DataFrame.interpolate docstring #20270

math-and-data · 2018-03-11T02:12:06Z

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Errors in the validation script:

Formatting of the 'method' param is not right (not sure how to break option list properly into multiple lines)
kwargs (to ignore)
Formatting complaints are due to the extra line for "New in version ..."

################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################

Interpolate values according to different methods.

Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.

Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
          'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
          'polynomial', 'spline', 'piecewise_polynomial', 'pad',
          'from_derivatives', 'pchip', 'akima'}, default 'linear'
    Interpolation technique to use.

    * 'linear': Ignore the index and treat the values as equally
      spaced. This is the only method supported on MultiIndexes.
      Default.
    * 'time': Interpolation works on daily and higher resolution
      data to interpolate given length of interval.
    * 'index', 'values': use the actual numerical values of the index.
    * 'pad': Fill in NaNs using existing values.
    * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
      'barycentric', 'polynomial': Passed to
      ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
      require that you also specify an `order` (int),
      e.g. df.interpolate(method='polynomial', order=4).
      These use the actual numerical values of the index.
    * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
      Wrappers around the scipy interpolation methods of
      similar names. These use the actual numerical values of the
      index. For more information on their behavior, see the
      `scipy documentation
      <http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
      and `tutorial documentation
      <http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
    * 'from_derivatives': Refers to
      ``scipy.interpolate.BPoly.from_derivatives`` which
      replaces 'piecewise_polynomial' interpolation method in
      scipy 0.18.

    .. versionadded:: 0.18.1

       Added support for the 'akima' method
       Added interpolate method 'from_derivatives' which replaces
       'piecewise_polynomial' in scipy 0.18; backwards-compatible with
       scipy < 0.18

axis : {0, 1}, default 0
    Axis to interpolate along.

    * 0: Fill column-by-column.
    * 1: Fill row-by-row.
limit : int, default None
    Maximum number of consecutive NaNs to fill. Must be greater than 0.
inplace : bool, default False
    Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
    If limit is specified, consecutive NaNs will be filled in this
    direction.
limit_area : {'inside', 'outside'}, default None
    If limit is specified, consecutive NaNs will be filled with this
    restriction.

    * None: No fill restriction (default).
    * 'inside': Only fill NaNs surrounded by valid values
      (interpolate).
    * 'outside': Only fill NaNs outside valid values (extrapolate).

    .. versionadded:: 0.21.0

downcast : optional, 'infer' or None, defaults to None
    Downcast dtypes if possible.
kwargs
    Keyword arguments to pass on to the interpolating function.

Returns
-------
Series or DataFrame
    Same-shape object interpolated at the NaN values

See Also
--------
replace : replace a value
fillna : fill missing values

Examples
--------

Filling in NaNs in a Series via linear interpolation.

>>> ser = pd.Series([0, 1, np.nan, 3])
>>> ser.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Filling in NaNs in a Series by padding, but filling at most two
consecutive NaN at a time.

>>> ser = pd.Series([np.nan, "single_one", np.nan,
...                  "fill_two_more", np.nan, np.nan, np.nan,
...                  4.71, np.nan])
>>> ser
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> ser.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object

Create a DataFrame with missing values.

>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],
...                    [2,3,4,-2,12],[3,4,5,-3,16]],
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df
   a  b  c  d   e
0  0  1  2  0   4
1  1  2  3 -1   8
2  2  3  4 -2  12
3  3  4  5 -3  16
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df.loc[2,'d'] = np.nan
>>> df.loc[1,'e'] = np.nan
>>> df
     a    b  c    d     e
0  0.0  NaN  2  0.0   4.0
1  1.0  2.0  3  NaN   NaN
2  2.0  3.0  4  NaN  12.0
3  NaN  4.0  5 -3.0  16.0

Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently
(because there is no entry after it to use for interpolation).
Note how the first entry in column `b` remains NA (because there
is no entry before it to use for interpolation).

>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b  c    d     e
0  0.0  NaN  2  0.0   4.0
1  1.0  2.0  3 -1.0   8.0
2  2.0  3.0  4 -2.0  12.0
3  2.0  4.0  5 -3.0  16.0

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameter "method" description should start with capital letter
                Parameter "method" description should finish with "."
                Parameter "limit_area" description should finish with "."
                Parameter "kwargs" has no type

…put param

WillAyd · 2018-03-11T02:51:57Z

pandas/core/generic.py


-            * 'linear': ignore the index and treat the values as equally
+            * 'linear': Ignore the index and treat the values as equally


Generally shouldn't need periods at the end of bullet points

WillAyd · 2018-03-11T02:53:19Z

pandas/core/generic.py

-            * 'time': interpolation works on daily and higher resolution
-              data to interpolate given length of interval
-            * 'index', 'values': use the actual numerical values of the index
+              Default.


Don't need this

WillAyd · 2018-03-11T02:53:57Z

pandas/core/generic.py

-                  'polynomial', 'spline', 'piecewise_polynomial',
-                  'from_derivatives', 'pchip', 'akima'}
+                  'polynomial', 'spline', 'piecewise_polynomial', 'pad',
+                  'from_derivatives', 'pchip', 'akima'}, default 'linear'


Shouldn't need the default designation at end (implied by linear being the first value)

WillAyd · 2018-03-11T02:55:08Z

pandas/core/generic.py

-              data to interpolate given length of interval
-            * 'index', 'values': use the actual numerical values of the index
+              Default.
+            * 'time': Interpolation works on daily and higher resolution


"Interpolation works...to interpolate" seems unnecessarily verbose. Perhaps just "Works on daily and higher resolution data"?

WillAyd · 2018-03-11T02:56:14Z

pandas/core/generic.py

            * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
-              'barycentric', 'polynomial' is passed to
+              'barycentric', 'polynomial': Passed to
              ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
              require that you also specify an `order` (int),
              e.g. df.interpolate(method='polynomial', order=4).


Seems better served as a dedicated example than crammed into this
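
A dedicated example along these lines could look like the sketch below. To stay free of the SciPy dependency it uses `np.polyfit` as a stand-in for a second-order fit; the sample values and the stand-in are illustrative assumptions, not the code pandas runs internally. With only three known points the quadratic through them is unique, so the stand-in should agree with what `order=2` produces on this data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value between known points.
s = pd.Series([0, 2, np.nan, 8])

# Stand-in for method='polynomial', order=2: fit one quadratic through the
# known points with NumPy and evaluate it at the missing position.
known = s.dropna()
coeffs = np.polyfit(known.index.to_numpy(dtype=float), known.to_numpy(), deg=2)

filled = s.copy()
missing = s.index[s.isna()]
filled.loc[missing] = np.polyval(coeffs, missing.to_numpy(dtype=float))
print(filled)

# With SciPy installed, the pandas call under discussion would be:
#     s.interpolate(method='polynomial', order=2)
```

The missing entry comes out at 14/3, which is a useful contrast with the straight-line value a first-order fit would give.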

WillAyd · 2018-03-11T03:04:44Z

pandas/core/generic.py


        Examples
        --------

-        Filling in NaNs
+        Filling in NaNs in a Series via linear interpolation.


:class:`~pandas.Series`

WillAyd · 2018-03-11T03:05:33Z

pandas/core/generic.py

-        1    1
-        2    2
-        3    3
+        >>> ser = pd.Series([0, 1, np.nan, 3])


Convention here is s = instead of ser =

WillAyd · 2018-03-11T03:06:47Z

pandas/core/generic.py

+        >>> ser = pd.Series([np.nan, "single_one", np.nan,
+        ...                  "fill_two_more", np.nan, np.nan, np.nan,
+        ...                  4.71, np.nan])
+        >>> ser


To save space you don't need to print the Series here - should be straightforward based off the constructor directly above it

I did not include the print for the very small series example (it was straightforward to see), but I'd like to keep this longer one if that's alright - it was encouraged so the differences can be spotted easier.
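
If the concern is spotting differences without printing the Series twice, one option is a side-by-side frame. This is only a sketch: the numeric values are made up, and `ffill(limit=2)` is used here as the forward-fill counterpart of `method='pad', limit=2` (an assumption chosen to keep the example independent of how `interpolate` treats object data).

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with gaps of length one, three and one.
s = pd.Series([np.nan, 1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 4.71, np.nan])

# Forward-fill, but fill at most two consecutive NaN per gap.
filled = s.ffill(limit=2)

# Put input and output next to each other so changes are easy to spot.
comparison = pd.DataFrame({'before': s, 'after': filled})
print(comparison)
```

The leading NaN stays (nothing precedes it), and the third NaN of the long gap stays because of the limit.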

WillAyd · 2018-03-11T03:07:26Z

pandas/core/generic.py

+
+        Create a DataFrame with missing values.
+
+        >>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],


Why not just construct with the missing values?

Mainly so people can see the "expected" interpolation (I tried to have a column-wise pattern) and compare it with what actually happens, e.g. with linear interpolation (especially if the last entry is NA).

WillAyd · 2018-03-11T03:07:50Z

pandas/core/generic.py

+        Fill the DataFrame forward (that is, going down) along each column.
+        Note how the last entry in column `a` is interpolated differently
+        (because there is no entry after it to use for interpolation).
+        Note how the first entry in column `b` remains NA (because there


WillAyd · 2018-03-11T03:10:11Z

Lots of comments but just wanted to say nice job! This is one of the tougher docstrings

math-and-data · 2018-03-12T09:17:30Z

I hope to finish these great suggestions some time today.

math-and-data · 2018-03-12T22:41:32Z

Thank you for the thorough review, @WillAyd
I made some changes.

I believe the errors below can be ignored, because they relate to known issues (**kwargs, .. versionadded::, etc.)

################################################################################
################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################

Interpolate values according to different methods.

Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.

Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
          'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
          'polynomial', 'spline', 'piecewise_polynomial', 'pad',
          'from_derivatives', 'pchip', 'akima'}
    Interpolation technique to use.

    * 'linear': Ignore the index and treat the values as equally
      spaced. This is the only method supported on MultiIndexes.
    * 'time': Works on daily and higher resolution
      data to interpolate given length of interval.
    * 'index', 'values': use the actual numerical values of the index.
    * 'pad': Fill in NaNs using existing values.
    * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
      'barycentric', 'polynomial': Passed to
      ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
      require that you also specify an `order` (int),
      e.g. df.interpolate(method='polynomial', order=4).
      These use the actual numerical values of the index.
    * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
      Wrappers around the scipy interpolation methods of similar
      names. See `Notes`.
    * 'from_derivatives': Refers to
      ``scipy.interpolate.BPoly.from_derivatives`` which
      replaces 'piecewise_polynomial' interpolation method in
      scipy 0.18.

    .. versionadded:: 0.18.1

       Added support for the 'akima' method
       Added interpolate method 'from_derivatives' which replaces
       'piecewise_polynomial' in scipy 0.18; backwards-compatible with
       scipy < 0.18

axis : {0 or 'index', 1 or 'columns', None}, default None
    Axis to interpolate along.
limit : int, optional
    Maximum number of consecutive NaNs to fill. Must be greater than
    0.
inplace : bool, default False
    Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
    If limit is specified, consecutive NaNs will be filled in this
    direction.
limit_area : {`None`, 'inside', 'outside'}
    If limit is specified, consecutive NaNs will be filled with this
    restriction.

    * None: No fill restriction (default).
    * 'inside': Only fill NaNs surrounded by valid values
      (interpolate).
    * 'outside': Only fill NaNs outside valid values (extrapolate).

    .. versionadded:: 0.21.0

downcast : optional, 'infer' or None, defaults to None
    Downcast dtypes if possible.
**kwargs
    Keyword arguments to pass on to the interpolating function.

Returns
-------
Series or DataFrame
    Same-shape object interpolated at the NaN values

See Also
--------
replace : replace a value
fillna : fill missing values
scipy.interpolate.Akima1DInterpolator : piecewise cubic polynomials
    (Akima interpolator)
scipy.interpolate.BPoly.from_derivatives : piecewise polynomial in the
    Bernstein basis
scipy.interpolate.interp1d : interpolate a 1-D function
scipy.interpolate.KroghInterpolator : interpolate polynomial (Krogh
    interpolator)
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
    interpolation
scipy.interpolate.CubicSpline : cubic spline data interpolator

Notes
-----
If the selected `method` is one of 'krogh', 'piecewise_polynomial',
'spline', 'pchip', 'akima':
They are wrappers around the scipy interpolation methods of similar
names. These use the actual numerical values of the index.
For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.

Examples
--------

Filling in `NaN` in a :class:`~pandas.Series` via linear
interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Filling in `NaN` in a Series by padding, but filling at most two
consecutive `NaN` at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
...                  "fill_two_more", np.nan, np.nan, np.nan,
...                  4.71, np.nan])
>>> s
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object

Filling in `NaN` in a Series via polynomial interpolation or splines:
Both `polynomial` and `spline` methods require that you also specify
an `order` (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=1)
0    0.0
1    2.0
2    5.0
3    8.0
dtype: float64
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Create a :class:`~pandas.DataFrame` with missing values to fill it
with different methods.

>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],
...                    [2,3,4,-2,12],[3,4,5,-3,16]],
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df
   a  b  c  d   e
0  0  1  2  0   4
1  1  2  3 -1   8
2  2  3  4 -2  12
3  3  4  5 -3  16
>>> df.loc[1,'a'] = np.nan
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df.loc[2,'d'] = np.nan
>>> df.loc[1,'e'] = np.nan
>>> df
     a    b  c    d     e
0  0.0  NaN  2  0.0   4.0
1  NaN  2.0  3  NaN   NaN
2  2.0  3.0  4  NaN  12.0
3  NaN  4.0  5 -3.0  16.0

Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently
(because there is no entry after it to use for interpolation).
Note how the first entry in column `b` remains `NaN` (because there
is no entry before it to use for interpolation).

>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b  c    d     e
0  0.0  NaN  2  0.0   4.0
1  1.0  2.0  3 -1.0   8.0
2  2.0  3.0  4 -2.0  12.0
3  2.0  4.0  5 -3.0  16.0

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameters {'kwargs'} not documented
                Unknown parameters {'**kwargs'}
                Parameter "method" description should start with capital letter
                Parameter "method" description should finish with "."
                Parameter "limit_area" description should finish with "."
                Parameter "**kwargs" has no type

WillAyd · 2018-03-12T23:59:22Z

pandas/core/generic.py


-            * 'linear': ignore the index and treat the values as equally
+            * 'linear': Ignore the index and treat the values as equally
              spaced. This is the only method supported on MultiIndexes.


I understand why you added these, but generally do not put punctuation at the end of bullet points. If you get an error as a result OK to ignore

WillAyd · 2018-03-13T00:00:14Z

pandas/core/generic.py

+            * 'index', 'values': use the actual numerical values of the index.
+            * 'pad': Fill in NaNs using existing values.
+            * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
+              'barycentric', 'polynomial': Passed to
              ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'


I would do the same thing here you did for 'krogh' and move some of the implementation details down to the Notes section

WillAyd · 2018-03-13T00:00:59Z

pandas/core/generic.py

-              <http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__
-            * 'from_derivatives' refers to BPoly.from_derivatives which
+            * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
+              Wrappers around the scipy interpolation methods of similar


Use SciPy instead of scipy when referring to the package outside of code (couple other places this pops up)

WillAyd · 2018-03-13T00:01:35Z

pandas/core/generic.py

            If limit is specified, consecutive NaNs will be filled in this
            direction.
-        inplace : bool, default False
-            Update the NDFrame in place if possible.
+        limit_area : {`None`, 'inside', 'outside'}


Add ", default None" to the end here and remove the comment about it being the default below

WillAyd · 2018-03-13T00:02:17Z

pandas/core/generic.py

+            * None: No fill restriction (default).
+            * 'inside': Only fill NaNs surrounded by valid values
+              (interpolate).
+            * 'outside': Only fill NaNs outside valid values (extrapolate).


Would be good to add an example for 'outside'
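
A possible shape for such an example, as a sketch with made-up data: the same Series is interpolated once with `limit_area='inside'` and once with `limit_area='outside'`. `limit_direction='both'` is added in the second call so that the leading gap can be filled as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one leading NaN, one interior NaN, two trailing NaN.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# 'inside': only the NaN surrounded by valid values is filled.
inside = s.interpolate(limit_area='inside')

# 'outside': only the NaN before the first and after the last valid value
# are filled; the interior NaN is left alone.
outside = s.interpolate(limit_area='outside', limit_direction='both')

print(pd.DataFrame({'original': s, 'inside': inside, 'outside': outside}))
```

With the default linear method the outside gaps are filled by repeating the nearest valid value rather than by extending the trend.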

WillAyd · 2018-03-13T00:04:54Z

pandas/core/generic.py

+        If the selected `method` is one of 'krogh', 'piecewise_polynomial',
+        'spline', 'pchip', 'akima':
+        They are wrappers around the scipy interpolation methods of similar
+        names. These use the actual numerical values of the index.


What does "These use the actual numerical values of the index" mean?
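
One way to read that sentence, sketched with made-up data: `method='linear'` treats the rows as equally spaced, while `method='index'` weights by the index labels, so the two differ as soon as the index is not evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical Series whose index labels are unevenly spaced.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# Positions only: the NaN sits halfway between its neighbours.
by_position = s.interpolate(method='linear')

# Index values: label 1 is one tenth of the way from 0 to 10.
by_index = s.interpolate(method='index')

print(by_position.loc[1], by_index.loc[1])
```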

"These use the ~~actual~~ numerical values of the index." Better grammar?

WillAyd · 2018-03-13T00:06:04Z

pandas/core/generic.py

+        3    8.000000
+        dtype: float64
+
+        Create a :class:`~pandas.DataFrame` with missing values to fill it


You are explaining here what the below code is going to do, but not really saying why it's important. Would be better worded as "Interpolation can also be applied to DataFrames" or something to that effect

WillAyd · 2018-03-13T00:06:57Z

pandas/core/generic.py


+        >>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],


Thought I had this comment before but just use the NA values in your constructor - no reason to instantiate the DataFrame with values and then assign them missing values after the fact.

Also make sure you put a space after every comma

I changed it so one can see how the columns get created - and we have linear values in 3 columns and quadratic on the 4th.
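
For comparison, a sketch of the reviewer's suggestion: the same four columns built with the missing values directly in the constructor (and a space after every comma), then filled with the call from the docstring. The comments carry the "expected pattern" information that the two-step construction was meant to convey.

```python
import numpy as np
import pandas as pd

# Columns a, b, c follow a linear pattern; d follows squares (1, 4, 9, 16).
df = pd.DataFrame({'a': [0.0, np.nan, 2.0, np.nan],
                   'b': [np.nan, 2.0, 3.0, 4.0],
                   'c': [-1.0, np.nan, np.nan, -4.0],
                   'd': [1.0, np.nan, 9.0, 16.0]})

# Fill forward (going down) along each column.
result = df.interpolate(method='linear', limit_direction='forward', axis=0)
print(result)
```

Column `d` shows the point of the pattern: linear interpolation fills 5.0 where the quadratic pattern would have 4.0.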

WillAyd · 2018-03-13T00:07:33Z

pandas/core/generic.py

+
+        Fill the DataFrame forward (that is, going down) along each column.
+        Note how the last entry in column `a` is interpolated differently
+        (because there is no entry after it to use for interpolation).


Don't need the parentheses here (nor on the next line)

WillAyd · 2018-03-13T00:09:35Z

pandas/core/generic.py


        Returns
        -------
-        Series or DataFrame of same shape interpolated at the NaNs
+        Series or DataFrame
+            Same-shape object interpolated at the NaN values


For the description here say "Returns the same object type as the caller" - that wording has been used by a few other PRs so just want to be consistent

pep8speaks · 2018-03-14T23:10:32Z

Hello @math-and-data! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 19, 2018 at 00:25 Hours UTC

codecov · 2018-03-14T23:10:34Z

Codecov Report

Merging #20270 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20270   +/-   ##
=======================================
  Coverage   92.05%   92.05%           
=======================================
  Files         169      169           
  Lines       50713    50713           
=======================================
  Hits        46683    46683           
  Misses       4030     4030

Flag	Coverage Δ
#multiple	`90.46% <ø> (ø)`	⬆️
#single	`42.25% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.44% <ø> (ø)`	⬆️
pandas/core/series.py	`93.73% <0%> (ø)`	⬆️

Powered by Codecov. Last update 92dcf5f...3f00e93. Read the comment docs.

math-and-data · 2018-03-14T23:31:39Z

Docstring validation issues are below; they should be the known kwargs and "." (trailing period) issues.

################################################################################
################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################

Interpolate values according to different methods.

Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.

Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
          'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
          'polynomial', 'spline', 'piecewise_polynomial', 'pad',
          'from_derivatives', 'pchip', 'akima'}
    Interpolation technique to use.

    * 'linear': Ignore the index and treat the values as equally
      spaced. This is the only method supported on MultiIndexes.
    * 'time': Works on daily and higher resolution
      data to interpolate given length of interval.
    * 'index', 'values': use the actual numerical values of the index.
    * 'pad': Fill in NaNs using existing values.
    * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
      'barycentric', 'polynomial': Passed to
      ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
      require that you also specify an `order` (int),
      e.g. df.interpolate(method='polynomial', order=4).
      These use the numerical values of the index.
    * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
      Wrappers around the SciPy interpolation methods of similar
      names. See `Notes`.
    * 'from_derivatives': Refers to
      ``scipy.interpolate.BPoly.from_derivatives`` which
      replaces 'piecewise_polynomial' interpolation method in
      scipy 0.18.

    .. versionadded:: 0.18.1

       Added support for the 'akima' method
       Added interpolate method 'from_derivatives' which replaces
       'piecewise_polynomial' in SciPy 0.18; backwards-compatible with
       SciPy < 0.18

axis : {0 or 'index', 1 or 'columns', None}, default None
    Axis to interpolate along.
limit : int, optional
    Maximum number of consecutive NaNs to fill. Must be greater than
    0.
inplace : bool, default False
    Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
    If limit is specified, consecutive NaNs will be filled in this
    direction.
limit_area : {`None`, 'inside', 'outside'}
    If limit is specified, consecutive NaNs will be filled with this
    restriction.

    * None: No fill restriction.
    * 'inside': Only fill NaNs surrounded by valid values
      (interpolate).
    * 'outside': Only fill NaNs outside valid values (extrapolate).

    .. versionadded:: 0.21.0

downcast : optional, 'infer' or None, defaults to None
    Downcast dtypes if possible.
**kwargs
    Keyword arguments to pass on to the interpolating function.

Returns
-------
Series or DataFrame
    Returns the same object type as the caller, interpolated at
    some or all `NaN` values

See Also
--------
replace : replace a value
fillna : fill missing values
scipy.interpolate.Akima1DInterpolator : piecewise cubic polynomials
    (Akima interpolator)
scipy.interpolate.BPoly.from_derivatives : piecewise polynomial in the
    Bernstein basis
scipy.interpolate.interp1d : interpolate a 1-D function
scipy.interpolate.KroghInterpolator : interpolate polynomial (Krogh
    interpolator)
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
    interpolation
scipy.interpolate.CubicSpline : cubic spline data interpolator

Notes
-----
The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
`SciPy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `SciPy tutorial
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.

Examples
--------

Filling in `NaN` in a :class:`~pandas.Series` via linear
interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Filling in `NaN` in a Series by padding, but filling at most two
consecutive `NaN` at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
...                "fill_two_more", np.nan, np.nan, np.nan,
...                4.71, np.nan])
>>> s
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object

Filling in `NaN` in a Series via polynomial interpolation or splines:
Both `polynomial` and `spline` methods require that you also specify
an `order` (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Filling in `NaN` in a :class:`~pandas.DataFrame` via linear
interpolation.

>>> df = pd.DataFrame({'a': range(0,4),
...                    'b': range(1,5),
...                    'c': range(-1, -5, -1),
...                    'd': [x**2 for x in range(1,5)]})
>>> df
   a  b  c   d
0  0  1 -1   1
1  1  2 -2   4
2  2  3 -3   9
3  3  4 -4  16
>>> df.loc[1,'a'] = np.nan
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'c'] = np.nan
>>> df.loc[2,'c'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0

Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column `b` remains `NaN`, because there
is no entry before it to use for interpolation.

>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0

>>> df['d'].interpolate(method='polynomial', order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameters {'kwargs'} not documented
                Unknown parameters {'**kwargs'}
                Parameter "method" description should start with capital letter
                Parameter "method" description should finish with "."
                Parameter "limit_area" description should finish with "."
                Parameter "**kwargs" has no type

datapythonista

@WillAyd if you don't mind reviewing this one too. I made a few minor changes, such as PEP8 fixes in the doctest; also, the type of `method` couldn't list all options, as Sphinx does not let parameter types span multiple lines, and things like that.

Thanks for the docstring @math-and-data, really good work. And sorry for the long wait.

WillAyd

IIUC needs some fixes around backtick usage

WillAyd · 2018-08-18T23:58:57Z

pandas/core/generic.py

+            * 'pad': Fill in NaNs using existing values.
+            * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
+              'barycentric', 'polynomial': Passed to
+              ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'


This should just be single backticks no?

WillAyd · 2018-08-18T23:59:30Z

pandas/core/generic.py

+              Wrappers around the SciPy interpolation methods of similar
+              names. See `Notes`.
+            * 'from_derivatives': Refers to
+              ``scipy.interpolate.BPoly.from_derivatives`` which


Single backtick too?

WillAyd · 2018-08-19T00:00:21Z

pandas/core/generic.py

+            If limit is specified, consecutive NaNs will be filled with this
+            restriction.
+
+            * None: No fill restriction.


Maybe double backticks here to render literal value?

WillAyd · 2018-08-19T00:00:38Z

pandas/core/generic.py

-        Series or DataFrame of same shape interpolated at the NaNs
+        Series or DataFrame
+            Returns the same object type as the caller, interpolated at
+            some or all `NaN` values


Double ticks

WillAyd · 2018-08-19T00:01:01Z

pandas/core/generic.py


        Examples
        --------
-
-        Filling in NaNs
+        Filling in `NaN` in a :class:`~pandas.Series` via linear


WillAyd · 2018-08-19T00:01:15Z

pandas/core/generic.py

        dtype: float64

+        Filling in `NaN` in a Series by padding, but filling at most two


Double ticks (here and next line)

WillAyd · 2018-08-19T00:01:37Z

pandas/core/generic.py

+        8             4.71
+        dtype: object
+
+        Filling in `NaN` in a Series via polynomial interpolation or splines:


Double backtick

WillAyd · 2018-08-19T00:01:54Z

pandas/core/generic.py

+
+        Note how the last entry in column `a` is interpolated differently,
+        because there is no entry after it to use for interpolation.
+        Note how the first entry in column `b` remains `NaN`, because there


Double backticks

datapythonista · 2018-08-19T00:27:23Z

Yep, agree. I think they should be all right now. Thanks @WillAyd !
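
For context on the docstring note quoted above (last entry filled differently, first entry left as ``NaN``), here is a small sketch of the default linear behavior on a Series. The data are made up for illustration and are not the example from the docstring.

```python
import numpy as np
import pandas as pd

# Illustrative data: a leading gap, an interior gap, and a trailing gap.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default settings: linear method, filling in the forward direction.
filled = s.interpolate(method="linear")

# The leading NaN has no earlier value to start from, so it stays NaN.
# The interior NaN becomes the midpoint of its neighbours (2.0).
# The trailing NaN has no later value, so the last valid value (3.0) is carried forward.
print(filled)
```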

WillAyd · 2018-08-19T00:27:56Z

pandas/core/generic.py

-        an `order` (int).
+        Filling in ``NaN`` in a Series via polynomial interpolation or splines:
+        Both 'polynomial' and 'spline' methods require that you also specify
+        an ``order`` (int).


Parameters should be single backticks - double is only for literals and code samples I think

My understanding is that double backticks are for code, including small pieces like a single variable, None, or an assignment such as foo=1. Single backticks are for things you can refer (link) to, like a function, class, or module. And for values, just quotes.

For an argument, I'd consider it more code than something you can link to. That's why I added double backticks. But it's very subtle, and I'd be happy with any option (no quoting, single backticks, double backticks, or quotes).

Does this make sense?

Yep thanks. I think I've seen other instances where parameters are in single backticks but this is nuanced enough that it shouldn't hold up the PR - can be part of a larger conversation.

I added a bullet point to #20298 to decide a standard for these cases. I think at the moment there is not much consistency.
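
To make the convention discussed above concrete, here is a sketch of a docstring that follows it: double backticks for code fragments, single backticks for linkable objects, plain quotes for string values. The `fill_gaps` function is hypothetical and only exists to carry the docstring; as noted in the thread, no project-wide standard had been settled.

```python
def fill_gaps(values, limit=None):
    """
    Fill ``None`` entries by carrying the last seen value forward.

    Similar in spirit to method 'pad' of `pandas.Series.interpolate`.

    Parameters
    ----------
    values : list
        Input values. Missing entries are marked with ``None``.
    limit : int, optional
        Maximum number of consecutive gaps to fill. With ``limit=None``
        there is no restriction.

    Returns
    -------
    list
        A new list with gaps filled where possible.
    """
    filled, last, run = [], None, 0
    for value in values:
        if value is None:
            run += 1
            # Fill only if a previous value exists and the limit allows it.
            if last is not None and (limit is None or run <= limit):
                filled.append(last)
            else:
                filled.append(None)
        else:
            last, run = value, 0
            filled.append(value)
    return filled


print(fill_gaps([1, None, None, None, 5], limit=2))
```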

WillAyd · 2018-08-19T00:39:11Z

Thanks @math-and-data and @datapythonista !


khyox · 2020-03-31T05:51:35Z

pandas/core/generic.py

+            interpolator).
+        scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
+            interpolation.
+        scipy.interpolate.CubicSpline : Cubic spline data interpolator.


As this method is referenced here, I expected it to be available just like any of the other ones, but I have combed the source code and I am unable to find the place where this method is used. Was it added because it was planned to add future support for CubicSpline (with a wrapper such as Akima's)?
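
Until a wrapper exists, one possible workaround is to call `scipy.interpolate.CubicSpline` directly on the non-missing points. This is a sketch that assumes SciPy is installed and uses made-up data; it is not how pandas implements any of its methods.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

# Illustrative data lying on the line y = 2x, with one missing value.
s = pd.Series([0.0, 2.0, np.nan, 6.0, 8.0])

x = s.index.to_numpy(dtype=float)
known = s.notna().to_numpy()

# Fit the spline on the known points only, using the index as x values.
spline = CubicSpline(x[known], s.to_numpy()[known])

# Evaluate the spline at the positions of the missing values.
filled = s.copy()
filled.loc[~known] = spline(x[~known])
print(filled)
```

The follow-up PR #33670, referenced further down in this thread, addresses the missing method in pandas itself.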

math-and-data added 3 commits March 11, 2018 01:02

DOC: update pd.Series/DataFrame.interpolate

b39289f

DOC: update pd.Series/DataFrame.interpolate, removed whitespace

0f5f666

DOC: update pd.Series/DataFrame.interpolate, new allowed value for in…

5358af9

…put param

WillAyd requested changes Mar 11, 2018

jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

math-and-data added 3 commits March 12, 2018 23:45

DOC: pandas.DataFrame.interpolate incorporated recommended channges

6cee318

DOC: pandas.DataFrame.interpolnew examples

ba8cfd2

DOC: pandas.DataFrame.interpol example update

272d5e2

WillAyd requested changes Mar 13, 2018

DOC> pandas.DataFrame.interpolate - implemented feedback

d64f5f3

math-and-data added 2 commits March 15, 2018 00:24

DOC: pandas.DataFrame.interpolate - implemented feedback

0734c3b

DOC: pandas.DataFrame.interpolate - sneaky whitespace

1eab0a8

datapythonista self-assigned this Aug 18, 2018

datapythonista added 3 commits August 19, 2018 00:21

Minor fixes to interpolate docstring

3ca95ec

Merging from master

b123846

Last minor fixes

3af4306

datapythonista approved these changes Aug 18, 2018

WillAyd requested changes Aug 19, 2018

Fixing quotes and backticks

3f00e93

WillAyd requested changes Aug 19, 2018

WillAyd approved these changes Aug 19, 2018

WillAyd merged commit 8bb2cc1 into pandas-dev:master Aug 19, 2018

datapythonista added this to the 0.24.0 milestone Aug 19, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: update the pandas.Series/DataFrame.interpolate docstring (pandas…

91a1c0d

…-dev#20270)

khyox reviewed Mar 31, 2020

khyox mentioned this pull request Apr 20, 2020

Solve missing interpolation method (cubicspline) #33670

Merged

5 tasks


		* 'linear': ignore the index and treat the values as equally
		* 'linear': Ignore the index and treat the values as equally


		Create a DataFrame with missing values.

		>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],

		dtype: float64

		Filling in `NaN` in a Series by padding, but filling at most two
