DOC: update the pandas.Series/DataFrame.interpolate docstring #20270
Changes from 3 commits
b39289f
0f5f666
5358af9
6cee318
ba8cfd2
272d5e2
d64f5f3
0734c3b
1eab0a8
3ca95ec
b123846
3af4306
3f00e93
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5257,32 +5257,35 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, | |
---------- | ||
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero', | ||
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', | ||
'polynomial', 'spline', 'piecewise_polynomial', | ||
'from_derivatives', 'pchip', 'akima'} | ||
'polynomial', 'spline', 'piecewise_polynomial', 'pad', | ||
'from_derivatives', 'pchip', 'akima'}, default 'linear' | ||
Interpolation technique to use. | ||
|
||
* 'linear': ignore the index and treat the values as equally | ||
* 'linear': Ignore the index and treat the values as equally | ||
Generally shouldn't need periods at the end of bullet points |
||
spaced. This is the only method supported on MultiIndexes. | ||
I understand why you added these, but generally do not put punctuation at the end of bullet points. If you get an error as a result OK to ignore |
||
default | ||
* 'time': interpolation works on daily and higher resolution | ||
data to interpolate given length of interval | ||
* 'index', 'values': use the actual numerical values of the index | ||
Default. | ||
Don't need this |
||
* 'time': Interpolation works on daily and higher resolution | ||
"Interpolation works...to interpolate" seems unnecessarily verbose. Perhaps just "Works on daily and higher resolution data"? |
||
data to interpolate given length of interval. | ||
* 'index', 'values': use the actual numerical values of the index. | ||
* 'pad': Fill in NaNs using existing values. | ||
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', | ||
'barycentric', 'polynomial' is passed to | ||
'barycentric', 'polynomial': Passed to | ||
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline' | ||
Should have |
I would do the same thing here you did for 'krogh' and move some of the implementation details down to the Notes section |
This should just be single backticks no? |
||
require that you also specify an `order` (int), | ||
e.g. df.interpolate(method='polynomial', order=4). | ||
Seems better served as a dedicated example than crammed into this |
||
These use the actual numerical values of the index. | ||
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima' | ||
are all wrappers around the scipy interpolation methods of | ||
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima': | ||
Wrappers around the scipy interpolation methods of | ||
Add SciPy methods to See Also. Move some of this information into Notes |
||
similar names. These use the actual numerical values of the | ||
index. For more information on their behavior, see the | ||
`scipy documentation | ||
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__ | ||
and `tutorial documentation | ||
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__ | ||
* 'from_derivatives' refers to BPoly.from_derivatives which | ||
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__. | ||
* 'from_derivatives': Refers to | ||
``scipy.interpolate.BPoly.from_derivatives`` which | ||
replaces 'piecewise_polynomial' interpolation method in | ||
scipy 0.18 | ||
scipy 0.18. | ||
|
||
.. versionadded:: 0.18.1 | ||
|
||
|
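The difference between 'linear' (index ignored, values treated as equally spaced) and 'index' (actual numerical index values used) may be easier to see on an unevenly spaced index. The following is only an illustrative sketch; the Series and its index are made up for this purpose and are not part of the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the gap between labels 1 and 10 is much larger
# than the gap between labels 0 and 1.
ser = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index and treats the three points as equally spaced.
linear = ser.interpolate(method="linear")

# 'index' weights the fill by the numerical distance between index labels.
by_index = ser.interpolate(method="index")

print(linear.loc[1])    # 5.0
print(by_index.loc[1])  # 1.0
```

With an evenly spaced index the two methods would give the same result, which is why the distinction is easy to miss.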
@@ -5292,46 +5295,120 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, | |
scipy < 0.18 | ||
|
||
axis : {0, 1}, default 0 | ||
||
* 0: fill column-by-column | ||
* 1: fill row-by-row | ||
limit : int, default None. | ||
Axis to interpolate along. | ||
|
||
* 0: Fill column-by-column. | ||
Can remove these sub-bullets |
||
* 1: Fill row-by-row. | ||
limit : int, default None | ||
Since |
||
Maximum number of consecutive NaNs to fill. Must be greater than 0. | ||
inplace : bool, default False | ||
Update the data in place if possible. | ||
limit_direction : {'forward', 'backward', 'both'}, default 'forward' | ||
If limit is specified, consecutive NaNs will be filled in this | ||
Put back ticks around `NaN` |
||
direction. | ||
limit_area : {'inside', 'outside'}, default None | ||
{`None`, 'inside', 'outside'} |
||
* None: (default) no fill restriction | ||
* 'inside' Only fill NaNs surrounded by valid values (interpolate). | ||
* 'outside' Only fill NaNs outside valid values (extrapolate). | ||
If limit is specified, consecutive NaNs will be filled with this | ||
restriction. | ||
|
||
* None: No fill restriction (default). | ||
* 'inside': Only fill NaNs surrounded by valid values | ||
(interpolate). | ||
* 'outside': Only fill NaNs outside valid values (extrapolate). | ||
Would be good to add an example for 'outside' |
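One possible shape for such an example is sketched below. It is not taken from the PR; the Series is hypothetical and chosen so that it has a leading NaN, an interior NaN, and trailing NaNs. It assumes the default 'linear' method, under which values outside the valid range are filled with the nearest valid value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one leading NaN, one interior NaN, two trailing NaNs.
ser = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# 'inside': only NaNs surrounded by valid values are filled.
inside = ser.interpolate(limit_area="inside")

# 'outside': only NaNs before the first or after the last valid value are
# filled; limit_direction='both' is needed to reach the leading NaN.
outside = ser.interpolate(limit_direction="both", limit_area="outside")

print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, nan, nan]
print(outside.tolist())  # [1.0, 1.0, nan, 3.0, 3.0, 3.0]
```

Showing both settings on the same Series makes the interpolate versus extrapolate distinction visible at a glance.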
||
|
||
.. versionadded:: 0.21.0 | ||
|
||
If limit is specified, consecutive NaNs will be filled in this | ||
direction. | ||
inplace : bool, default False | ||
Update the NDFrame in place if possible. | ||
downcast : optional, 'infer' or None, defaults to None | ||
Downcast dtypes if possible. | ||
kwargs : keyword arguments to pass on to the interpolating function. | ||
kwargs | ||
**kwargs |
||
Keyword arguments to pass on to the interpolating function. | ||
|
||
Returns | ||
------- | ||
Series or DataFrame of same shape interpolated at the NaNs | ||
Series or DataFrame | ||
Same-shape object interpolated at the NaN values | ||
For the description here say "Returns the same object type as the caller" - that wording has been used by a few other PRs so just want to be consistent |
||
|
||
See Also | ||
-------- | ||
reindex, replace, fillna | ||
replace : replace a value | ||
See comments above - so much can be added here |
done |
||
fillna : fill missing values | ||
|
||
Examples | ||
-------- | ||
|
||
Filling in NaNs | ||
Filling in NaNs in a Series via linear interpolation. | ||
:class:`~pandas.Series` |
||
|
||
>>> s = pd.Series([0, 1, np.nan, 3]) | ||
>>> s.interpolate() | ||
0 0 | ||
1 1 | ||
2 2 | ||
3 3 | ||
>>> ser = pd.Series([0, 1, np.nan, 3]) | ||
Convention here is |
||
>>> ser.interpolate() | ||
0 0.0 | ||
1 1.0 | ||
2 2.0 | ||
3 3.0 | ||
dtype: float64 | ||
|
||
Filling in NaNs in a Series by padding, but filling at most two | ||
consecutive NaN at a time. | ||
|
||
>>> ser = pd.Series([np.nan, "single_one", np.nan, | ||
... "fill_two_more", np.nan, np.nan, np.nan, | ||
... 4.71, np.nan]) | ||
>>> ser | ||
To save space you don't need to print the |
I did not include the print for the very small series example (it was straightforward to see), but I'd like to keep this longer one if that's alright - it was encouraged so the differences can be spotted easier. |
||
0 NaN | ||
1 single_one | ||
2 NaN | ||
3 fill_two_more | ||
4 NaN | ||
5 NaN | ||
6 NaN | ||
7 4.71 | ||
8 NaN | ||
dtype: object | ||
>>> ser.interpolate(method='pad', limit=2) | ||
0 NaN | ||
1 single_one | ||
2 single_one | ||
3 fill_two_more | ||
4 fill_two_more | ||
5 fill_two_more | ||
6 NaN | ||
7 4.71 | ||
8 4.71 | ||
dtype: object | ||
|
||
Create a DataFrame with missing values. | ||
|
||
>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8], | ||
Why not just construct with the missing values? |
Mainly so people can see the "expected" interpolation (I tried to have a pattern column-wise) and they can compare it with what actually happens, e.g. with linear interpolation (especially if the last entry is an NA) |
Thought I had this comment before but just use the NA values in your constructor - no reason to instantiate the DataFrame with values and then assign them missing values after the fact. Also make sure you put a space after every comma |
I changed it so one can see how the columns get created - and we have linear values in 3 columns and quadratic on the 4th. |
||
... [2,3,4,-2,12],[3,4,5,-3,16]], | ||
... columns=['a', 'b', 'c', 'd', 'e']) | ||
>>> df | ||
a b c d e | ||
0 0 1 2 0 4 | ||
1 1 2 3 -1 8 | ||
2 2 3 4 -2 12 | ||
3 3 4 5 -3 16 | ||
>>> df.loc[3,'a'] = np.nan | ||
>>> df.loc[0,'b'] = np.nan | ||
>>> df.loc[1,'d'] = np.nan | ||
>>> df.loc[2,'d'] = np.nan | ||
>>> df.loc[1,'e'] = np.nan | ||
>>> df | ||
a b c d e | ||
0 0.0 NaN 2 0.0 4.0 | ||
1 1.0 2.0 3 NaN NaN | ||
2 2.0 3.0 4 NaN 12.0 | ||
3 NaN 4.0 5 -3.0 16.0 | ||
|
||
Fill the DataFrame forward (that is, going down) along each column. | ||
Note how the last entry in column `a` is interpolated differently | ||
(because there is no entry after it to use for interpolation). | ||
Don't need the parentheses here (nor on the next line) |
||
Note how the first entry in column `b` remains NA (because there | ||
||
is no entry before it to use for interpolation). | ||
|
||
>>> df.interpolate(method='linear', limit_direction='forward', axis=0) | ||
a b c d e | ||
0 0.0 NaN 2 0.0 4.0 | ||
1 1.0 2.0 3 -1.0 8.0 | ||
2 2.0 3.0 4 -2.0 12.0 | ||
3 2.0 4.0 5 -3.0 16.0 | ||
""" | ||
|
||
@Appender(_shared_docs['interpolate'] % _shared_doc_kwargs) | ||
|
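As a sketch of the reviewer's suggestion to put the missing values directly in the constructor, the DataFrame from the docstring example could be built in one step. This is an illustration only, not the version merged in the PR; the dict-based constructor is an assumption chosen to keep the column dtypes unambiguous.

```python
import numpy as np
import pandas as pd

# Same values as the docstring example, with the NaNs placed in the
# constructor instead of being assigned afterwards via df.loc.
df = pd.DataFrame({
    "a": [0.0, 1.0, 2.0, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0],
    "c": [2, 3, 4, 5],
    "d": [0.0, np.nan, np.nan, -3.0],
    "e": [4.0, np.nan, 12.0, 16.0],
})

# Fill forward (down each column) with linear interpolation.
result = df.interpolate(method="linear", limit_direction="forward", axis=0)

print(result.loc[3, "a"])    # 2.0, the last valid value is carried forward
print(result.loc[0, "b"])    # nan, nothing precedes it to interpolate from
print(result["d"].tolist())  # [0.0, -1.0, -2.0, -3.0]
```

The trailing entry in column `a` is filled with the last valid value and the leading entry in column `b` stays NaN, matching the behavior described in the docstring text.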
Shouldn't need the default designation at end (implied by `linear` being the first value)