DOC: Update interpolation docstring #49681
Thanks for the PR, I just have two questions.
One test is failing, otherwise looks good. I can merge once all CI tests are green :-)
Branch updated from 0a083c8 to bf44925.
@topper-123 Now everything is green :)
Thanks @milosz-martynow
* DOC: Update interpolation docstring to highlight slinear and spline differences between pandas and scipy.
* Fix typos.
* Shorter docstring comment.
Scipy vs Pandas interpolation documentation mismatch
The Pandas documentation states that interpolation, on both Series and DataFrames, passes the input data to scipy.interpolate.interp1d. The description of the method parameter reads:
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
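For the methods that really are routed through interp1d, the docstring's claim can be checked directly. A minimal sketch with made-up data (not the data from the attached CSVs), assuming pandas and SciPy are installed: fill a Series with method="slinear", then fit scipy.interpolate.interp1d(kind="slinear") on the non-missing points only and evaluate it at the gaps.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Made-up data: numeric index with two interior gaps (x = 2.0 and x = 4.0).
x = np.arange(7, dtype=float)
y = np.array([0.0, 1.0, np.nan, 9.0, np.nan, 25.0, 36.0])
series = pd.Series(y, index=x)

# pandas: "slinear" is forwarded to scipy.interpolate.interp1d,
# using the numerical values of the index as x.
pandas_slinear = series.interpolate(method="slinear")

# scipy: fit interp1d on the known points only, then evaluate it at the gaps.
known = ~np.isnan(y)
f = interpolate.interp1d(x[known], y[known], kind="slinear")
scipy_slinear = y.copy()
scipy_slinear[~known] = f(x[~known])

print(pandas_slinear.to_numpy())  # the gaps at x=2 and x=4 become 5.0 and 17.0
print(np.allclose(pandas_slinear.to_numpy(), scipy_slinear))  # True
```

Both paths draw straight lines between the neighbouring known points, which is what a first-order spline through the data does.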
Thus, a user might think that Pandas uses scipy.interpolate.interp1d for all of these interpolation methods. But, as shown in the comparison below, this is true for slinear and not for spline. This is also visible in the definition of the pandas.core.missing._interpolate_scipy_wrapper function, where spline in fact uses interpolate.UnivariateSpline to fill the gaps in the data. This might confuse the end user, because the scipy.interpolate.interp1d documentation states that the slinear method refers to a spline interpolation of first order. At the same time, the pandas.Series.interpolate documentation states that spline requires you to also specify an order (int), e.g. 1, as in this work.
Summarizing:
* Pandas slinear refers to Scipy slinear, but Scipy slinear is a first-order spline, i.e. what a user would expect from the spline method with order 1, which Pandas defines separately.
* pandas.core.missing._interpolate_scipy_wrapper shows that Pandas does not use scipy.interpolate.interp1d for spline the way it does for slinear.
* Pandas uses interpolate.UnivariateSpline for spline interpolation, which is not mentioned in the pandas.Series.interpolate documentation.
Proposed solution:
Workspace - Jupyter notebook used to highlight the differences between Scipy and Pandas interpolation methods:
interpolation_comparison.zip
Visualization of differences between Scipy and Pandas interpolations.


CSV files with plot data.
case_1_comparison.csv
case_2_comparison.csv
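The notebook itself is not reproduced here. As a minimal sketch of the same kind of comparison, assuming made-up data rather than the attached CSVs: method="spline" with order=1 should match scipy.interpolate.UnivariateSpline(x, y, k=1) fitted on the known points, as the issue reports for _interpolate_scipy_wrapper. Because UnivariateSpline smooths by default (its smoothing factor s defaults to the number of points), the fitted curve does not have to pass through the known points, so the filled value can differ from slinear even though both are first-order splines.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Made-up data: a zig-zag with one interior gap at x = 3.0.
x = np.arange(7, dtype=float)
y = np.array([0.0, 1.0, 0.0, np.nan, 0.0, 1.0, 0.0])
series = pd.Series(y, index=x)
known = ~np.isnan(y)

slinear = series.interpolate(method="slinear")          # exact first-order interpolation
spline1 = series.interpolate(method="spline", order=1)  # first-order UnivariateSpline

# Rebuild the "spline" result by hand: UnivariateSpline with k=order and the
# default smoothing factor (s=None, i.e. s = number of known points).
uspl = interpolate.UnivariateSpline(x[known], y[known], k=1)

print(slinear.loc[3.0])   # 0.0: straight line between the neighbours (2, 0) and (4, 0)
print(spline1.loc[3.0])   # ~0.333: value of the smoothed fit, not of the neighbours
print(float(uspl(3.0)))   # same value as spline1 at x = 3.0
```

Here the smoothing is strong enough that the fit collapses to the least-squares line through the six known points (a constant 1/3), so the gap is filled very differently from slinear. Passing s=0 through interpolate() would be one way to request an interpolating spline instead, since extra keyword arguments are forwarded to UnivariateSpline.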