
DOC: Update interpolation docstring #49681


milosz-martynow
Contributor

DOC: Update interpolation docstring to highlight slinear and spline differences between pandas and scipy.

Scipy vs Pandas interpolation documentation mismatch

The Pandas documentation states that the Pandas interpolation module, for both Series and DataFrames, passes input data to the Scipy scipy.interpolate.interp1d method. The description of the method parameter reads:

‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).

Thus, a user might think that Pandas uses scipy.interpolate.interp1d for all of these interpolation methods. But as the notebook attached below shows, this is true for slinear and not for spline. This is also visible in the pandas.core.missing._interpolate_scipy_wrapper function definition, where we can see that spline in fact uses interpolate.UnivariateSpline to fill gaps in the data.

This might confuse end users, because the scipy.interpolate.interp1d documentation says that the slinear method refers to a spline interpolation of first order. At the same time, end users can read in pandas.Series.interpolate that spline requires that you also specify an order (int), e.g. 1, as in this work.

Summarizing:

  1. Pandas slinear refers to Scipy slinear, but Scipy slinear refers to a first-order spline, which Pandas defines separately as spline with order 1.
  2. pandas.core.missing._interpolate_scipy_wrapper shows that Pandas does not use scipy.interpolate.interp1d for spline as it does for slinear. Pandas uses interpolate.UnivariateSpline for spline interpolation, which is not mentioned in the pandas.Series.interpolate documentation.
  3. Scipy slinear and linear give the same results.
  4. Pandas slinear gives the same results as Scipy slinear.
  5. Pandas slinear does not give the same results as Pandas first-order spline, which is what Scipy slinear refers to.
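The points above can be checked with a small sketch along the following lines. It is not the notebook attached to this PR: the data (y = x**2 with two gaps) and all variable names are made up for illustration. It assumes that Pandas forwards only the non-missing points to Scipy, as the wrapper function named above suggests. One plausible reason for point 5, which this PR does not spell out, is that UnivariateSpline fits a smoothing spline by default instead of passing through every point; treat that as an assumption rather than a statement about Pandas internals.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Illustrative data (not the notebook's): y = x**2 with two interior gaps.
x = np.arange(10, dtype=float)
y = x**2
y[[3, 6]] = np.nan
series = pd.Series(y, index=x)

missing = np.isnan(y)
x_known, y_known, x_missing = x[~missing], y[~missing], x[missing]

# Pandas results, read off at the gap positions only.
pandas_slinear = series.interpolate(method="slinear").to_numpy()[missing]
pandas_spline1 = series.interpolate(method="spline", order=1).to_numpy()[missing]

# Direct Scipy calls on the known points.
scipy_linear = interpolate.interp1d(x_known, y_known, kind="linear")(x_missing)
scipy_slinear = interpolate.interp1d(x_known, y_known, kind="slinear")(x_missing)
scipy_univariate = interpolate.UnivariateSpline(x_known, y_known, k=1)(x_missing)

# Point 3: Scipy slinear vs Scipy linear.
print("scipy slinear == scipy linear:", np.allclose(scipy_slinear, scipy_linear))
# Point 4: Pandas slinear vs Scipy slinear.
print("pandas slinear == scipy slinear:", np.allclose(pandas_slinear, scipy_slinear))
# Point 2: Pandas spline(order=1) vs UnivariateSpline(k=1).
print("pandas spline(1) == UnivariateSpline(k=1):",
      np.allclose(pandas_spline1, scipy_univariate))
# Point 5: size of the gap between Pandas slinear and Pandas spline(order=1).
print("max |slinear - spline(1)|:", np.max(np.abs(pandas_slinear - pandas_spline1)))
```

The last line prints a difference instead of asserting one, because how far the two methods diverge depends on the data.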

Proposed solution:

  1. Update the docstring of the interpolation method.

Workspace - Jupyter notebook used to highlight the differences between Scipy and Pandas interpolation methods:
interpolation_comparison.zip

Visualization of differences between Scipy and Pandas interpolations.
pandas_interpolation
scipy_interpolation

CSV files with plot data.
case_1_comparison.csv
case_2_comparison.csv

Contributor

@topper-123 topper-123 left a comment


Thanks for the PR, I just have two questions.

@topper-123 topper-123 added the Docs and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Nov 13, 2022
@topper-123 topper-123 added this to the 2.0 milestone Nov 13, 2022
Contributor

@topper-123 topper-123 left a comment


One test is failing; otherwise it looks good. I can merge after you get all CI tests green :-)

@milosz-martynow milosz-martynow force-pushed the pandas-vs-scipy-interpolation-naming-clarification branch from 0a083c8 to bf44925 Compare November 14, 2022 18:57
@milosz-martynow
Contributor Author

@topper-123 Now everything is green :)

@topper-123 topper-123 merged commit 1f6e44a into pandas-dev:main Nov 14, 2022
@topper-123
Contributor

Thanks @milosz-martynow

MarcoGorelli pushed a commit to MarcoGorelli/pandas that referenced this pull request Nov 18, 2022
* DOC: Update interpolation docstring to highlight slinear and spline differences between pandas and scipy.

* Fix typos.

* Shorter docstring comment.
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
* DOC: Update interpolation docstring to highlight slinear and spline differences between pandas and scipy.

* Fix typos.

* Shorter docstring comment.