DataFrame.interpolate() extrapolates over trailing missing data #8000

grahamjeffries · 2014-08-12T13:46:04Z

See also the discussion at StackOverflow.

Linear interpolation on a series with missing data at the end of the array will overwrite trailing missing values with the last non-missing value. In effect, the function extrapolates rather than strictly interpolating.

Example:

import pandas as pd
import numpy as np

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
a.interpolate()

Yields (note the extrapolated 4):

0   NaN
1     1
2     2
3     3
4     4
5     4
dtype: float64

not

0   NaN
1     1
2     2
3     3
4     4
5     NaN
dtype: float64

I believe the fix is something along the lines of changing lines 1545:1546 in core/common.py from

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid])

to

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid], np.nan, np.nan)
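
For illustration, here is a minimal sketch of what the two extra arguments do, using plain NumPy outside of pandas. The arrays are made up for the example and are not the variables from core/common.py. left and right are the np.interp keyword arguments that set the value returned for points outside the range of known positions.

```python
import numpy as np

xp = np.array([1.0, 3.0])       # positions of the valid observations
fp = np.array([1.0, 3.0])       # values at those positions
x = np.array([0.0, 2.0, 4.0])   # positions to fill: before, between, after

# Default: points outside [xp[0], xp[-1]] are clamped to the edge values.
default = np.interp(x, xp, fp)

# With left/right set to NaN, points outside the known range stay missing.
strict = np.interp(x, xp, fp, left=np.nan, right=np.nan)

print(default)  # [1. 2. 3.]
print(strict)   # [nan  2. nan]
```

Only the point between the two known positions is interpolated in both cases. The difference is limited to the points before the first and after the last valid observation.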

jreback · 2014-08-12T14:00:01Z

Please show a complete reproducible example (e.g. copy-pastable code).

Then, if you would like to open a pull request, that would be great. These examples serve as the basis for a test, which should fail without a fix and pass after it.

TomAugspurger · 2014-08-14T08:00:02Z

Traveling back today. I can take a look this weekend.

I'd like to see what the behavior was before I refactored this stuff.

jreback · 2014-09-09T23:45:19Z

@TomAugspurger can you circle back on this?

TomAugspurger · 2014-09-10T15:47:09Z

OK, so this is the same behavior as back in 0.11 before I refactored all the interpolate stuff.

>>> pd.__version__
>>> s = pd.Series([np.nan, 1, np.nan, 3, np.nan])
>>> s
0   NaN
1     1
2   NaN
3     3
4   NaN
dtype: float64
>>> s.interpolate()
0   NaN
1     1
2     2
3     3
4     3
dtype: float64

I'll look into adding an argument to handle the NaNs before and after. The default will have to stay the same for now, I think. Possibly switch later on to the "correct" default of not extrapolating.

Jezzamonn · 2015-12-17T03:07:44Z

Is there a workaround for now?

jluttine · 2015-12-17T08:19:53Z

@Jezzamonn One workaround solution: http://stackoverflow.com/questions/25255496/dataframe-interpolate-extrapolates-over-trailing-missing-data/33390872#33390872
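
A minimal sketch of one possible workaround, which is not necessarily the approach taken in the linked answer. It assumes that a position counts as "trailing" when no valid value follows it, which is exactly where a backward fill finds nothing to fill from. The names filled, has_later_value and result are chosen for this example only.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 3, np.nan])

# Interpolate as usual; trailing NaNs may get filled here.
filled = s.interpolate()

# A backward fill leaves NaN only where no valid value follows,
# so this mask is False exactly on the trailing run of NaNs.
has_later_value = s.bfill().notna()

# Keep interpolated values inside the data, restore NaN after the last valid one.
result = filled.where(has_later_value)
print(result)
```

The mask is built from the original series, so it does not depend on how interpolate() treats the trailing values.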

cancan101 · 2016-03-30T22:47:21Z

Any updates on this?

jreback · 2016-03-30T22:55:38Z

@cancan101 there is a closed (not merged) PR, #8010 / #8013, which I believe was almost there. If you want to rebase it and see where it stands, that would be great.

jorisvandenbossche · 2017-02-09T12:59:42Z

Given that the filling of the trailing values does not follow the specified method, but just forward fills, I think we could consider this as a bug. However, of course, still a bug that people could rely upon, so not sure whether we should just change the behaviour.
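
A small sketch of the point above, assuming the default behaviour reported in this thread. It compares the trailing value produced by interpolate() with a plain forward fill and with the value a straight line through the last two valid points would give. The variable names are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 3, np.nan])

filled = s.interpolate()   # default method is 'linear'
ffilled = s.ffill()        # plain forward fill

trailing_interp = filled.iloc[-1]
trailing_ffill = ffilled.iloc[-1]

# Continuing the line through (1, 1) and (3, 3) to position 4 would give 4.0.
slope = (3 - 1) / (3 - 1)
linear_extrapolation = 3 + slope * (4 - 3)

print(trailing_interp, trailing_ffill, linear_extrapolation)
```

On versions that show the behaviour described here, the first two values agree and differ from the third, which is why the trailing fill looks like a forward fill rather than a linear extrapolation.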

relonger · 2017-11-12T14:04:44Z

This is definitely a bug. All new pandas users will find this behaviour confusing and error-prone (as I just did). If there is code that relies on this bug, that means there is a bug in that code as well. You should fix it.
Interpolate means interpolate, not extrapolate in any way.

jreback · 2017-11-12T14:49:15Z

"You should fix it."

@relonger you are welcome to open a PR for this.

This PR actually does provide this option: #16513

You are welcome to have a look at it; it seems stalled.

willweil · 2019-02-20T18:18:58Z

Just curious if there are any updates on this issue? As of pandas 0.20.3 this is still puzzling. See StackOverflow.

jreback · 2019-02-20T18:21:26Z

see 35812ea

might be able to close this issue

willweil · 2019-02-20T19:03:34Z

@jreback Thanks for the link. But I just tried one of the test examples in commit 35812ea and I didn't get the expected result as in the test:

>>> pd.__version__
'0.20.3'
>>> s = pd.Series([np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan])
>>> s
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    7.0
7    NaN
8    NaN
dtype: float64
>>> s.interpolate(method='linear', limit_area='inside')
0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    7.0
8    7.0
dtype: float64

Any ideas? Should I try a newer version of pandas?

EDIT:
I also tried a newer version of pandas, '0.22.0', but still didn't get the expected results. The pandas documentation says that limit_area is a new feature in version 0.21.0+. Any ideas?

>>> pd.__version__
'0.22.0'

willweil · 2019-02-20T23:46:04Z

@jreback UPDATE: limit_area works as expected in pandas 0.23.0+, but not in 0.21.0 or 0.22.0. Maybe the pandas documentation has a typo, as it marks limit_area as "New in version 0.21.0."?
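
For reference, a minimal sketch of the same call, assuming pandas 0.23.0 or later where limit_area is available. It reuses the series from the comment above; the name inside is chosen for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan])

# limit_area='inside' fills only NaNs that are surrounded by valid values.
inside = s.interpolate(method='linear', limit_area='inside')
print(inside)
```

With limit_area='inside', positions 3 to 5 are filled with 4.0, 5.0 and 6.0, while the leading and trailing NaNs are left untouched.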

jreback · 2019-02-20T23:51:54Z

yeah it looks like a typo; this change is in 0.23

would love a PR to update!

simonjayhawkins · 2019-07-13T16:49:49Z

"yeah it looks like a typo; this change is in 0.23

would love a PR to update!"

xref #25418

jreback added Bug labels Aug 12, 2014

jreback added this to the 0.15.0 milestone Aug 12, 2014

This was referenced Aug 12, 2014

fix issue #8000 #8010

Closed

fix issue #8000 - interpolation extrapolates over trailing missing values #8013

Closed

jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

shoyer mentioned this issue Aug 30, 2015

API: Interpolate at new values #9340

Open

jreback added Difficulty Intermediate labels Mar 30, 2016

jorisvandenbossche mentioned this issue Feb 9, 2017

pd.Series interpolate with method='time' returns inconsistent results for first or last NaN #15356

Closed

jreback mentioned this issue Nov 12, 2017

ENH: interpolate.limit_area() 16284 #16513

Closed

4 tasks

willweil added a commit to willweil/pandas that referenced this issue Feb 22, 2019

Correct a typo of version number in documentation/user_guide/missing_…

b93be78

…data about the limit_area keyword argument in interpolate(). The reference can be found at pandas-dev#8000 (comment).

simonjayhawkins closed this as completed Jul 13, 2019

typorian mentioned this issue Feb 13, 2020

pandas.Dataframe.interpolate() does not extrapolate even if it is asked to, depending on interpolation method #31949

Open
