DataFrame.interpolate() extrapolates over trailing missing data #8000
Comments
Please show a complete reproducible example (e.g. copy-pastable code). If you would like to open a pull request, that would be great. These examples serve as the basis for a test, which should fail without a fix and pass after it.
Traveling back today. I can take a look this weekend. I'd like to see what the behavior was before I refactored this stuff.
@TomAugspurger can you circle back on this?
OK, so this is the same behavior as back in
>>> pd.__version__
>>> s = pd.Series([np.nan, 1, np.nan, 3, np.nan])
>>> s
0 NaN
1 1
2 NaN
3 3
4 NaN
dtype: float64
>>> s.interpolate()
0 NaN
1 1
2 2
3 3
4 3
dtype: float64
I'll look into adding an argument to handle the NaNs before and after. The default will have to stay the same for now, I think. Possibly switch to the "correct" default of not extrapolating later on.
Is there a workaround for now?
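One possible workaround, sketched here under the assumption that only linear interpolation between valid points is wanted: interpolate as usual, then mask out every position that does not have a valid value on both sides. The helper name `interpolate_inside_only` is hypothetical and not part of pandas; the data is the series from the session above.

```python
import numpy as np
import pandas as pd


def interpolate_inside_only(s: pd.Series) -> pd.Series:
    """Hypothetical helper (not a pandas API): fill only NaNs that sit
    between two valid values, leaving leading and trailing NaNs alone."""
    # A position is "inside" if a valid value exists both before and after it.
    inside = s.ffill().notna() & s.bfill().notna()
    # Interpolate normally, then put NaN back wherever the position is outside.
    return s.interpolate().where(inside)


s = pd.Series([np.nan, 1, np.nan, 3, np.nan])
result = interpolate_inside_only(s)
print(result)
```

With this series the middle gap is filled with 2.0, while the first and last entries stay NaN.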
Any updates on this?
@cancan101 there is a closed PR (not merged) #8010 / #8013 which I believe was almost there. If you want to rebase and see where it is, that would be great.
Given that the filling of the trailing values does not follow the specified method, but just forward fills, I think we could consider this a bug. However, it is of course still a bug that people could rely upon, so I'm not sure whether we should just change the behaviour.
This is definitely a bug. All new pandas users will find this behaviour confusing and error-prone (as I just did). If there is code that relies on this bug, that means there is a bug in that code as well. You should fix it.
Just curious if there are any updates on this issue? As of pandas 0.20.3 this is still a puzzling question. See StackOverflow.
see might be able to close this issue
@jreback Thanks for the link. But I just tried one of the test examples in commit
Any ideas? Should I try a newer version of pandas? EDIT:
Yeah, it looks like a typo; this change is in 0.23. Would love a PR to update!
…terpolate'] which is the docstring for pandas.core.resample.Resampler.interpolate, pandas.DataFrame.interpolate, pandas.Series.interpolate, and pandas.Panel.interpolate. Reference can be found at pandas-dev#8000
…data about the limit_area keyword argument in interpolate(). The reference can be found at pandas-dev#8000 (comment).
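For readers on pandas 0.23 or later, the `limit_area` keyword mentioned above can be illustrated with a short sketch. It reuses the series from the earlier session in this thread; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 3, np.nan])

# Default behaviour: the trailing NaN is filled with the last valid value.
default = s.interpolate()

# limit_area="inside" fills only NaNs surrounded by valid values,
# so the leading and trailing NaNs are left untouched.
inside = s.interpolate(limit_area="inside")

print(default)
print(inside)
```

Here `default` ends in 3.0, whereas `inside` keeps NaN at both ends and fills only the middle gap with 2.0.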
xref #25418
See also the discussion at StackOverflow.
Linear interpolation on a series with missing data at the end of the array will overwrite trailing missing values with the last non-missing value. In effect, the function extrapolates rather than strictly interpolating.
Example:
Yields (note the extrapolated 4):
not
I believe the fix is something along the lines of changing lines 1545:1546 in core/common.py from
to