fix issue #8000 - interpolation extrapolates over trailing missing values #8013
This commit changes the `np.interp()` call to pass np.nan for both the left and right parameters. In effect, when pandas interpolates a Series with trailing missing data, the missing values are preserved rather than being overwritten with `np.interp()`'s default fill (the last non-missing value).
Added a test that confirms that linear interpolation of a Series does not extrapolate over missing data that trails the last known value.
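For reference, a minimal sketch of the `np.interp()` behavior this change relies on, using NumPy directly rather than the pandas code path. The arrays `xp`, `fp`, and `x` are made-up illustration data, not values from the PR or its test.

```python
import numpy as np

# Known (non-missing) points of a hypothetical series: positions 0..2.
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 30.0])

# Positions to evaluate, including two beyond the last known point.
x = np.array([0.0, 0.5, 2.0, 3.0, 4.0])

# Default behavior: positions right of xp[-1] are filled with fp[-1].
clamped = np.interp(x, xp, fp)

# With left/right set to NaN, out-of-range positions stay missing.
preserved = np.interp(x, xp, fp, left=np.nan, right=np.nan)

print(clamped)
print(preserved)
```

The first call repeats the last known value over the trailing positions, while the second leaves them as NaN, which is the effect described above.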
@grahamjeffries see my comment #8000 (comment) Any interest in doing a fix? I don't think we need to add Right now we can just keep it as I can take this if you aren't able/interested.
@TomAugspurger, I'd like to try my hand at this fix. It's my first, though, so I'll ask for your patience and close review. I'll get to this sometime next week, I suspect.
No rush at all. This has been "broken" for at least a year :)
@grahamjeffries want to revisit this?
@grahamjeffries can you revisit this?
I won't be able to get to this for at least a month. If there's someone else willing and able to make the fix, I'd encourage them to do so. Otherwise, I'll make an effort at that point.
Closing as stale, but if you would like to reopen and fix up, please do.
@jreback What is missing from this PR to get it merged? It seems like a bare minimum:
you could add an |
This pull request is in response to issue #8000.
Changes to core/common.py pass np.nan as the fill value for points to the left and right of the non-missing values during interpolation. This prevents DataFrame.interpolate() from extrapolating the last non-missing value over all trailing missing values (the previous default).
Changes to tests/test_generic.py add test coverage for the above change. The test passes when an interpolated Series with a trailing missing value keeps that trailing missing value after interpolation.