
fix issue #8000 - interpolation extrapolates over trailing missing values #8013

Closed
wants to merge 2 commits into from

Conversation

grahamjeffries
Contributor

This pull request is in response to issue #8000.

Changes to core/common.py pass np.nan as the `left` and `right` values to np.interp(), so positions that fall outside the range of non-missing values stay missing after interpolation. This prevents DataFrame.interpolate() from extrapolating the last non-missing value over all trailing missing values (the current default behavior).
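To illustrate the difference described above, here is a minimal sketch that calls `np.interp` directly. The sample data and variable names are made up for illustration, and this is not the code from the commit.

```python
import numpy as np

# Hypothetical data: the last two observations are missing.
values = np.array([1.0, np.nan, 3.0, np.nan, np.nan])
positions = np.arange(len(values))
valid = ~np.isnan(values)

# Default np.interp: positions past the last known point take the last known value.
extrapolated = np.interp(positions, positions[valid], values[valid])

# With left/right set to NaN, positions outside the known range stay missing.
preserved = np.interp(
    positions, positions[valid], values[valid], left=np.nan, right=np.nan
)

print(extrapolated)  # trailing entries repeat 3.0
print(preserved)     # trailing entries remain nan
```

The interior gap at position 1 is filled with 2.0 in both cases; only the trailing positions differ.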

Changes to tests/test_generic.py add test coverage to the above change. A passing test is where an interpolated series with a trailing missing value maintains that trailing missing value after interpolation.

This commit passes np.nan as the `left` and `right` arguments to `np.interp()`. In effect, when pandas interpolates a Series with trailing missing data, the missing values are preserved rather than being overwritten with the default fill (the last non-missing value).
Added a test that confirms that linear interpolation of a Series does not extrapolate over missing data that trails the last known value.
@grahamjeffries grahamjeffries mentioned this pull request Aug 13, 2014
@jreback jreback added this to the 0.15.0 milestone Aug 13, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@TomAugspurger
Contributor

@grahamjeffries see my comment #8000 (comment)

Any interest in doing a fix? I don't think we need to add left and right kwds to .interpolate() like np.interp has. We already have .fillna which is better for these things. I'd say we can add a keyword like extrapolate or extend or something like that to control whether filling continues past the last known value.

Right now we can just keep it as extrapolate=True (for compatibility), but warn that it may change in the future and recommend that people set it to false and fillna after interpolating.
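A rough sketch of the workflow suggested here: interpolate without extrapolating, then fill the edges explicitly. The function `interpolate_linear` and its `extrapolate` keyword are hypothetical names taken from this discussion, not an existing pandas API, and the NumPy-based body is only a stand-in for whatever pandas does internally. Note that with `extrapolate=True` this sketch fills both leading and trailing gaps, as `np.interp` does by default.

```python
import numpy as np


def interpolate_linear(values, extrapolate=True):
    """Hypothetical helper for illustration; not pandas' implementation."""
    values = np.asarray(values, dtype=float)
    positions = np.arange(len(values))
    valid = ~np.isnan(values)
    # None keeps np.interp's default (repeat the edge value);
    # NaN leaves positions outside the known range missing.
    edge = None if extrapolate else np.nan
    return np.interp(positions, positions[valid], values[valid], left=edge, right=edge)


raw = [np.nan, 2.0, np.nan, 4.0, np.nan]

# Fill interior gaps only, leaving the leading and trailing NaN in place.
interior_only = interpolate_linear(raw, extrapolate=False)

# Then choose the edge fill explicitly, analogous to calling .fillna(0.0).
filled = np.where(np.isnan(interior_only), 0.0, interior_only)
```

Separating the two steps makes the edge fill an explicit choice instead of a side effect of interpolation.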

I can take this if you aren't able/interested.

@grahamjeffries
Contributor Author

@TomAugspurger, I'd like to try my hand at this fix. It's my first, though, so I'll ask for your patience and close review. I'll get to this sometime next week, I suspect.

@TomAugspurger
Contributor

No rush at all. This has been "broken" for at least a year :)

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 2, 2015
@jreback
Contributor

jreback commented Apr 8, 2015

@grahamjeffries want to revisit this?

@jreback
Contributor

jreback commented May 9, 2015

@grahamjeffries can you revisit this?

@grahamjeffries
Contributor Author

I won't be able to get to this for at least a month. If there's someone else willing and able to make the fix, I'd encourage them to do so. Otherwise, I'll make an effort at that point.

@jreback
Contributor

jreback commented Jul 28, 2015

closing as stale, but if you would like to reopen and fixup pls do.

@cancan101
Contributor

@jreback What is missing from this PR to get it merged? It seems like a bare minimum:

  • add an extrapolate argument that defaults to False. Potentially warn if the user does not set it.

@jreback
Contributor

jreback commented Apr 4, 2016

you could add an extrapolate kw, and set it to None for now (which will do nothing ATM, and not warn). Then in 0.19 could do a warning if it's not set. So not much really to fix this.
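The staged rollout described here might look roughly like the following. Everything in this sketch is an assumption for illustration: `interpolate_staged`, the `extrapolate` keyword, and the `warn_if_unset` switch (standing in for "a later release") are hypothetical and not part of pandas.

```python
import warnings

import numpy as np


def interpolate_staged(values, extrapolate=None, warn_if_unset=False):
    """Hypothetical sketch of a None-sentinel keyword; not pandas code."""
    if extrapolate is None:
        # Stage 1: None silently keeps the old behavior.
        # Stage 2 (simulated by warn_if_unset): warn when the caller did not choose.
        if warn_if_unset:
            warnings.warn(
                "extrapolate was not set; the default may change in the future",
                FutureWarning,
                stacklevel=2,
            )
        extrapolate = True
    values = np.asarray(values, dtype=float)
    positions = np.arange(len(values))
    valid = ~np.isnan(values)
    edge = None if extrapolate else np.nan
    return np.interp(positions, positions[valid], values[valid], left=edge, right=edge)


data = [1.0, np.nan, 3.0, np.nan]

with warnings.catch_warnings(record=True) as stage_one:
    warnings.simplefilter("always")
    old_behavior = interpolate_staged(data)

with warnings.catch_warnings(record=True) as stage_two:
    warnings.simplefilter("always")
    interpolate_staged(data, warn_if_unset=True)
```

Using `None` as the default lets the function tell "not set" apart from an explicit `True` or `False`, which is what makes a later warning possible without changing results today.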

Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
4 participants