Skip to content

Fix interpolate -limit Add interpolate limit_direction='inside' #16307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Fix interpolate -limit Add interpolate limit_direction='inside' #16307

wants to merge 6 commits into from

Conversation

WBare
Copy link
Contributor

@WBare WBare commented May 9, 2017

@@ -20,6 +20,12 @@ Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations
New features
~~~~~~~~~~~~

- DataFrame.interpolate() has a new setting: limit_direction='inside'.
This will cause the interpolation to fill missing values only when
the missing value is surounded by valid values. It is useful when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surrounded

@jreback
Copy link
Contributor

jreback commented May 10, 2017

pls add tests for any changes.

@WBare
Copy link
Contributor Author

WBare commented May 12, 2017 via email

@WBare
Copy link
Contributor Author

WBare commented May 17, 2017

I've added required tests for the limit direction feature. I see them in my fork and I see them referenced in the pull request.

It should (I think) be ready to go, but I'm not sure if I need to do anything to update the pull. If I've missed anything else, just let me know. This is my first contribution, so please bear with me.

Thanks

@codecov
Copy link

codecov bot commented May 17, 2017

Codecov Report

Merging #16307 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16307      +/-   ##
==========================================
+ Coverage   90.38%   90.41%   +0.02%     
==========================================
  Files         161      161              
  Lines       50999    51001       +2     
==========================================
+ Hits        46096    46111      +15     
+ Misses       4903     4890      -13
Flag Coverage Δ
#multiple 88.24% <100%> (+0.02%) ⬆️
#single 40.19% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/missing.py 84.37% <100%> (+0.09%) ⬆️
pandas/core/indexes/datetimes.py 95.33% <0%> (+0.09%) ⬆️
pandas/plotting/_converter.py 65.05% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6fcec6...aa92985. Read the comment docs.

@WBare
Copy link
Contributor Author

WBare commented May 22, 2017

Hi Guys,

Looks like I got all checks passed, coverage increased with new tests, I got my spelling error fixed, and I currently have no conflicts with the base branch.

May I get a review on this? It is actually a fairly simple change. I just had to learn the process for submitting.

Thanks!

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new limit_direction='inside' seems a bit confusing to me. We use that to refer to the number of consecutive NaNs to fill. With this change, it's either that, or a special case for "out of valid bounds".

How about we add a new keyword that mimics fill_value from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d. We can call it either fill_value or extrapolate. I think that will require more discussion that the bugfix, so could you split it off of this change and make a new PR?

This will cause the interpolation to fill missing values only when
the missing value is surrounded by valid values. It is useful when
a series needs to be interpolated, but must not expand into NaN
values that were outside the range of the original series. (GH16284)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference to the issue should be like

:issue:`16284`

- DataFrame.interpolate was not respecting limit_direction when
limit=0 (unlimited). Specifically, it would always use
limit_direction='forward' even when specified otherwise. Now
default limit=0 will work with other directions. (GH16282)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about the reference

@@ -931,6 +931,40 @@ def test_interp_limit_forward(self):
limit_direction='FORWARD')
assert_series_equal(result, expected)

def test_interp_limit_inside(self):
# these test are for issue #16307 'inside' direction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just change this comment to GH #16307 or the link to the issue.

@WBare
Copy link
Contributor Author

WBare commented May 22, 2017

Hi Tom,

Please help me understand your comment. There are currently two parameters in use:
limit=n This is the NUMBER of missing values to fill in a given direction.
limit_direction='param' is an already existing parameter indicating the direction in which values should be filled. Specifically, to the already existing values, I added a new direction of 'inside'.

  • forward - fill forward including values past the last non-NaN
  • backward - fill backward including values before the last non-NaN
  • both - fill both directions including values before and after the last non-NaN
  • ADD: inside - fill values that are surrounded by non-NaN (will only fill missing values that are surrounded by non-NaN and thus will not expand size of the series).

The new direction 'inside' fits with the other existing directions. Please help me understand the need to add a new parameter.

Thanks!

@TomAugspurger
Copy link
Contributor

Your test kind of shows it (`s.interpolate(method='linear', limit=1, limit_direction='inside'):

s = Series([np.nan, 1, 3, np.nan, np.nan, np.nan, 11, np.nan])
expected = Series([np.nan, 1, 3, 5, np.nan, 9, 11, np.nan])

It looks like that's a limit_direction="both", but without extrapolate. You could imagine a case where the user wants to only fill forward (limit_direction="forward") but not extrapolate, so their expected is

expected = Series([np.nan, 1, 3, 5, np.nan, np.nan, 11, np.nan])
#                                           ^ this one changed to still be nan

Basically, extrapolation is different that interpolation ;)

@WBare
Copy link
Contributor Author

WBare commented May 22, 2017

Ah... right. Yes, I agree with you. My key issue was actually with the fact that we are not differentiating between extrapolating and interpolating. I was fixing one case, but a general solution would cover all the cases.

By default scypi interpolates, and by default pandas interpolates AND extrapolates.

Perhaps:
limit_type=None (default to interpolate and extrapolate as it is today)
limit_type=interpolate (only fills inside values)
limit_type=extrapolate (only fills outside values)

Or better yet... maybe:
interpolate=True (by default... False to turn it off)
extrapolate=True (by default... False to turn it off)

Since I'm in the middle of this, I'd be happy to do this if you guys want it done. Just let me know what the core contributors decide.

In the mean time, I will revert my code to master and just apply code required for the limit=n bug fix (I should not have combined these in the first place), then submit changes again.

Thanks!

@TomAugspurger
Copy link
Contributor

@WBare let's move the extrapolate discussion back to #16284

@WBare WBare closed this May 22, 2017
@WBare WBare deleted the Fix-Interpolate-Limit branch May 22, 2017 15:34
@WBare WBare restored the Fix-Interpolate-Limit branch May 22, 2017 15:34
@WBare WBare deleted the Fix-Interpolate-Limit branch May 22, 2017 15:41
@TomAugspurger TomAugspurger added this to the No action milestone May 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

df.interpolate limit_direction= does not work without limit=x setting
4 participants