ENH: interpolate.limit_area() 16284 #16513

WBare · 2017-05-26T14:11:09Z

closes Enhancement Request: control extrapolation on .interpolate #16284
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

codecov · 2017-05-26T15:34:46Z

Codecov Report

Merging #16513 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16513      +/-   ##
==========================================
+ Coverage   90.79%   90.79%   +<.01%     
==========================================
  Files         161      161              
  Lines       51063    51074      +11     
==========================================
+ Hits        46364    46375      +11     
  Misses       4699     4699

Flag	Coverage Δ
#multiple	`88.64% <100%> (ø)`	⬆️
#single	`40.14% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`92.26% <ø> (ø)`	⬆️
pandas/core/resample.py	`96.09% <ø> (ø)`	⬆️
pandas/core/internals.py	`93.44% <ø> (ø)`	⬆️
pandas/core/missing.py	`84.8% <100%> (+0.52%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 75c8698...80d67b7.

codecov · 2017-05-26T15:35:05Z

Codecov Report

Merging #16513 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16513      +/-   ##
==========================================
+ Coverage   91.57%   91.57%   +<.01%     
==========================================
  Files         150      150              
  Lines       48684    48695      +11     
==========================================
+ Hits        44583    44594      +11     
  Misses       4101     4101

Flag	Coverage Δ
#multiple	`89.94% <100%> (ø)`	⬆️
#single	`41.71% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/resample.py	`96.43% <ø> (ø)`	⬆️
pandas/core/internals.py	`95.46% <ø> (ø)`	⬆️
pandas/core/generic.py	`95.91% <ø> (ø)`	⬆️
pandas/core/missing.py	`84.78% <100%> (+0.48%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b4662cd...596f145.

WBare · 2017-05-26T15:40:45Z

Hi Guys,

It looks like 7 tests failed, but they are not related to my code (as far as I can tell).

I see that we have some other active testing issues right now, but I'm not sure whether this one is known.

Do I need to do something about this?

Thanks.

XFAIL pandas/tests/test_window.py::TestExpanding::()::tests_empty_df_expanding[1s]
GH 16425 expanding with offset not supported
XFAIL pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::()::test_clip_mixed_numeric
clip on mixed integer or floats with integer clippers coerces to float
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr
not a valid repr as we use interval notation
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr_max_seq_item_setting
not a valid repr as we use interval notation
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr_roundtrip
not a valid repr as we use interval notation
XFAIL pandas/tests/io/test_excel.py::test_styler_to_excel[xlwt]
xlwt does not support openpyxl-compatible style dicts
XFAIL pandas/tests/io/test_excel.py::test_styler_to_excel[openpyxl]
reason: openpyxl1 does not support some openpyxl2-compatible style dicts

TomAugspurger · 2017-05-26T17:40:02Z

@WBare those are expected failures (x=expected). Nothing to worry about.

I'll review more thoroughly later, but at a glance I think this is a good approach.

WBare · 2017-05-30T12:57:20Z

Sounds good, @TomAugspurger . I see you just did a giant backport, so I know you have your hands full. Let me know if you need anything else on this.

jreback · 2017-05-30T23:20:58Z

doc/source/whatsnew/v0.21.0.txt

@@ -24,6 +24,7 @@ New features
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
  and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
+- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced (:issue:`16284`)


show a small sub-section example here of why this parameter is useful (take the examples from the docs you wrote above). and provide a pointer to the docs (again which you wrote)

jreback · 2017-05-30T23:22:05Z

pandas/core/generic.py

+            * 'inside' Only fill NaNs surrounded by valid values (interpolate).
+            * 'outside' Only fill NaNs outside valid values (extrapolate).
+            * None: default fill inside and outside
+            .. versionadded:: 0.21.0


put the None one first

@jreback I also noticed and corrected the old .. versionadded tag on 3887, which was not being properly replaced. It needed the blank lines to stop it from being combined with the normal paragraph above.
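
The three `limit_area` options in the docstring above can be illustrated with a minimal plain-Python sketch. This is not the pandas implementation; `classify_nans` is a hypothetical helper, and treating an all-NaN input as entirely 'outside' is an assumption made only for this illustration.

```python
import math


def classify_nans(values):
    """Label each NaN position as 'inside' or 'outside' the valid data.

    Hypothetical helper, not part of pandas. 'inside' means the NaN lies
    between the first and last valid (non-NaN) values; 'outside' means it
    comes before the first or after the last valid value.
    """
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        # Assumption for this sketch: with no valid values, every NaN is 'outside'.
        return {i: 'outside' for i in range(len(values))}
    first, last = valid[0], valid[-1]
    return {
        i: 'inside' if first < i < last else 'outside'
        for i, v in enumerate(values)
        if math.isnan(v)
    }


nan = float('nan')
labels = classify_nans([nan, nan, 5.0, nan, nan, nan, 13.0, nan, nan])
print(labels)
```

With `limit_area='inside'` only the positions labelled 'inside' are candidates for filling, with `limit_area='outside'` only those labelled 'outside', and with `None` both.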

jreback · 2017-05-30T23:22:39Z

pandas/core/missing.py

@@ -155,28 +155,12 @@ def _interp_limit(invalid, fw_limit, bw_limit):
        raise ValueError('Invalid limit_direction: expecting one of %r, got '


can you use .format() here
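
A minimal sketch of the requested change, for reference. The diff line above is truncated after `got `, so the rest of the message, along with the names `valid_directions` and `given`, are assumptions used only for illustration.

```python
# Hypothetical values; the actual names in pandas/core/missing.py may differ.
valid_directions = ['forward', 'backward', 'both']
given = 'abc'

# Old style: %-interpolation, where %r inserts repr() of each argument.
old_msg = ('Invalid limit_direction: expecting one of %r, got %r.'
           % (valid_directions, given))

# Requested style: str.format(), where the !r conversion also applies repr().
new_msg = ('Invalid limit_direction: expecting one of {valid!r}, got {given!r}.'
           .format(valid=valid_directions, given=given))

# Both styles produce the same text.
assert old_msg == new_msg
print(new_msg)
```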

WBare · 2017-05-31T19:35:25Z

I've committed the changes requested by @jreback , and we are back to green except the expected failures.

Ready for further review.

Thanks.

TomAugspurger

I pushed a small fix removing whitespace and fixing the whatsnew note. Looks good!

jreback · 2017-06-05T10:37:55Z

doc/source/missing_data.rst

@@ -330,6 +330,10 @@ Interpolation

  The ``limit_direction`` keyword argument was added.

+.. versionadded:: 0.21.0
+


this doesn't make sense w/o an example

Hi Jeff,

The examples for both limit_direction and limit_area are below in the "interpolation limits" sub-section.

I'm mostly trying to get the correct style from inference, so I basically reproduced what had been done in the past for limit_direction.

There is a location (.. _missing_data.interp_limits:) below these versionadded references to which both limit_direction and limit_area can be linked if that is the right style.

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

A link to below sounds good. You can make a new one specifically for _missing_data.interp_limit_area

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

I agree with this, I would just remove it here.

TomAugspurger · 2017-06-07T21:59:00Z

Looks like there's a conflict in pandas/core/missing.py. Mind rebasing? Sorry I couldn't get to this earlier.

jorisvandenbossche · 2017-06-07T22:29:31Z

doc/source/missing_data.rst

+   ser.interpolate(limit_direction='both')
+
+By default, ``NaN`` values are filled whether they are inside (surrounded by)
+existing valid values, or outside existing valid values. Introduced in v0.21


"Introduced in v0.21" -> "Introduced in pandas 0.21, "

jorisvandenbossche

I am not sure I fully like the limit_area keyword name. It is clear what it does after reading the PR and its docs, but when I only saw the title / keyword name, I wouldn't have guessed it would do this.

That said, I don't have a direct idea for another name. Were there other suggestions in the issue?

jorisvandenbossche · 2017-06-07T22:30:12Z

doc/source/missing_data.rst

+.. ipython:: python
+
+   # fill one consecutive inside value in both directions
+   ser.interpolate(limit=1, limit_area='inside', limit_direction='both')


can you put limit_area here also after limit_direction (to have it consistent with the other examples)?
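
A minimal sketch of how the three keywords in this doc example interact. The sample values in `ser` are an assumption (the series used in the docs is defined outside this hunk), and `limit_area` requires a pandas version that includes this feature.

```python
import numpy as np
import pandas as pd

# Sample data chosen for illustration only.
ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])

# limit=1 with limit_direction='both' fills at most one NaN on each side of a
# valid value; limit_area='inside' then keeps only the fills that lie between
# valid values, so leading and trailing NaNs are left untouched.
result = ser.interpolate(limit=1, limit_direction='both', limit_area='inside')
print(result)
```

Here only positions 3 and 5 are filled (with 7.0 and 11.0); position 4 is more than one step from any valid value, and positions 0, 1, 7 and 8 are outside the valid range.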

jorisvandenbossche · 2017-06-07T22:30:55Z

doc/source/whatsnew/v0.21.0.txt

@@ -24,6 +24,9 @@ New features
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
  and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
+- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.


Can you use double backticks around limit_area and DataFrame.interpolate ?

and just say .interpolate() as this will work on Series & DataFrame

or you can do a :func:`DataFrame.clip` and :func:`Series.clip`

jorisvandenbossche · 2017-06-07T22:31:27Z

doc/source/whatsnew/v0.21.0.txt

@@ -24,6 +24,9 @@ New features
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
  and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
+- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
+  Use `limit_area='inside'` to fill only NaNs surrounded by valid values or use `limit_area='outside'` to fill only NaNs outside the existing valid values while preserving those inside.  (:issue:`16284`)


Same for the single backticks on this line.

jorisvandenbossche · 2017-06-07T22:31:52Z

pandas/core/generic.py

+            * None: (default) no fill restriction
+            * 'inside' Only fill NaNs surrounded by valid values (interpolate).
+            * 'outside' Only fill NaNs outside valid values (extrapolate).
+            .. versionadded:: 0.21.0


Can you put a blank line above this one

jorisvandenbossche · 2017-06-07T22:32:36Z

pandas/core/generic.py


            .. versionadded:: 0.17.0

+        limit_area : {'inside', 'outside'}, default None
+            * None: (default) no fill restriction
+            * 'inside' Only fill NaNs surrounded by valid values (interpolate).


I would put a colon (:) after 'inside' (same for the line below)

jreback

isn't limit_area essentially the difference between interpolate and extrapolate?

TomAugspurger · 2017-06-20T22:13:30Z

isn't limit_area essentially the difference between interpolate and extrapolate?

Essentially. I initially wanted a boolean extrapolate, which would be like

extrapolate=True is like limit_area=None
extrapolate=False is like limit_area='inside'

I think limit_area is fine, since it's more consistent with the other limit_ keywords, it has the option to only extrapolate (not sure why you would want to do that, personally, but w/e), and it allows you to pass extrapolate through to the scipy methods if you're using those, which have somewhat more powerful / different behavior.
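
A minimal sketch contrasting the three `limit_area` settings discussed above. The sample series is an assumption chosen for illustration, and the boolean `extrapolate` keyword mentioned above is not used because it was only an idea, not part of this PR.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# limit_area=None (default): no area restriction, so the trailing NaN is
# filled as well. The leading NaN stays because filling runs forward by default.
unrestricted = ser.interpolate()

# limit_area='inside': only the NaN between 1.0 and 3.0 is filled.
inside_only = ser.interpolate(limit_area='inside')

# limit_area='outside': only NaNs before the first or after the last valid
# value are filled; limit_direction='both' lets the leading NaN be reached.
outside_only = ser.interpolate(limit_direction='both', limit_area='outside')

print(pd.DataFrame({'None': unrestricted,
                    'inside': inside_only,
                    'outside': outside_only}))
```

With the default linear method, the "outside" fills repeat the nearest valid value rather than continuing the slope.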

jreback

lgtm. some doc comments. ping on green.

jreback · 2017-06-21T10:27:00Z

doc/source/missing_data.rst

+
+By default, ``NaN`` values are filled whether they are inside (surrounded by)
+existing valid values, or outside existing valid values. Introduced in v0.21
+the ``limit_area`` parameter restricts filling to either inside or outside values.


maybe add some wording about interpolation vs extrapolation here.

maybe also when you would want to use / do this.

jreback · 2017-06-21T10:28:19Z

doc/source/whatsnew/v0.21.0.txt

@@ -24,6 +24,9 @@ New features
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
  and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
+- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.


and just say .interpolate() as this will work on Series & DataFrame

jreback · 2017-06-21T10:28:58Z

pandas/tests/series/test_missing.py

@@ -959,6 +959,45 @@ def test_interp_limit_bad_direction(self):
        pytest.raises(ValueError, s.interpolate, method='linear',
                      limit_direction='abc')

+    # limit_area introduced GH #16284


can you put the comment inside the function

jreback · 2017-06-21T10:31:30Z

doc/source/whatsnew/v0.21.0.txt

@@ -24,6 +24,9 @@ New features
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
  and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
+- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.


or you can do a :func:`DataFrame.clip` and :func:`Series.clip`

jreback · 2017-07-07T10:36:03Z

if you can rebase and update for comments

jreback · 2017-07-19T10:35:11Z

can you rebase / update according to comments

jreback · 2017-08-18T01:00:00Z

can you rebase. this looked pretty good.

jreback · 2017-09-23T20:14:59Z

@WBare can you rebase this.

jreback · 2017-10-28T00:34:47Z

can you rebase and we can finally get this in!

jreback · 2017-11-12T14:49:43Z

@WBare can you rebase / update.

note that #8000 might be related here.

jreback · 2017-11-22T02:35:33Z

can you rebase / update

jreback · 2018-01-21T22:09:44Z

@TomAugspurger I rebased. prob needs a once over if you can.

jreback · 2018-01-21T22:10:17Z

doc/source/missing_data.rst

+   # fill all consecutive values in both directions
+   ser.interpolate(limit_direction='both')
+
+By default, ``NaN`` values are filled whether they are inside (surrounded by)


need to update this

jreback · 2018-01-21T22:10:42Z

also haven't addressed @jorisvandenbossche comments yet.

jreback · 2018-02-24T17:06:23Z

closed by 35812ea

WBare added 3 commits May 26, 2017 09:41

ENH limit_area added to interpolate1d

9852ec4

DOC: Added limit_area to whatsnew

4bacc45

Fix code style - is not

80d67b7

jreback requested changes May 30, 2017

jreback added the Enhancement and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels May 30, 2017

requested doc changes

d83246c

TomAugspurger added this to the 0.21.0 milestone Jun 2, 2017

Lint and doc fix

b24e488

TomAugspurger approved these changes Jun 2, 2017

jreback reviewed Jun 5, 2017

jorisvandenbossche reviewed Jun 7, 2017

jreback reviewed Jun 10, 2017

jreback requested changes Jun 21, 2017

jreback removed this from the 0.21.0 milestone Sep 23, 2017

jreback mentioned this pull request Nov 12, 2017

DataFrame.interpolate() extrapolates over trailing missing data #8000

Closed

jreback added 3 commits January 21, 2018 16:46

Merge branch 'master' into PR_TOOL_MERGE_PR_16513

61e808f

whatsnew

41af8e3

cleanup

7c53e78

jreback added this to the 0.23.0 milestone Jan 21, 2018

jreback requested changes Jan 21, 2018

jreback added 2 commits January 24, 2018 06:32

Merge branch 'master' into PR_TOOL_MERGE_PR_16513

e91cf4f

more docs

596f145

jreback closed this Feb 24, 2018

cchwala mentioned this pull request Jun 11, 2019

limit_area and limit_direction do not have an effect when interpolation method is 'pad' #26796

Closed
