Skip to content

ENH: interpolate.limit_area() 16284 #16513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from
Closed

ENH: interpolate.limit_area() 16284 #16513

wants to merge 10 commits into from

Conversation

WBare
Copy link
Contributor

@WBare WBare commented May 26, 2017

@codecov
Copy link

codecov bot commented May 26, 2017

Codecov Report

Merging #16513 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16513      +/-   ##
==========================================
+ Coverage   90.79%   90.79%   +<.01%     
==========================================
  Files         161      161              
  Lines       51063    51074      +11     
==========================================
+ Hits        46364    46375      +11     
  Misses       4699     4699
Flag Coverage Δ
#multiple 88.64% <100%> (ø) ⬆️
#single 40.14% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 92.26% <ø> (ø) ⬆️
pandas/core/resample.py 96.09% <ø> (ø) ⬆️
pandas/core/internals.py 93.44% <ø> (ø) ⬆️
pandas/core/missing.py 84.8% <100%> (+0.52%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75c8698...80d67b7. Read the comment docs.

@codecov
Copy link

codecov bot commented May 26, 2017

Codecov Report

Merging #16513 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16513      +/-   ##
==========================================
+ Coverage   91.57%   91.57%   +<.01%     
==========================================
  Files         150      150              
  Lines       48684    48695      +11     
==========================================
+ Hits        44583    44594      +11     
  Misses       4101     4101
Flag Coverage Δ
#multiple 89.94% <100%> (ø) ⬆️
#single 41.71% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.43% <ø> (ø) ⬆️
pandas/core/internals.py 95.46% <ø> (ø) ⬆️
pandas/core/generic.py 95.91% <ø> (ø) ⬆️
pandas/core/missing.py 84.78% <100%> (+0.48%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4662cd...596f145. Read the comment docs.

@WBare
Copy link
Contributor Author

WBare commented May 26, 2017

Hi Guys,

Looks like I got 7 tests failed, but they are not related to my code (as far as I can tell).

I see that we have some other active issues on testing right now, but I'm not sure if this issue is knows.

Do I need to do something about this?

Thanks.

XFAIL pandas/tests/test_window.py::TestExpanding::()::tests_empty_df_expanding[1s]
GH 16425 expanding with offset not supported
XFAIL pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::()::test_clip_mixed_numeric
clip on mixed integer or floats with integer clippers coerces to float
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr
not a valid repr as we use interval notation
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr_max_seq_item_setting
not a valid repr as we use interval notation
XFAIL pandas/tests/indexes/test_interval.py::TestIntervalIndex::()::test_repr_roundtrip
not a valid repr as we use interval notation
XFAIL pandas/tests/io/test_excel.py::test_styler_to_excel[xlwt]
xlwt does not support openpyxl-compatible style dicts
XFAIL pandas/tests/io/test_excel.py::test_styler_to_excel[openpyxl]
reason: openpyxl1 does not support some openpyxl2-compatible style dicts

@TomAugspurger
Copy link
Contributor

@WBare those are expected failures (x=expected). Nothing to worry about.

I'll review more thoroughly later, but at a glance I think this is a good approach.

@WBare
Copy link
Contributor Author

WBare commented May 30, 2017

Sounds good, @TomAugspurger . I see you just did a giant backport, so I know you have your hands full. Let me know if you need anything else on this.

@@ -24,6 +24,7 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced (:issue:`16284`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show a small sub-section example here of why this parameter is useful (take the examples from the docs you wrote above). and provide a pointer to the docs (again which you wrote)

* 'inside' Only fill NaNs surrounded by valid values (interpolate).
* 'outside' Only fill NaNs outside valid values (extrapolate).
* None: default fill inside and outside
.. versionadded:: 0.21.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the None one first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I also noticed and corrected the old .. versionadded tag on 3887 which was not being property replaced. It needed the blank lines to stop it from being combined with the normal paragraph above.

@@ -155,28 +155,12 @@ def _interp_limit(invalid, fw_limit, bw_limit):
raise ValueError('Invalid limit_direction: expecting one of %r, got '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use .format() here

@jreback jreback added Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels May 30, 2017
@WBare
Copy link
Contributor Author

WBare commented May 31, 2017

I've committed the changes requested by @jreback , and we are back to green except the expected failures.

Ready for further review.

Thanks.

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Jun 2, 2017
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a small fix removing whitespace and fixing the whatsnew note. Looks good!

@@ -330,6 +330,10 @@ Interpolation

The ``limit_direction`` keyword argument was added.

.. versionadded:: 0.21.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make sense w/o an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jeff,

The examples for both limit_direction and limit_area are below in the "interpolation limits" sub-section.

I'm mostly trying to get the correct style from inference, so I basically reproduced what had been done in the past for limit_direction.

There is a location (.. _missing_data.interp_limits:) below these versionadded references to which both limit_direction and limit_area can be linked if that is the right style.

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A link to below sounds good. You can make a new one specifically for _missing_data.interp_limit_area

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

I agree with this, I would just remove it here.

@TomAugspurger
Copy link
Contributor

Looks like there's a conflict in pandas/core/missing.py. Mind rebasing? Sorry I couldn't get to this earlier.

ser.interpolate(limit_direction='both')

By default, ``NaN`` values are filled whether they are inside (surrounded by)
existing valid values, or outside existing valid values. Introduced in v0.21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Introduced in v0.21" -> "Introduced in pandas 0.21, "

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully like the limit_area keyword name. It is now clear what it does after reading the PR / docs in the PR, but when I just saw the title / keyword name, I wouldn't have guessed it would do this.

But not directly an idea for another name. Where there other suggestions in the issue?

.. ipython:: python

# fill one consecutive inside value in both directions
ser.interpolate(limit=1, limit_area='inside', limit_direction='both')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put limit_area here also after limit_direction (to have it consistent with the other examples)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -24,6 +24,9 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use double backticks around limit_area and DataFrame.interpolate ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and just say .interpolate() as this will work on Series & DataFrame

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can do a :func:DataFrame.clip and :func:`Series.clip``

@@ -24,6 +24,9 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
Use `limit_area='inside'` to fill only NaNs surrounded by valid values or use `limit_area='outside'` to fill only NaNs outside the existing valid values while preserving those inside. (:issue:`16284`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the single backticks on this line.

* None: (default) no fill restriction
* 'inside' Only fill NaNs surrounded by valid values (interpolate).
* 'outside' Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a blank line above this one


.. versionadded:: 0.17.0

limit_area : {'inside', 'outside'}, default None
* None: (default) no fill restriction
* 'inside' Only fill NaNs surrounded by valid values (interpolate).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a colon (:) after 'inside' (same for the line below)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't limit_area essentially the difference between interpolate and extrapolate?

@TomAugspurger
Copy link
Contributor

isn't limit_area essentially the difference between interpolate and extrapolate?

Essentially. I initially wanted a boolean extrapolate would would be like

  • extraploate=True is like limit_area=None
  • extrapolate=False is like limit_area='inside'

I think limit_area is fine, since it's more consistent with the other limit_ keywords, it has the option to only extrapolate (not sure why you would want to do that, personally, but w/e), and this allows you to pass extrapolate through to the scipy methods if you're using though, which has a bit more powerful / different behavior.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some doc comments. ping on green.


By default, ``NaN`` values are filled whether they are inside (surrounded by)
existing valid values, or outside existing valid values. Introduced in v0.21
the ``limit_area`` parameter restricts filling to either inside or outside values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add some working about interpolation vs extrapolation here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also when you would want to use / do this.

@@ -24,6 +24,9 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and just say .interpolate() as this will work on Series & DataFrame

@@ -959,6 +959,45 @@ def test_interp_limit_bad_direction(self):
pytest.raises(ValueError, s.interpolate, method='linear',
limit_direction='abc')

# limit_area introduced GH #16284
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the comment inside the function

@@ -24,6 +24,9 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can do a :func:DataFrame.clip and :func:`Series.clip``

@jreback
Copy link
Contributor

jreback commented Jul 7, 2017

if you can rebase and update for comments

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

can you rebase / update according to comments

@jreback
Copy link
Contributor

jreback commented Aug 18, 2017

can you rebase. this looked pretty good.

@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

@WBare can you rebase this.

@jreback jreback removed this from the 0.21.0 milestone Sep 23, 2017
@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

can you rebase and we can finally get this in!

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

@WBare can you rebase / update.

note that #8000 might be related here.

@jreback
Copy link
Contributor

jreback commented Nov 22, 2017

can you rebase / update

@jreback jreback added this to the 0.23.0 milestone Jan 21, 2018
@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

@TomAugspurger I rebased. prob needs a once over if you can.

# fill all consecutive values in both directions
ser.interpolate(limit_direction='both')

By default, ``NaN`` values are filled whether they are inside (surrounded by)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update this

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

also haven't addressed @jorisvandenbossche comments yet.

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

closed by 35812ea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement Request: control extrapolation on .interpolate
4 participants