-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: interpolate.limit_area() 16284 #16513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #16513 +/- ##
==========================================
+ Coverage 90.79% 90.79% +<.01%
==========================================
Files 161 161
Lines 51063 51074 +11
==========================================
+ Hits 46364 46375 +11
Misses 4699 4699
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16513 +/- ##
==========================================
+ Coverage 91.57% 91.57% +<.01%
==========================================
Files 150 150
Lines 48684 48695 +11
==========================================
+ Hits 44583 44594 +11
Misses 4101 4101
Continue to review full report at Codecov.
|
Hi Guys, Looks like I got 7 tests failed, but they are not related to my code (as far as I can tell). I see that we have some other active issues on testing right now, but I'm not sure if this issue is knows. Do I need to do something about this? Thanks. XFAIL pandas/tests/test_window.py::TestExpanding::()::tests_empty_df_expanding[1s] |
@WBare those are expected failures (x=expected). Nothing to worry about. I'll review more thoroughly later, but at a glance I think this is a good approach. |
Sounds good, @TomAugspurger . I see you just did a giant backport, so I know you have your hands full. Let me know if you need anything else on this. |
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -24,6 +24,7 @@ New features | |||
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`) | |||
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`, | |||
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | |||
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced (:issue:`16284`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show a small sub-section example here of why this parameter is useful (take the examples from the docs you wrote above). and provide a pointer to the docs (again which you wrote)
pandas/core/generic.py
Outdated
* 'inside' Only fill NaNs surrounded by valid values (interpolate). | ||
* 'outside' Only fill NaNs outside valid values (extrapolate). | ||
* None: default fill inside and outside | ||
.. versionadded:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the None
one first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I also noticed and corrected the old .. versionadded tag on 3887 which was not being property replaced. It needed the blank lines to stop it from being combined with the normal paragraph above.
pandas/core/missing.py
Outdated
@@ -155,28 +155,12 @@ def _interp_limit(invalid, fw_limit, bw_limit): | |||
raise ValueError('Invalid limit_direction: expecting one of %r, got ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use .format()
here
I've committed the changes requested by @jreback , and we are back to green except the expected failures. Ready for further review. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a small fix removing whitespace and fixing the whatsnew note. Looks good!
doc/source/missing_data.rst
Outdated
@@ -330,6 +330,10 @@ Interpolation | |||
|
|||
The ``limit_direction`` keyword argument was added. | |||
|
|||
.. versionadded:: 0.21.0 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make sense w/o an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jeff,
The examples for both limit_direction and limit_area are below in the "interpolation limits" sub-section.
I'm mostly trying to get the correct style from inference, so I basically reproduced what had been done in the past for limit_direction.
There is a location (.. _missing_data.interp_limits:) below these versionadded references to which both limit_direction and limit_area can be linked if that is the right style.
Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A link to below sounds good. You can make a new one specifically for _missing_data.interp_limit_area
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)
I agree with this, I would just remove it here.
Looks like there's a conflict in |
doc/source/missing_data.rst
Outdated
ser.interpolate(limit_direction='both') | ||
|
||
By default, ``NaN`` values are filled whether they are inside (surrounded by) | ||
existing valid values, or outside existing valid values. Introduced in v0.21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Introduced in v0.21" -> "Introduced in pandas 0.21, "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I fully like the limit_area
keyword name. It is now clear what it does after reading the PR / docs in the PR, but when I just saw the title / keyword name, I wouldn't have guessed it would do this.
But not directly an idea for another name. Where there other suggestions in the issue?
doc/source/missing_data.rst
Outdated
.. ipython:: python | ||
|
||
# fill one consecutive inside value in both directions | ||
ser.interpolate(limit=1, limit_area='inside', limit_direction='both') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put limit_area here also after limit_direction (to have it consistent with the other examples)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -24,6 +24,9 @@ New features | |||
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`) | |||
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`, | |||
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | |||
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use double backticks around limit_area and DataFrame.interpolate ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and just say .interpolate()
as this will work on Series & DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you can do a :func:DataFrame.clip
and :func:`Series.clip``
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -24,6 +24,9 @@ New features | |||
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`) | |||
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`, | |||
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | |||
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced. | |||
Use `limit_area='inside'` to fill only NaNs surrounded by valid values or use `limit_area='outside'` to fill only NaNs outside the existing valid values while preserving those inside. (:issue:`16284`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the single backticks on this line.
* None: (default) no fill restriction | ||
* 'inside' Only fill NaNs surrounded by valid values (interpolate). | ||
* 'outside' Only fill NaNs outside valid values (extrapolate). | ||
.. versionadded:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put a blank line above this one
pandas/core/generic.py
Outdated
|
||
.. versionadded:: 0.17.0 | ||
|
||
limit_area : {'inside', 'outside'}, default None | ||
* None: (default) no fill restriction | ||
* 'inside' Only fill NaNs surrounded by valid values (interpolate). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put a colon (:
) after 'inside' (same for the line below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't limit_area essentially the difference between interpolate and extrapolate?
Essentially. I initially wanted a boolean
I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some doc comments. ping on green.
doc/source/missing_data.rst
Outdated
|
||
By default, ``NaN`` values are filled whether they are inside (surrounded by) | ||
existing valid values, or outside existing valid values. Introduced in v0.21 | ||
the ``limit_area`` parameter restricts filling to either inside or outside values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add some working about interpolation vs extrapolation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also when you would want to use / do this.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -24,6 +24,9 @@ New features | |||
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`) | |||
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`, | |||
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | |||
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and just say .interpolate()
as this will work on Series & DataFrame
@@ -959,6 +959,45 @@ def test_interp_limit_bad_direction(self): | |||
pytest.raises(ValueError, s.interpolate, method='linear', | |||
limit_direction='abc') | |||
|
|||
# limit_area introduced GH #16284 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put the comment inside the function
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -24,6 +24,9 @@ New features | |||
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`) | |||
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`, | |||
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | |||
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you can do a :func:DataFrame.clip
and :func:`Series.clip``
if you can rebase and update for comments |
can you rebase / update according to comments |
can you rebase. this looked pretty good. |
@WBare can you rebase this. |
can you rebase and we can finally get this in! |
can you rebase / update |
@TomAugspurger I rebased. prob needs a once over if you can. |
# fill all consecutive values in both directions | ||
ser.interpolate(limit_direction='both') | ||
|
||
By default, ``NaN`` values are filled whether they are inside (surrounded by) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to update this
also haven't addressed @jorisvandenbossche comments yet. |
closed by 35812ea |
git diff upstream/master --name-only -- '*.py' | flake8 --diff