ENH: interpolate.limit_area() 16284 #16513

Closed
wants to merge 10 commits into from
Changes from 4 commits
49 changes: 37 additions & 12 deletions doc/source/missing_data.rst
@@ -330,6 +330,10 @@ Interpolation

The ``limit_direction`` keyword argument was added.

.. versionadded:: 0.21.0

Contributor

this doesn't make sense w/o an example

Contributor Author

Hi Jeff,

The examples for both limit_direction and limit_area are below in the "interpolation limits" sub-section.

I'm mostly trying to get the correct style from inference, so I basically reproduced what had been done in the past for limit_direction.

There is a location (.. _missing_data.interp_limits:) below these versionadded references to which both limit_direction and limit_area can be linked if that is the right style.

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

Contributor

A link to below sounds good. You can make a new one specifically for _missing_data.interp_limit_area

Member

Honestly, since version added is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)

I agree with this, I would just remove it here.

The ``limit_area`` keyword argument was added.

Both Series and DataFrame objects have an ``interpolate`` method that, by default,
performs linear interpolation at missing datapoints.

@@ -454,33 +458,54 @@ at the new values.
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

.. _missing_data.interp_limits:

Interpolation Limits
^^^^^^^^^^^^^^^^^^^^

Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
argument. Use this argument to limit the number of consecutive interpolations,
keeping ``NaN`` values for interpolations that are too far from the last valid
observation:
argument. Use this argument to limit the number of consecutive ``NaN`` values
filled since the last valid observation:

.. ipython:: python

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])
ser.interpolate(limit=2)
ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])

By default, ``limit`` applies in a forward direction, so that only ``NaN``
values after a non-``NaN`` value can be filled. If you provide ``'backward'`` or
``'both'`` for the ``limit_direction`` keyword argument, you can fill ``NaN``
values before non-``NaN`` values, or both before and after non-``NaN`` values,
respectively:
# fill all consecutive values in a forward direction
ser.interpolate()

.. ipython:: python
# fill one consecutive value in a forward direction
ser.interpolate(limit=1)

By default, ``NaN`` values are filled in a ``forward`` direction. Use the
``limit_direction`` parameter to fill ``backward`` or from ``both`` directions.

ser.interpolate(limit=1) # limit_direction == 'forward'
.. ipython:: python

# fill one consecutive value backwards
ser.interpolate(limit=1, limit_direction='backward')

# fill one consecutive value in both directions
ser.interpolate(limit=1, limit_direction='both')

# fill all consecutive values in both directions
ser.interpolate(limit_direction='both')

By default, ``NaN`` values are filled whether they are inside (surrounded by)
Contributor

need to update this

existing valid values, or outside existing valid values. Introduced in v0.21
Member

"Introduced in v0.21" -> "Introduced in pandas 0.21, "

Contributor

done

the ``limit_area`` parameter restricts filling to either inside or outside values.
Contributor

maybe add some wording about interpolation vs extrapolation here.

Contributor

maybe also when you would want to use / do this.


.. ipython:: python

# fill one consecutive inside value in both directions
ser.interpolate(limit=1, limit_area='inside', limit_direction='both')
Member

can you put limit_area here also after limit_direction (to have it consistent with the other examples)?

Contributor

done


# fill all consecutive outside values backward
ser.interpolate(limit_direction='backward', limit_area='outside')

# fill all consecutive outside values in both directions
ser.interpolate(limit_direction='both', limit_area='outside')

.. _missing_data.replace:

Replacing Generic Values
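
To make the inside/outside distinction in the documentation hunk above concrete, here is a minimal sketch, assuming a pandas release that includes ``limit_area`` (0.21 or later). With the default linear method, ``'inside'`` corresponds to interpolation between valid values, while ``'outside'`` extends the nearest valid value past the ends of the data (extrapolation). The variable names are illustrative and not part of the PR.

```python
import numpy as np
import pandas as pd

# Same series as in the documentation hunk above.
ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])

# 'inside': fill at most one NaN next to each valid value, but only
# between the first and last valid observations (interpolation).
inside = ser.interpolate(limit=1, limit_direction='both', limit_area='inside')

# 'outside': fill only the leading and trailing NaNs (extrapolation),
# leaving the gap between 5 and 13 untouched.
outside = ser.interpolate(limit_direction='both', limit_area='outside')

print(inside.tolist())
print(outside.tolist())
```

One case where ``'inside'`` may be preferable is when values beyond the first or last observation should not be invented, for example when the series has not started or has already ended.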
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -24,6 +24,9 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class:`pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method allowing further control of which NaNs are replaced.
Member

Can you use double backticks around limit_area and DataFrame.interpolate ?

Contributor

and just say .interpolate() as this will work on Series & DataFrame

Contributor

or you can do a :func:`DataFrame.clip` and :func:`Series.clip`

Use ``limit_area='inside'`` to fill only NaNs surrounded by valid values or use ``limit_area='outside'`` to fill only NaNs outside the existing valid values while preserving those inside. (:issue:`16284`)
Full documentation and examples are :ref:`here <missing_data.interp_limits>`.

.. _whatsnew_0210.enhancements.other:
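
As a reviewer notes, the new parameter applies to both Series and DataFrame. The following is a small sketch of the DataFrame case, assuming a pandas release with ``limit_area`` and the default linear method applied column by column; the frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column has leading, trailing, or interior NaNs.
df = pd.DataFrame({
    'a': [np.nan, 1.0, np.nan, 3.0, np.nan],
    'b': [np.nan, np.nan, 2.0, np.nan, np.nan],
})

# Only NaNs surrounded by valid values are filled in each column.
inside = df.interpolate(limit_area='inside')

# Only leading and trailing NaNs are filled in each column.
outside = df.interpolate(limit_direction='both', limit_area='outside')

print(inside)
print(outside)
```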

15 changes: 11 additions & 4 deletions pandas/core/generic.py
@@ -3883,11 +3883,16 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
limit : int, default None.
Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.

Consecutive NaNs will be filled in this direction.

.. versionadded:: 0.17.0

limit_area : {'inside', 'outside'}, default None
* None: (default) no fill restriction
* 'inside' Only fill NaNs surrounded by valid values (interpolate).
Member

I would put a colon (:) after 'inside' (same for the line below)

* 'outside' Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0
Contributor

put the None one first

Contributor Author

@jreback I also noticed and corrected the old .. versionadded tag on 3887 which was not being properly replaced. It needed the blank lines to stop it from being combined with the normal paragraph above.

Member

Can you put a blank line above this one


inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, 'infer' or None, defaults to None
@@ -3919,7 +3924,8 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,

@Appender(_shared_docs['interpolate'] % _shared_doc_kwargs)
def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
limit_direction='forward', downcast=None, **kwargs):
limit_direction='forward', limit_area=None,
downcast=None, **kwargs):
"""
Interpolate values according to different methods.
"""
@@ -3968,6 +3974,7 @@ def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
new_data = data.interpolate(method=method, axis=ax, index=index,
values=_maybe_transposed_self, limit=limit,
limit_direction=limit_direction,
limit_area=limit_area,
inplace=inplace, downcast=downcast,
**kwargs)

10 changes: 6 additions & 4 deletions pandas/core/internals.py
@@ -907,8 +907,8 @@ def putmask(self, mask, new, align=True, inplace=False, axis=0,

def interpolate(self, method='pad', axis=0, index=None, values=None,
inplace=False, limit=None, limit_direction='forward',
fill_value=None, coerce=False, downcast=None, mgr=None,
**kwargs):
limit_area=None, fill_value=None, coerce=False,
downcast=None, mgr=None, **kwargs):

inplace = validate_bool_kwarg(inplace, 'inplace')

@@ -949,6 +949,7 @@ def check_int_bool(self, inplace):
return self._interpolate(method=m, index=index, values=values,
axis=axis, limit=limit,
limit_direction=limit_direction,
limit_area=limit_area,
fill_value=fill_value, inplace=inplace,
downcast=downcast, mgr=mgr, **kwargs)

@@ -983,8 +984,8 @@ def _interpolate_with_fill(self, method='pad', axis=0, inplace=False,

def _interpolate(self, method=None, index=None, values=None,
fill_value=None, axis=0, limit=None,
limit_direction='forward', inplace=False, downcast=None,
mgr=None, **kwargs):
limit_direction='forward', limit_area=None,
inplace=False, downcast=None, mgr=None, **kwargs):
""" interpolate using scipy wrappers """

inplace = validate_bool_kwarg(inplace, 'inplace')
@@ -1012,6 +1013,7 @@ def func(x):
# i.e. not an arg to missing.interpolate_1d
return missing.interpolate_1d(index, x, method=method, limit=limit,
limit_direction=limit_direction,
limit_area=limit_area,
fill_value=fill_value,
bounds_error=False, **kwargs)

80 changes: 46 additions & 34 deletions pandas/core/missing.py
@@ -111,7 +111,7 @@ def clean_interp_method(method, **kwargs):


def interpolate_1d(xvalues, yvalues, method='linear', limit=None,
limit_direction='forward', fill_value=None,
limit_direction='forward', limit_area=None, fill_value=None,
bounds_error=False, order=None, **kwargs):
"""
Logic for the 1-d interpolation. The result should be 1-d, inputs
@@ -152,31 +152,15 @@ def _interp_limit(invalid, fw_limit, bw_limit):
valid_limit_directions = ['forward', 'backward', 'both']
limit_direction = limit_direction.lower()
if limit_direction not in valid_limit_directions:
raise ValueError('Invalid limit_direction: expecting one of %r, got '
'%r.' % (valid_limit_directions, limit_direction))
raise ValueError('Invalid limit_direction: expecting one of {}, got '
'{}.'.format(valid_limit_directions, limit_direction))

from pandas import Series
ys = Series(yvalues)
start_nans = set(range(ys.first_valid_index()))
end_nans = set(range(1 + ys.last_valid_index(), len(valid)))

# violate_limit is a list of the indexes in the series whose yvalue is
# currently NaN, and should still be NaN after the interpolation.
# Specifically:
#
# If limit_direction='forward' or None then the list will contain NaNs at
# the beginning of the series, and NaNs that are more than 'limit' away
# from the prior non-NaN.
#
# If limit_direction='backward' then the list will contain NaNs at
# the end of the series, and NaNs that are more than 'limit' away
# from the subsequent non-NaN.
#
# If limit_direction='both' then the list will contain NaNs that
# are more than 'limit' away from any non-NaN.
#
# If limit=None, then use default behavior of filling an unlimited number
# of NaNs in the direction specified by limit_direction
if limit_area is not None:
valid_limit_areas = ['inside', 'outside']
limit_area = limit_area.lower()
if limit_area not in valid_limit_areas:
raise ValueError('Invalid limit_area: expecting one of {}, got '
'{}.'.format(valid_limit_areas, limit_area))

# default limit is unlimited GH #16282
if limit is None:
@@ -186,15 +170,43 @@ def _interp_limit(invalid, fw_limit, bw_limit):
elif limit < 1:
raise ValueError('Limit must be greater than 0')

# each possible limit_direction
from pandas import Series
ys = Series(yvalues)

# These are sets of indices of invalid values, e.g. {0, 1, ...}
all_nans = set(np.flatnonzero(invalid))
start_nans = set(range(ys.first_valid_index()))
end_nans = set(range(1 + ys.last_valid_index(), len(valid)))
mid_nans = all_nans - start_nans - end_nans

# Like the sets above, preserve_nans contains indices of invalid values,
# but in this case, it is the final set of indices that need to be
# preserved as NaN after the interpolation.

# For example if limit_direction='forward' then preserve_nans will
# contain indices of NaNs at the beginning of the series, and NaNs that
# are more than 'limit' away from the prior non-NaN.

# set preserve_nans based on direction using _interp_limit
if limit_direction == 'forward':
violate_limit = sorted(start_nans |
set(_interp_limit(invalid, limit, 0)))
preserve_nans = start_nans | set(_interp_limit(invalid, limit, 0))
elif limit_direction == 'backward':
violate_limit = sorted(end_nans |
set(_interp_limit(invalid, 0, limit)))
elif limit_direction == 'both':
violate_limit = sorted(_interp_limit(invalid, limit, limit))
preserve_nans = end_nans | set(_interp_limit(invalid, 0, limit))
else:
# both directions... just use _interp_limit
preserve_nans = set(_interp_limit(invalid, limit, limit))

# if limit_area is set, add either mid or outside indices
# to preserve_nans GH #16284
if limit_area == 'inside':
# preserve NaNs on the outside
preserve_nans |= start_nans | end_nans
elif limit_area == 'outside':
# preserve NaNs on the inside
preserve_nans |= mid_nans

# sort preserve_nans and convert to list
preserve_nans = sorted(preserve_nans)

xvalues = getattr(xvalues, 'values', xvalues)
yvalues = getattr(yvalues, 'values', yvalues)
@@ -211,7 +223,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
else:
inds = xvalues
result[invalid] = np.interp(inds[invalid], inds[valid], yvalues[valid])
result[violate_limit] = np.nan
result[preserve_nans] = np.nan
return result

sp_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic',
@@ -230,7 +242,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
fill_value=fill_value,
bounds_error=bounds_error,
order=order, **kwargs)
result[violate_limit] = np.nan
result[preserve_nans] = np.nan
return result
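
The set logic in this hunk can be followed without pandas. Below is a pure-Python sketch of how ``preserve_nans`` is assembled from the direction limits and ``limit_area``. The function ``preserved_nan_indices`` and its inner helper ``beyond_limit`` are hypothetical names introduced for illustration; the sketch assumes the input has at least one valid value and takes its window check from the description of ``_interp_limit`` in the PR rather than copying the library code.

```python
def preserved_nan_indices(values, limit=None, limit_direction='forward',
                          limit_area=None):
    """Return the sorted positions that must stay NaN after interpolation.

    Hypothetical helper for illustration; assumes at least one valid value.
    """
    invalid = [v != v for v in values]  # NaN is the only value unequal to itself
    n = len(values)
    if limit is None:
        limit = n  # treat "no limit" as a window covering the whole input

    all_nans = {i for i, bad in enumerate(invalid) if bad}
    first_valid = invalid.index(False)
    last_valid = n - 1 - invalid[::-1].index(False)
    start_nans = set(range(first_valid))
    end_nans = set(range(last_valid + 1, n))
    mid_nans = all_nans - start_nans - end_nans

    def beyond_limit(fw_limit, bw_limit):
        # A NaN exceeds the limits when every value in its window is NaN.
        return {x for x in all_nans
                if all(invalid[max(0, x - fw_limit):x + bw_limit + 1])}

    if limit_direction == 'forward':
        preserve = start_nans | beyond_limit(limit, 0)
    elif limit_direction == 'backward':
        preserve = end_nans | beyond_limit(0, limit)
    else:  # 'both'
        preserve = beyond_limit(limit, limit)

    if limit_area == 'inside':
        preserve |= start_nans | end_nans  # keep the outside NaNs
    elif limit_area == 'outside':
        preserve |= mid_nans  # keep the inside NaNs

    return sorted(preserve)


nan = float('nan')
s = [nan, nan, 3, nan, nan, nan, 7, nan, nan]
print(preserved_nan_indices(s, limit_area='inside'))
print(preserved_nan_indices(s, limit=1, limit_direction='both',
                            limit_area='inside'))
print(preserved_nan_indices(s, limit=1, limit_direction='both',
                            limit_area='outside'))
```

The positions returned for the sample list line up with the ``nan`` entries in the expected Series of ``test_interp_limit_area`` for the same arguments.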


4 changes: 3 additions & 1 deletion pandas/core/resample.py
@@ -487,7 +487,8 @@ def fillna(self, method, limit=None):

@Appender(_shared_docs['interpolate'] % _shared_docs_kwargs)
def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
limit_direction='forward', downcast=None, **kwargs):
limit_direction='forward', limit_area=None,
downcast=None, **kwargs):
"""
Interpolate values according to different methods.

@@ -497,6 +498,7 @@ def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
return result.interpolate(method=method, axis=axis, limit=limit,
inplace=inplace,
limit_direction=limit_direction,
limit_area=limit_area,
downcast=downcast, **kwargs)

def asfreq(self, fill_value=None):
39 changes: 39 additions & 0 deletions pandas/tests/series/test_missing.py
@@ -959,6 +959,45 @@ def test_interp_limit_bad_direction(self):
pytest.raises(ValueError, s.interpolate, method='linear',
limit_direction='abc')

# limit_area introduced GH #16284
Contributor

can you put the comment inside the function

def test_interp_limit_area(self):
# These tests are for issue #16284 -- restrict filling with limit_area.
s = Series([nan, nan, 3, nan, nan, nan, 7, nan, nan])

expected = Series([nan, nan, 3., 4., 5., 6., 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside')
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., 4., nan, nan, 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside',
limit=1)
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., 4., nan, 6., 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside',
limit_direction='both', limit=1)
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., nan, nan, nan, 7., 7., 7.])
result = s.interpolate(method='linear', limit_area='outside')
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., nan, nan, nan, 7., 7., nan])
result = s.interpolate(method='linear', limit_area='outside',
limit=1)
assert_series_equal(result, expected)

expected = Series([nan, 3., 3., nan, nan, nan, 7., 7., nan])
result = s.interpolate(method='linear', limit_area='outside',
limit_direction='both', limit=1)
assert_series_equal(result, expected)

expected = Series([3., 3., 3., nan, nan, nan, 7., nan, nan])
result = s.interpolate(method='linear', limit_area='outside',
limit_direction='backward')
assert_series_equal(result, expected)

# an invalid limit_area raises a ValueError
pytest.raises(ValueError, s.interpolate, method='linear',
limit_area='abc')
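
The invalid-value check exercised at the end of this test can also be tried interactively. This is a rough sketch, assuming a pandas release with ``limit_area``; the names ``raised`` and ``message`` are illustrative, and the exact error text may differ between versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3.0, np.nan, 7.0, np.nan])

try:
    # 'abc' is not one of the accepted values ('inside', 'outside', None).
    s.interpolate(method='linear', limit_area='abc')
except ValueError as err:
    raised = True
    message = str(err)
else:
    raised = False
    message = ''

print(raised, message)
```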

def test_interp_limit_direction(self):
# These tests are for issue #9218 -- fill NaNs in both directions.
s = Series([1, 3, np.nan, np.nan, np.nan, 11])