ENH/REF: More options for interpolation and fillna #4915
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -174,6 +174,8 @@ Improvements to existing features | |
- :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table | ||
from semi-structured JSON data. :ref:`See the docs<io.json_normalize>` (:issue:`1067`) | ||
- ``DataFrame.from_records()`` will now accept generators (:issue:`4910`) | ||
- ``DataFrame.interpolate()`` and ``Series.interpolate()`` have been expanded to include | ||
interpolation methods from scipy. (:issue:`4915`) | ||
Review comment: Change this to issues 4434 and 1892, as that's what this is actually closing.
||
|
||
API Changes | ||
~~~~~~~~~~~ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -614,6 +614,34 @@ Experimental | |
|
||
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget. | ||
|
||
- DataFrame has a new ``interpolate`` method, similar to Series: | ||
|
||
.. ipython:: python | ||
|
||
df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8], | ||
'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]}) | ||
df.interpolate() | ||
|
||
Additionally, the ``method`` argument to ``interpolate`` has been expanded | ||
to include 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', | ||
'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial' and 'spline'. | ||
The last two also require an integer ``order`` giving the degree or order of the approximation. The new methods | ||
require scipy_. Consult the Scipy reference guide_ and documentation_ for more information | ||
about when the various methods are appropriate. See also the :ref:`pandas interpolation docs<missing_data.interpolate>`. | ||
|
||
``interpolate`` now also accepts a ``limit`` keyword argument. | ||
This works similarly to ``fillna``'s limit: | ||
|
||
.. ipython:: python | ||
|
||
ser = Series([1, 3, np.nan, np.nan, np.nan, 11]) | ||
ser.interpolate(limit=2) | ||
|
||
.. _scipy: http://www.scipy.org | ||
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation | ||
.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html | ||
|
||
|
||
Review comment: Put "closes issues 4434, 1892" here as well.
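The ``limit`` behaviour described in the added text can be sketched in plain Python. This is only an illustration under stated assumptions: ``linear_fill`` is a hypothetical helper (not part of pandas), it assumes evenly spaced points, and it assumes that ``limit`` caps how many consecutive missing values are filled counting forward from the last valid value, as the comparison with ``fillna`` suggests.

```python
import math


def linear_fill(values, limit=None):
    """Linearly fill interior NaNs, assuming evenly spaced points.

    Hypothetical helper for illustration only; it is not the pandas
    implementation. At most `limit` consecutive NaNs are filled per gap.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Walk over each pair of neighbouring known points.
    for left, right in zip(known, known[1:]):
        gap = right - left - 1
        if gap == 0:
            continue
        step = (values[right] - values[left]) / (right - left)
        fill_count = gap if limit is None else min(gap, limit)
        # Fill forward from the left edge; anything past the limit stays NaN.
        for offset in range(1, fill_count + 1):
            result[left + offset] = values[left] + step * offset
    return result


nan = float("nan")
filled = linear_fill([1, 3, nan, nan, nan, 11], limit=2)
print(filled)
```

With the same values as the ``ser`` example, the first two missing entries become 5.0 and 7.0 and the third stays missing, because it exceeds the limit of two consecutive fills.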
||
.. _whatsnew_0130.refactoring: | ||
|
||
Internal Refactoring | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1244,6 +1244,153 @@ def backfill_2d(values, limit=None, mask=None): | |
return values | ||
|
||
|
||
Review comment: I would move these routines to core/algorithms.py (now that we have more than one). It will be easier in case you need helper functions.
||
def _clean_interp_method(method, order=None, **kwargs): | ||
valid = ['linear', 'time', 'values', 'nearest', 'zero', 'slinear', | ||
'quadratic', 'cubic', 'barycentric', 'polynomial', | ||
'krogh', 'piecewise_polynomial', | ||
'pchip', 'spline'] | ||
if method in ('spline', 'polynomial') and order is None: | ||
raise ValueError("You must specify the order of the spline or " | ||
"polynomial.") | ||
if method not in valid: | ||
raise ValueError("method must be one of {0}. " | ||
"Got '{1}' instead.".format(valid, method)) | ||
return method | ||
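The two error paths of the validation above can be exercised with a small standalone sketch. ``clean_interp_method`` here is an illustrative copy of the logic in the diff, not the pandas source, and ``try_method`` is a hypothetical wrapper added only to capture the messages.

```python
def clean_interp_method(method, order=None):
    """Illustrative copy of the validation in the diff (not the pandas source)."""
    valid = ['linear', 'time', 'values', 'nearest', 'zero', 'slinear',
             'quadratic', 'cubic', 'barycentric', 'polynomial',
             'krogh', 'piecewise_polynomial', 'pchip', 'spline']
    # 'spline' and 'polynomial' need an explicit order.
    if method in ('spline', 'polynomial') and order is None:
        raise ValueError("You must specify the order of the spline or "
                         "polynomial.")
    if method not in valid:
        raise ValueError("method must be one of {0}. "
                         "Got '{1}' instead.".format(valid, method))
    return method


def try_method(method, order=None):
    """Return the accepted method name, or the error message if rejected."""
    try:
        return clean_interp_method(method, order=order)
    except ValueError as err:
        return "ValueError: {0}".format(err)


accepted = try_method('polynomial', order=2)   # valid method with an order
missing_order = try_method('spline')           # rejected: no order given
unknown = try_method('bogus')                  # rejected: not in the valid list
print(accepted)
print(missing_order)
print(unknown)
```

Note that the order check runs before the membership check, so ``'spline'`` without an order fails with the order message rather than passing validation.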
|
||
|
||
def interpolate_1d(xvalues, yvalues, method='linear', limit=None, | ||
fill_value=None, bounds_error=False, **kwargs): | ||
""" | ||
Logic for the 1-d interpolation. The result should be 1-d, inputs | ||
xvalues and yvalues will each be 1-d arrays of the same length. | ||
|
||
Bounds_error is currently hardcoded to False since non-scipy ones don't | ||
take it as an argument. | ||
""" | ||
# Treat the original, non-scipy methods first. | ||
|
||
invalid = isnull(yvalues) | ||
valid = ~invalid | ||
|
||
valid_y = yvalues[valid] | ||
valid_x = xvalues[valid] | ||
new_x = xvalues[invalid] | ||
|
||
if method == 'time': | ||
if not getattr(xvalues, 'is_all_dates', None): | ||
# if not issubclass(xvalues.dtype.type, np.datetime64): | ||
raise ValueError('time-weighted interpolation only works ' | ||
'on Series or DataFrames with a ' | ||
'DatetimeIndex') | ||
method = 'values' | ||
|
||
def _interp_limit(invalid, limit): | ||
"""mask off values that won't be filled since they exceed the limit""" | ||
all_nans = np.where(invalid)[0] | ||
violate = [invalid[x:x + limit + 1] for x in all_nans] | ||
violate = np.array([x.all() & (x.size > limit) for x in violate]) | ||
return all_nans[violate] + limit | ||
|
||
xvalues = getattr(xvalues, 'values', xvalues) | ||
yvalues = getattr(yvalues, 'values', yvalues) | ||
|
||
if limit: | ||
violate_limit = _interp_limit(invalid, limit) | ||
if valid.any(): | ||
firstIndex = valid.argmax() | ||
valid = valid[firstIndex:] | ||
invalid = invalid[firstIndex:] | ||
result = yvalues.copy() | ||
if valid.all(): | ||
return yvalues | ||
else: | ||
# have to call np.array(xvalues) since xvalues could be an Index | ||
# which can't be mutated | ||
result = np.empty_like(np.array(xvalues), dtype=np.float64) | ||
result.fill(np.nan) | ||
return result | ||
|
||
if method in ['linear', 'time', 'values']: | ||
if method in ('values', 'index'): | ||
inds = np.asarray(xvalues) | ||
# hack for DatetimeIndex, #1646 | ||
if issubclass(inds.dtype.type, np.datetime64): | ||
inds = inds.view(pa.int64) | ||
|
||
if inds.dtype == np.object_: | ||
inds = lib.maybe_convert_objects(inds) | ||
else: | ||
inds = xvalues | ||
|
||
inds = inds[firstIndex:] | ||
|
||
result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], | ||
Review comment: Can you do this without chaining assignment?
Reply: Sorry, I'm not sure what chaining assignment is. Is it doing the operation on the RHS and assigning it to
Review comment: By chaining I just mean that you're doing a slice and then an index in one expression. Forget my first comment: this (and the one below) is OK, since firstIndex will be an int and a slice in NumPy always returns a view.
||
yvalues[firstIndex:][valid]) | ||
|
||
if limit: | ||
result[violate_limit] = np.nan | ||
return result | ||
|
||
sp_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic', | ||
'barycentric', 'krogh', 'spline', 'polynomial', | ||
'piecewise_polynomial', 'pchip'] | ||
if method in sp_methods: | ||
new_x = new_x[firstIndex:] | ||
xvalues = xvalues[firstIndex:] | ||
|
||
result[firstIndex:][invalid] = _interpolate_scipy_wrapper(valid_x, | ||
valid_y, new_x, method=method, fill_value=fill_value, | ||
bounds_error=bounds_error, **kwargs) | ||
if limit: | ||
result[violate_limit] = np.nan | ||
return result | ||
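The point settled in the review discussion above, that ``result[firstIndex:][invalid] = ...`` writes into ``result`` because a basic slice is a view, can be checked with a short NumPy sketch. The array contents and the ``first_index`` value are made up for illustration; the contrasting case uses an integer-list index, which NumPy documents as returning a copy.

```python
import numpy as np

# Basic slice followed by a boolean mask: the slice is a view,
# so the masked assignment writes through to the original array.
result = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
first_index = 1  # hypothetical offset, standing in for firstIndex
invalid = np.isnan(result[first_index:])
result[first_index:][invalid] = 0.0
print(result)

# Contrast: indexing with a list of integers returns a copy,
# so the same pattern silently leaves the original unchanged.
copy_case = np.array([1.0, np.nan, 3.0])
mask = np.isnan(copy_case)
copy_case[[0, 1, 2]][mask] = 0.0
print(copy_case)
```

The first array ends up with its missing values replaced, while the second still contains its NaN, which is why the pattern is safe only when the first index is a plain slice.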
|
||
|
||
def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, | ||
Review comment: add
||
bounds_error=False, order=None, **kwargs): | ||
""" | ||
passed off to scipy.interpolate.interp1d. method is scipy's kind. | ||
Returns an array interpolated at new_x. Add any new methods to | ||
the list in _clean_interp_method | ||
""" | ||
try: | ||
from scipy import interpolate | ||
except ImportError: | ||
raise ImportError('{0} interpolation requires Scipy'.format(method)) | ||
|
||
new_x = np.asarray(new_x) | ||
|
||
# ignores some kwargs that could be passed along. | ||
alt_methods = { | ||
'barycentric': interpolate.barycentric_interpolate, | ||
'krogh': interpolate.krogh_interpolate, | ||
'piecewise_polynomial': interpolate.piecewise_polynomial_interpolate, | ||
} | ||
|
||
try: | ||
alt_methods['pchip'] = interpolate.pchip_interpolate | ||
except AttributeError: | ||
if method == 'pchip': | ||
raise ImportError("Your version of scipy does not support " | ||
"PCHIP interpolation.") | ||
|
||
interp1d_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic', | ||
'polynomial'] | ||
if method in interp1d_methods: | ||
if method == 'polynomial': | ||
method = order | ||
terp = interpolate.interp1d(x, y, kind=method, fill_value=fill_value, | ||
bounds_error=bounds_error) | ||
new_y = terp(new_x) | ||
elif method == 'spline': | ||
terp = interpolate.UnivariateSpline(x, y, k=order) | ||
new_y = terp(new_x) | ||
else: | ||
method = alt_methods[method] | ||
new_y = method(x, y, new_x) | ||
return new_y | ||
|
||
|
||
def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None): | ||
""" perform an actual interpolation of values, values will be made 2-d if needed | ||
fills inplace, returns the result """ | ||
|
Review comment: Maybe put a one-liner explaining the methods/uses?