Commit aff7346

TomAugspurger authored and jreback committed
ENH/REF: Additional methods for interpolate
ENH: the interpolate method argument can take more values for various types of interpolation
REF: Moves Series.interpolate to core/generic. DataFrame gets interpolate
CLN: clean up interpolate to use blocks
ENH: Add additional 1-d scipy interpolators.
DOC: examples for df interpolate and a plot
DOC: release notes
DOC: Scipy links and more explanation
API: Don't use fill_value
BUG: Raise on panels.
API: Raise on non monotonic indices if it matters
BUG: Raise on only mixed types.
ENH/DOC: Add `spline` interpolation.
DOC: naming consistency
1 parent 27e4fb1 commit aff7346

9 files changed, +863 -240 lines changed

doc/source/missing_data.rst

+87 -2
@@ -271,8 +271,13 @@ examined :ref:`in the API <api.dataframe.missing>`.
 Interpolation
 ~~~~~~~~~~~~~
 
-A linear **interpolate** method has been implemented on Series. The default
-interpolation assumes equally spaced points.
+.. versionadded:: 0.13.0
+
+   DataFrame now has the interpolation method.
+   :meth:`~pandas.Series.interpolate` also gained some additional methods.
+
+Both Series and DataFrame objects have an ``interpolate`` method that, by default,
+performs linear interpolation at missing datapoints.
 
 .. ipython:: python
    :suppress:
@@ -328,6 +333,86 @@ For a floating-point index, use ``method='values'``:
 
    ser.interpolate(method='values')
 
+You can also interpolate with a DataFrame:
+
+.. ipython:: python
+
+   df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
+                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
+   df.interpolate()
+
+The ``method`` argument gives access to fancier interpolation methods.
+If you have scipy_ installed, you can pass the name of a 1-d interpolation routine to ``method``.
+You'll want to consult the full scipy interpolation documentation_ and reference guide_ for details.
+The appropriate interpolation method will depend on the type of data you are working with.
+For example, if you are dealing with a time series that is growing at an increasing rate,
+``method='quadratic'`` may be appropriate. If you have values approximating a cumulative
+distribution function, then ``method='pchip'`` should work well.
+
+.. warning::
+
+   These methods require ``scipy``.
+
+.. ipython:: python
+
+   df.interpolate(method='barycentric')
+
+   df.interpolate(method='pchip')
+
+When interpolating via a polynomial or spline approximation, you must also specify
+the degree or order of the approximation:
+
+.. ipython:: python
+
+   df.interpolate(method='spline', order=2)
+
+   df.interpolate(method='polynomial', order=2)
+
+Compare several methods:
+
+.. ipython:: python
+
+   np.random.seed(2)
+
+   ser = Series(np.arange(1, 10.1, .25)**2 + np.random.randn(37))
+   bad = np.array([4, 13, 14, 15, 16, 17, 18, 20, 29, 34, 35, 36])
+   ser[bad] = np.nan
+   methods = ['linear', 'quadratic', 'cubic']
+
+   df = DataFrame({m: ser.interpolate(method=m) for m in methods})
+   @savefig compare_interpolations.png
+   df.plot()
+
+Another use case is interpolation at *new* values.
+Suppose you have 100 observations from some distribution. And let's suppose
+that you're particularly interested in what's happening around the middle.
+You can mix pandas' ``reindex`` and ``interpolate`` methods to interpolate
+at the new values.
+
+.. ipython:: python
+
+   ser = Series(np.sort(np.random.uniform(size=100)))
+
+   # interpolate at new_index
+   new_index = ser.index + Index([49.25, 49.5, 49.75, 50.25, 50.5, 50.75])
+
+   interp_s = ser.reindex(new_index).interpolate(method='pchip')
+
+   interp_s[49:51]
+
+.. _scipy: http://www.scipy.org
+.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
+.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
+
+
+Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword argument.
+Use this to limit the number of consecutive interpolations, keeping ``NaN`` values for interpolations that are too far from the last valid observation:
+
+.. ipython:: python
+
+   ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
+   ser.interpolate(limit=2)
+
 .. _missing_data.replace:
 
Replacing Generic Values
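
Reviewer note on the "interpolation at *new* values" passage in the hunk above: the idea is to evaluate the interpolant at x positions that were not observed. The sketch below is not the pandas implementation; it is a pure-Python piecewise-linear stand-in (the docs example uses ``method='pchip'``, which needs scipy). The function name ``interp_at`` and the sample data are hypothetical, and the sketch assumes strictly increasing x values and no extrapolation.

```python
from bisect import bisect_left


def interp_at(xs, ys, new_xs):
    """Linearly interpolate the points (xs, ys) at every x in new_xs.

    Assumptions made for this sketch: xs is strictly increasing and each
    new x lies inside [xs[0], xs[-1]] (no extrapolation is attempted).
    """
    out = []
    for x in new_xs:
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x outside the observed range: {0}".format(x))
        hi = bisect_left(xs, x)      # first observed x that is >= the new x
        if xs[hi] == x:              # exact hit on an observed point
            out.append(ys[hi])
            continue
        lo = hi - 1
        t = (x - xs[lo]) / (xs[hi] - xs[lo])   # fractional position in the gap
        out.append(ys[lo] + t * (ys[hi] - ys[lo]))
    return out


# Hypothetical observations around the "middle" of a series.
xs = [49, 50, 51]
ys = [0.0, 1.0, 3.0]
values = interp_at(xs, ys, [49.25, 49.5, 50.5])
```

In pandas terms, ``reindex`` supplies the new x positions as ``NaN`` rows and ``interpolate`` fills them; the sketch collapses both steps into one function call.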

doc/source/release.rst

+3 -1
@@ -78,7 +78,7 @@ Experimental Features
   - Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()`` / ``df.to_msgpack()`` for serialization
     of arbitrary pandas (and python objects) in a lightweight portable binary format (:issue:`686`)
   - Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
-  - Added :mod:`pandas.io.gbq` for reading from (and writing to) Google BigQuery into a DataFrame. (:issue:`4140`)
+  - Added :mod:`pandas.io.gbq` for reading from (and writing to) Google BigQuery into a DataFrame. (:issue:`4140`)
 
 Improvements to existing features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -174,6 +174,8 @@ Improvements to existing features
   - :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
     from semi-structured JSON data. :ref:`See the docs<io.json_normalize>` (:issue:`1067`)
   - ``DataFrame.from_records()`` will now accept generators (:issue:`4910`)
+  - ``DataFrame.interpolate()`` and ``Series.interpolate()`` have been expanded to include
+    interpolation methods from scipy. (:issue:`4434`, :issue:`1892`)
 
 API Changes
 ~~~~~~~~~~~

doc/source/v0.13.0.txt

+28
@@ -614,6 +614,34 @@ Experimental
 
 - Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
 
+- DataFrame has a new ``interpolate`` method, similar to Series (:issue:`4434`, :issue:`1892`)
+
+  .. ipython:: python
+
+      df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
+                      'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
+      df.interpolate()
+
+  Additionally, the ``method`` argument to ``interpolate`` has been expanded
+  to include 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+  'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial' and 'spline';
+  the last two also require an integer ``order``, the degree or order of the approximation. The new methods
+  require scipy_. Consult the SciPy reference guide_ and documentation_ for more information
+  about when the various methods are appropriate. See also the :ref:`pandas interpolation docs<missing_data.interpolate>`.
+
+  Interpolate now also accepts a ``limit`` keyword argument.
+  This works similarly to ``fillna``'s limit:
+
+  .. ipython:: python
+
+      ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
+      ser.interpolate(limit=2)
+
+.. _scipy: http://www.scipy.org
+.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
+.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
+
+
 .. _whatsnew_0130.refactoring:
 
Internal Refactoring
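
Reviewer note on the ``limit`` keyword described in the hunk above: a minimal pure-Python sketch of what "at most ``limit`` consecutive fills per gap" means for the documented example. It is not the pandas code path. The function name ``fill_linear`` is hypothetical, points are assumed equally spaced, and leading or trailing gaps are simply left untouched, which is a simplification and makes no claim about how pandas treats them.

```python
import math


def fill_linear(values, limit=None):
    """Linearly fill interior NaN runs, assuming equally spaced points.

    At most `limit` NaNs are filled per run, counted from the last valid
    observation. Runs touching either end of the list are left as-is
    (a simplification made for this sketch).
    """
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if not math.isnan(result[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(result[i]):
            i += 1
        end = i                      # first index after the NaN run
        if start == 0 or end == n:
            continue                 # no valid neighbour on one side
        left, right = result[start - 1], result[end]
        steps = end - start + 1      # number of intervals across the gap
        run = end - start
        fillable = run if limit is None else min(limit, run)
        for k in range(1, fillable + 1):
            result[start - 1 + k] = left + (right - left) * k / steps
    return result


nan = float("nan")
limited = fill_linear([1, 3, nan, nan, nan, 11], limit=2)
unlimited = fill_linear([1, 3, nan, nan, nan, 11])
```

With ``limit=2`` the first two missing points are filled and the third stays ``NaN``; without a limit the whole gap is filled.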

pandas/core/common.py

+147
@@ -1244,6 +1244,153 @@ def backfill_2d(values, limit=None, mask=None):
     return values
 
 
+def _clean_interp_method(method, order=None, **kwargs):
+    valid = ['linear', 'time', 'values', 'nearest', 'zero', 'slinear',
+             'quadratic', 'cubic', 'barycentric', 'polynomial',
+             'krogh', 'piecewise_polynomial',
+             'pchip', 'spline']
+    if method in ('spline', 'polynomial') and order is None:
+        raise ValueError("You must specify the order of the spline or "
+                         "polynomial.")
+    if method not in valid:
+        raise ValueError("method must be one of {0}. "
+                         "Got '{1}' instead.".format(valid, method))
+    return method
+
+
+def interpolate_1d(xvalues, yvalues, method='linear', limit=None,
+                   fill_value=None, bounds_error=False, **kwargs):
+    """
+    Logic for the 1-d interpolation. The result should be 1-d, inputs
+    xvalues and yvalues will each be 1-d arrays of the same length.
+
+    Bounds_error is currently hardcoded to False since non-scipy ones don't
+    take it as an argument.
+    """
+    # Treat the original, non-scipy methods first.
+
+    invalid = isnull(yvalues)
+    valid = ~invalid
+
+    valid_y = yvalues[valid]
+    valid_x = xvalues[valid]
+    new_x = xvalues[invalid]
+
+    if method == 'time':
+        if not getattr(xvalues, 'is_all_dates', None):
+        # if not issubclass(xvalues.dtype.type, np.datetime64):
+            raise ValueError('time-weighted interpolation only works '
+                             'on Series or DataFrames with a '
+                             'DatetimeIndex')
+        method = 'values'
+
+    def _interp_limit(invalid, limit):
+        """mask off values that won't be filled since they exceed the limit"""
+        all_nans = np.where(invalid)[0]
+        violate = [invalid[x:x + limit + 1] for x in all_nans]
+        violate = np.array([x.all() & (x.size > limit) for x in violate])
+        return all_nans[violate] + limit
+
+    xvalues = getattr(xvalues, 'values', xvalues)
+    yvalues = getattr(yvalues, 'values', yvalues)
+
+    if limit:
+        violate_limit = _interp_limit(invalid, limit)
+    if valid.any():
+        firstIndex = valid.argmax()
+        valid = valid[firstIndex:]
+        invalid = invalid[firstIndex:]
+        result = yvalues.copy()
+        if valid.all():
+            return yvalues
+    else:
+        # have to call np.array(xvalues) since xvalues could be an Index
+        # which can't be mutated
+        result = np.empty_like(np.array(xvalues), dtype=np.float64)
+        result.fill(np.nan)
+        return result
+
+    if method in ['linear', 'time', 'values']:
+        if method in ('values', 'index'):
+            inds = np.asarray(xvalues)
+            # hack for DatetimeIndex, #1646
+            if issubclass(inds.dtype.type, np.datetime64):
+                inds = inds.view(pa.int64)
+
+            if inds.dtype == np.object_:
+                inds = lib.maybe_convert_objects(inds)
+        else:
+            inds = xvalues
+
+        inds = inds[firstIndex:]
+
+        result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid],
+                                                 yvalues[firstIndex:][valid])
+
+        if limit:
+            result[violate_limit] = np.nan
+        return result
+
+    sp_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+                  'barycentric', 'krogh', 'spline', 'polynomial',
+                  'piecewise_polynomial', 'pchip']
+    if method in sp_methods:
+        new_x = new_x[firstIndex:]
+        xvalues = xvalues[firstIndex:]
+
+        result[firstIndex:][invalid] = _interpolate_scipy_wrapper(valid_x,
+            valid_y, new_x, method=method, fill_value=fill_value,
+            bounds_error=bounds_error, **kwargs)
+        if limit:
+            result[violate_limit] = np.nan
+        return result
+
+
+def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None,
+                               bounds_error=False, order=None, **kwargs):
+    """
+    passed off to scipy.interpolate.interp1d. method is scipy's kind.
+    Returns an array interpolated at new_x. Add any new methods to
+    the list in _clean_interp_method
+    """
+    try:
+        from scipy import interpolate
+    except ImportError:
+        raise ImportError('{0} interpolation requires Scipy'.format(method))
+
+    new_x = np.asarray(new_x)
+
+    # ignores some kwargs that could be passed along.
+    alt_methods = {
+        'barycentric': interpolate.barycentric_interpolate,
+        'krogh': interpolate.krogh_interpolate,
+        'piecewise_polynomial': interpolate.piecewise_polynomial_interpolate,
+    }
+
+    try:
+        alt_methods['pchip'] = interpolate.pchip_interpolate
+    except AttributeError:
+        if method == 'pchip':
+            raise ImportError("Your version of scipy does not support "
+                              "PCHIP interpolation.")
+
+    interp1d_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+                        'polynomial']
+    if method in interp1d_methods:
+        if method == 'polynomial':
+            method = order
+        terp = interpolate.interp1d(x, y, kind=method, fill_value=fill_value,
+                                    bounds_error=bounds_error)
+        new_y = terp(new_x)
+    elif method == 'spline':
+        terp = interpolate.UnivariateSpline(x, y, k=order)
+        new_y = terp(new_x)
+    else:
+        method = alt_methods[method]
+        new_y = method(x, y, new_x)
+    return new_y
+
+
 def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None):
     """ perform an actual interpolation of values, values will be make 2-d if needed
fills inplace, returns the result """
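
Reviewer note on ``_interp_limit`` in the ``interpolate_1d`` hunk above: the helper looks at a window of ``limit + 1`` entries starting at each NaN position and, when the whole window is NaN, marks the position ``limit`` steps further on so it can be reset to NaN after filling. The sketch below restates that windowing in plain Python lists so it can be traced by hand. The name ``limit_violations`` is hypothetical and the code is an illustration of the idea, not a drop-in replacement for the numpy version.

```python
def limit_violations(invalid, limit):
    """Return positions that must stay NaN because they exceed `limit`.

    `invalid` is a list of booleans (True where the value is missing).
    For each missing position x, inspect the window invalid[x:x+limit+1];
    if the window is full length and entirely missing, position x + limit
    is more than `limit` fills away from the last valid value.
    """
    positions = []
    for x, is_missing in enumerate(invalid):
        if not is_missing:
            continue
        window = invalid[x:x + limit + 1]
        if len(window) > limit and all(window):
            positions.append(x + limit)
    return positions


# Mask for the documented example [1, 3, NaN, NaN, NaN, 11] with limit=2.
mask = [False, False, True, True, True, False]
blocked = limit_violations(mask, 2)
```

For the documented example only index 4 is blocked, which matches the third missing value staying ``NaN`` under ``limit=2``.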
