Series.interpolate with index of type float gives wrong result #1206

GlenHertz · 2012-05-07T19:24:40Z

Hi,

Series.interpolate assumes that the index values are equally spaced. With a floating-point index, that does not give the desired interpolation. For example:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series

x1 = np.array([0, 0.25, 0.77, 1.2, 1.4, 2.6, 3.1])
y1 = np.array([0, 1.1, 0.5, 1.5, 1.2, 2.1, 2.4])
x2 = np.array([0, 0.25, 0.66, 1.0, 1.2, 1.4, 3.1])
y2 = np.array([0, 0.2, 0.8, 1.1, 2.2, 0.1, 2.4])

df1 = DataFrame(data=y1, index=x1, columns=['A'])
df1.plot(marker='o')

df2 = DataFrame(data=y2, index=x2, columns=['A'])
df2.plot(marker='o')

df3 = df1 - df2
df3.plot(marker='o')
print(df3)

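def resample(signals):
    # Build the union of the x values of all signals.
    aligned_x_vals = signals[0].index
    for s in signals[1:]:
        aligned_x_vals = aligned_x_vals.union(s.index)
    # Reindex each signal onto the shared grid and fill the gaps column by column.
    return [s.reindex(aligned_x_vals).apply(Series.interpolate) for s in signals]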

sig1, sig2 = resample([df1, df2])
sig3 = sig1 - sig2
plt.plot(df1.index, df1.values, marker='D')
plt.plot(sig1.index, sig1.values, marker='o')
plt.grid()
plt.figure()
plt.plot(df2.index, df2.values, marker='o')
plt.plot(sig2.index, sig2.values, marker='o')
plt.grid()

I expect sig1 and sig2 to have more points than df1 and df2, with the added values interpolated. A few points do not overlap the original curves because the points are assumed to be equally spaced. In my opinion, if the index is floating point, the user wants to interpolate by the index's values rather than assume they are equally spaced. It should do something like this:

import numpy as np
from pandas import Series, isnull

def interpolate(serie):
    try:
        inds = np.array([float(d) for d in serie.index])
    except (TypeError, ValueError):
        inds = np.arange(len(serie))

    values = serie.values

    invalid = isnull(values)
    valid = ~invalid  # boolean NOT; unary minus on a boolean array fails in current NumPy

    firstIndex = valid.argmax()
    valid = valid[firstIndex:]
    invalid = invalid[firstIndex:]
    inds = inds[firstIndex:]

    result = values.copy()
    result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid],
                                             values[firstIndex:][valid])

    return Series(result, index=serie.index, name=serie.name)
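The core step of the function above is the call to np.interp, which evaluates a piecewise-linear curve at the x positions of the missing values. A minimal sketch of that step in isolation, using made-up sample data (the arrays x and y below are illustrative and not taken from the report):

```python
import numpy as np

# Unevenly spaced x positions; two y values are missing.
x = np.array([0.0, 0.25, 0.66, 1.0])
y = np.array([0.0, np.nan, np.nan, 2.0])

missing = np.isnan(y)

# Interpolate the missing y values at their actual x positions,
# using only the known (x, y) pairs as the reference curve.
filled = y.copy()
filled[missing] = np.interp(x[missing], x[~missing], y[~missing])

print(filled)
```

Because the known points lie on the line y = 2x, the filled values come out close to 0.5 and 1.32, whereas an equally spaced assumption would put them at one third and two thirds of the way between 0 and 2.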

Thanks

wesm · 2012-05-29T00:55:18Z

This is now implemented in git master and will be part of the 0.8.0 release.

Version 0.8.0 beta 1
* tag 'v0.8.0b1': (703 commits)
  RLS: 0.8.0 beta 1
  RLS: 0.8.0beta
  RLS: release notes
  ENH: add option to use Series.values to interpolate, close pandas-dev#1206
  TST: testing to close pandas-dev#1331
  DOC: groupby drop duplicate index pandas-dev#1312
  ENH: tz_convert for DataFrame pandas-dev#1330
  ENH: add NA handling to scatter_matrix, close pandas-dev#1297
  BUG: display localtime in DatetimeIndex.__repr__, close pandas-dev#1336
  DOC: draft of timeseries section of docs. Added Period related documentation and examples
  DOC: timezone handling and started on Period
  DOC: rough draft of DatetimeIndex, date_range, shifting/resampling etc
  DOC: more ts docs. Need to do resampling then PeriodIndex
  DOC: starting deeper revamp of ts docs for 0.8
  BUG: raise exception for unintellible frequency strings, close pandas-dev#1328
  ENH: construct PeriodIndex from arrays of fields, allow negative ordinals. close pandas-dev#1333 and pandas-dev#1264
  BUG: tsplot fix with business freq pandas-dev#1332
  BUG: DatetimeIndex partial slicing bug, tsplot kludge around pandas-dev#1332
  BUG: alias W to W-SUN, add test for buglet close pandas-dev#1327
  ENH: mix arrays and scalars in DataFrame constructor, close pandas-dev#1329
  ...

nlsn · 2012-07-21T13:57:54Z

This seems to still be a problem as of 0.8.1.dev-e2633d4.

import pandas
import numpy as np
import pylab as pl
from scipy.interpolate import interp1d

time_fast = np.arange(50000.,50010.,.4) +.1
time_slow = np.arange(50000.,50010.,1.)

x_fast = np.sin(time_fast)
x_slow = np.sin(time_slow)

df_fast = pandas.DataFrame(x_fast, index=time_fast, columns=['fast'])
df_slow = pandas.DataFrame(x_slow, index=time_slow, columns=['slow'])

df_joined = df_fast.join(df_slow, how='outer')

df_joined['pandas interpolate'] = df_joined['slow'].interpolate()

f = interp1d(df_slow.index, df_slow['slow'], bounds_error=False)
df_joined['scipy interp1d'] = f(df_joined.index)

df_joined['pandas interpolate'].plot(style='o')
df_joined['scipy interp1d'].plot(style='o')
df_slow['slow'].plot(style='r.:')

pl.title('Linearly interpolated points are expected to lie on the dotted red lines.')

pl.legend()
pl.show()

nlsn · 2012-07-21T14:23:47Z

I also observed this with indices that are Datetime objects. The title of this issue may be too narrow.

wesm · 2012-07-21T16:51:10Z

@nlsn you have to do:

df_joined['slow'].interpolate(method='values')

By default, interpolate assumes that the values are evenly spaced, while method='values' interpolates against the index values.
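A small sketch of the difference, assuming a reasonably recent pandas release; the data is made up for illustration. The second half uses method='time' on a DatetimeIndex, which current pandas documents for datetime indexes; this thread does not confirm whether it was available in the version under discussion.

```python
import numpy as np
import pandas as pd

# Float index: the gap sits at x = 1.0, between x = 0.0 and x = 10.0.
s = pd.Series([0.0, np.nan, 10.0], index=[0.0, 1.0, 10.0])

by_position = s.interpolate()               # treats the points as evenly spaced
by_index = s.interpolate(method='values')   # weights by the numeric index values

print(by_position.loc[1.0])  # halfway between the neighbours
print(by_index.loc[1.0])     # one tenth of the way between the neighbours

# Datetime index: the gap is 1 day after the first point and 9 days before the last.
idx = pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-11'])
t = pd.Series([0.0, np.nan, 10.0], index=idx)

t_by_position = t.interpolate()             # again ignores the spacing
t_by_time = t.interpolate(method='time')    # weights by elapsed time

print(t_by_position.iloc[1])
print(t_by_time.iloc[1])
```

In both cases the default result is the midpoint of the neighbouring values, while the index-aware method places the filled value in proportion to its distance from each neighbour.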

wesm mentioned this issue May 18, 2012

Improve Series.interpolate to use index values #1255

Closed

wesm closed this as completed in 8526264 May 29, 2012

wesm reopened this Jul 21, 2012

wesm closed this as completed Jul 21, 2012
