pandas resample weekly and interpolate - wrong results #16381

den-run-ai · 2017-05-17T23:35:03Z

import pandas as pd

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 2.9.2
pip: 8.1.2
setuptools: 34.4.1
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.6.1
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

pd.date_range("1900/01/01","1900/12/31",freq='M')

DatetimeIndex(['1900-01-31', '1900-02-28', '1900-03-31', '1900-04-30',
               '1900-05-31', '1900-06-30', '1900-07-31', '1900-08-31',
               '1900-09-30', '1900-10-31', '1900-11-30', '1900-12-31'],
              dtype='datetime64[ns]', freq='M')

pdtest=pd.DataFrame(data=list(range(12,0,-1)),index=pd.date_range("1900/01/01","1900/12/31",freq='M'))
pdtest

	0
1900-01-31	12
1900-02-28	11
1900-03-31	10
1900-04-30	9
1900-05-31	8
1900-06-30	7
1900-07-31	6
1900-08-31	5
1900-09-30	4
1900-10-31	3
1900-11-30	2
1900-12-31	1

pdtest.resample('D').interpolate()[:15]

	0
1900-01-31	12.000000
1900-02-01	11.964286
1900-02-02	11.928571
1900-02-03	11.892857
1900-02-04	11.857143
1900-02-05	11.821429
1900-02-06	11.785714
1900-02-07	11.750000
1900-02-08	11.714286
1900-02-09	11.678571
1900-02-10	11.642857
1900-02-11	11.607143
1900-02-12	11.571429
1900-02-13	11.535714
1900-02-14	11.500000

pdtest.resample('W-MON').interpolate()

	0
1900-02-05	NaN
1900-02-12	NaN
1900-02-19	NaN
1900-02-26	NaN
1900-03-05	NaN
1900-03-12	NaN
1900-03-19	NaN
1900-03-26	NaN
1900-04-02	NaN
1900-04-09	NaN
1900-04-16	NaN
1900-04-23	NaN
1900-04-30	9.000000
1900-05-07	8.771429
1900-05-14	8.542857
1900-05-21	8.314286
1900-05-28	8.085714
1900-06-04	7.857143
1900-06-11	7.628571
1900-06-18	7.400000
1900-06-25	7.171429
1900-07-02	6.942857
1900-07-09	6.714286
1900-07-16	6.485714
1900-07-23	6.257143
1900-07-30	6.028571
1900-08-06	5.800000
1900-08-13	5.571429
1900-08-20	5.342857
1900-08-27	5.114286
1900-09-03	4.885714
1900-09-10	4.657143
1900-09-17	4.428571
1900-09-24	4.200000
1900-10-01	3.971429
1900-10-08	3.742857
1900-10-15	3.514286
1900-10-22	3.285714
1900-10-29	3.057143
1900-11-05	2.828571
1900-11-12	2.600000
1900-11-19	2.371429
1900-11-26	2.142857
1900-12-03	1.914286
1900-12-10	1.685714
1900-12-17	1.457143
1900-12-24	1.228571
1900-12-31	1.000000

jreback · 2017-05-17T23:39:06Z

pls replace the top of the issue with a copy-pastable example and pd.show_versions() as indicated in the issue request page.

den-run-ai · 2017-05-18T02:36:47Z

@jreback done!

jreback · 2017-05-18T10:20:15Z

can you remove the rendered frames. simply run this in ipython and paste the results.

den-run-ai · 2017-05-18T14:18:29Z

@jreback what is wrong with frames? i don't work with pandas in ipython terminal.

TomAugspurger · 2017-05-18T14:34:46Z

@denfromufa can you post your expected output?

I think you're getting tripped up by the endpoints. When you do pdtest.resample('W-MON').interpolate(), the data is first upsampled:

In [42]: pdtest.resample("W-MON")._upsample(None).head()
Out[42]:
             0
1900-02-05 NaN
1900-02-12 NaN
1900-02-19 NaN
1900-02-26 NaN
1900-03-05 NaN

and then interpolated.

Since the original left endpoint (1900-01-31, a Wednesday) doesn't fall on the W-MON grid, the upsampled frame starts with NaN, and .interpolate leaves everything before the first valid (upsampled) observation as NaN.

If you resample at a freq that does align with your original first point, I think you will get what you expect:

In [43]: pdtest.resample("W-WED")._upsample(None).head()
Out[43]:
               0
1900-01-31  12.0
1900-02-07   NaN
1900-02-14   NaN
1900-02-21   NaN
1900-02-28  11.0

In [45]: pdtest.resample("W-WED")._upsample(None).interpolate().head()
Out[45]:
                0
1900-01-31  12.00
1900-02-07  11.75
1900-02-14  11.50
1900-02-21  11.25
1900-02-28  11.00
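
The upsample-then-interpolate sequence described above can be reproduced with public API only. This is a minimal sketch, assuming that `Resampler.asfreq()` is an acceptable public stand-in for the private `_upsample(None)` call shown in the transcripts; the Series `s` and the explicit `MonthEnd` offset are choices made here for illustration and are not part of the original report.

```python
import pandas as pd

# Same data as pdtest, built as a Series with an explicit month-end offset.
idx = pd.date_range("1900-01-01", "1900-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(12, 0, -1), index=idx, dtype="float64")

# Step 1: upsample onto the Monday grid without filling anything.
# Only month ends that happen to be Mondays keep their value.
weekly_mon = s.resample("W-MON").asfreq()
first_valid = weekly_mon.first_valid_index()
print(weekly_mon.head())
print(first_valid, first_valid.day_name())

# Step 2: interpolate the upsampled grid. The default linear method only
# fills forward, so rows before the first valid value stay NaN.
filled = weekly_mon.interpolate()
print(filled.loc[:first_valid].tail(3))
```

The point of the sketch is that the interpolation only ever sees the values that survived step 1, which is why the months before the first Monday month end contribute nothing to the result.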

den-run-ai · 2017-05-18T14:45:53Z

@TomAugspurger this is a good explanation, but I expected interpolation even for misaligned data. I think the safest option for weekly interpolation is this:

pdtest.resample('D').interpolate()[::7]

But the top (and only upvoted) answer on SO suggests what I did originally:
http://stackoverflow.com/a/14531149/2230844

Anyway, I'm having an even bigger problem with the original weekly interpolation method; let me open another issue for it.
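
A sketch of the daily-then-subsample workaround mentioned in this comment, under the assumption that every original observation lies on the daily grid (true for the month-end data here). The Series `s` and the variable names are illustrative only. Note that `[::7]` keeps the weekday of the first observation (Wednesday in this data), so a boolean mask is used when a specific weekday such as Monday is wanted.

```python
import pandas as pd

# Same data as pdtest, built as a Series with an explicit month-end offset.
idx = pd.date_range("1900-01-01", "1900-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(12, 0, -1), index=idx, dtype="float64")

# Interpolate on the daily grid first: every month end is a grid point,
# so no original observation is dropped before interpolating.
daily = s.resample("D").interpolate()

# Variant from the comment: every 7th day, starting at the first observation.
every_7th = daily[::7]

# Variant for a fixed weekday: keep only Mondays (dayofweek == 0).
mondays = daily[daily.index.dayofweek == 0]

print(every_7th.head())
print(mondays.head())
```

The Monday values produced this way are defined from the very first week, unlike the direct `resample('W-MON').interpolate()` output shown at the top of the issue.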

TomAugspurger · 2017-05-18T15:44:41Z

Agreed that it's a surprising output, unless you're familiar with how it's implemented. I'm not sure there's much we can do though... Potentially we could fill the endpoints of the upsampled DataFrame with the original endpoints?

# would have to handle DataFrames properly, but this is the main idea
In [35]: w = pdtest.resample("W")

In [36]: up = w._upsample(None)

In [37]: up.squeeze().fillna({up.index[0]: pdtest.iloc[0, 0]}).interpolate().head()
Out[37]:
1900-02-04    12.000000
1900-02-11    11.764706
1900-02-18    11.529412
1900-02-25    11.294118
1900-03-04    11.058824
Freq: W-SUN, Name: 0, dtype: float64

We would want to check whether that breaks any existing upsampling behavior.
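
The endpoint-filling idea can also be sketched without the private `_upsample` method. This is only an illustration under assumptions: `Resampler.asfreq()` stands in for `_upsample(None)`, a Series is used instead of the DataFrame, and only the left endpoint is patched. It is not a proposed implementation.

```python
import pandas as pd

# Same data as pdtest, built as a Series with an explicit month-end offset.
idx = pd.date_range("1900-01-01", "1900-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(12, 0, -1), index=idx, dtype="float64")

# Upsample onto the default weekly grid (W-SUN) without filling.
up = s.resample("W").asfreq()

# Copy the first original observation onto the first grid point so the
# linear interpolation has a left anchor to start from.
patched = up.copy()
patched.iloc[0] = s.iloc[0]

result = patched.interpolate()
print(result.head())
```

One caveat visible in this sketch: the value observed on 1900-01-31 is placed on 1900-02-04, so the series is shifted relative to a time-based interpolation (the daily interpolation shown in the report gives 11.857143 on 1900-02-04, not 12.0).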

den-run-ai · 2017-05-18T15:55:38Z

@TomAugspurger I think one problem here is that the fill_method='interpolate' keyword syntax is deprecated in .resample(). But the resampling can depend on the fill method, as in this case.

den-run-ai · 2017-08-06T06:01:07Z

related?

#14297

eromoe · 2018-11-12T06:35:28Z

I found this problem too... When can it be fixed?

den-run-ai · 2019-01-11T15:14:34Z

@eromoe here is a workaround:

https://stackoverflow.com/a/44053092/2230844

Make42 mentioned this issue Nov 10, 2017

BUG: (linear) interpolation after resampling #18189

Closed

toobaz added the Datetime and Resample labels Jan 7, 2019

mroeschke added the Bug label Mar 31, 2020

mroeschke added the Missing-data label Jun 12, 2021
