pandas resample weekly and interpolate - wrong results #16381
Comments
pls replace the top of the issue with a copy-pastable example and |
@jreback done! |
can you remove the rendered frames. simply run this in ipython and paste the results. |
@jreback what is wrong with frames? i don't work with pandas in ipython terminal. |
@denfromufa can you post your expected output? I think you're getting tripped up by the endpoints. When you do
In [42]: pdtest.resample("W-MON")._upsample(None).head()
Out[42]:
0
1900-02-05 NaN
1900-02-12 NaN
1900-02-19 NaN
1900-02-26 NaN
1900-03-05 NaN
and then interpolated. Since the original left-endpoint doesn't align with a W-MON freq, you get NaN and then
If you resample at a freq that does align with your original first point, it will be I think what you expect:
In [43]: pdtest.resample("W-WED")._upsample(None).head()
Out[43]:
0
1900-01-31 12.0
1900-02-07 NaN
1900-02-14 NaN
1900-02-21 NaN
1900-02-28 11.0
In [45]: pdtest.resample("W-WED")._upsample(None).interpolate().head()
Out[45]:
0
1900-01-31 12.00
1900-02-07 11.75
1900-02-14 11.50
1900-02-21 11.25
1900-02-28 11.00 |
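The endpoint effect described above can be reproduced with public API only. The following is a minimal sketch, not the reporter's actual data: the month-end series `s` is a hypothetical stand-in chosen to be consistent with the outputs quoted in this thread, and `.resample(...).asfreq()` is used in place of the private `_upsample(None)` call.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end series (12.0 on 1900-01-31, 11.0 on 1900-02-28, ...),
# chosen to be consistent with the outputs quoted in this thread.
idx = pd.date_range("1900-01-31", periods=12, freq=pd.offsets.MonthEnd())
s = pd.Series(np.arange(12.0, 0.0, -1.0), index=idx)

# 1900-01-31 is a Wednesday, so the W-MON grid starts on 1900-02-05 and
# does not contain the first observations at all.
up_mon = s.resample("W-MON").asfreq()
interp_mon = up_mon.interpolate()

# The W-WED grid starts exactly on the first observation.
up_wed = s.resample("W-WED").asfreq()
interp_wed = up_wed.interpolate()

print(interp_mon.head())  # leading rows stay NaN
print(interp_wed.head())  # interpolated from the first row onward
```

With the misaligned grid, interpolation only starts at the first weekly label that happens to coincide with an original timestamp, so everything before it remains NaN.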
@TomAugspurger this is a good explanation, but I expected interpolation even for misaligned data. I think for weekly interpolation the safest option is something like this:
But the most upvoted (and only upvoted) answer on SO suggests what I did originally:
Anyway, I'm having an even bigger problem with the original weekly interpolation method; let me open another issue for it. |
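For the misaligned case, one possible approach is to interpolate on the union of the original and weekly indexes and then keep only the weekly points. This is a sketch under assumptions, not necessarily the snippet the commenter had in mind; the series `s` is the same hypothetical month-end data as before, and `method="time"` is chosen here so that the uneven spacing of the combined index is respected.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end series, consistent with the outputs quoted in this thread.
idx = pd.date_range("1900-01-31", periods=12, freq=pd.offsets.MonthEnd())
s = pd.Series(np.arange(12.0, 0.0, -1.0), index=idx)

# Target weekly grid (Mondays), which does not contain the first observations.
weekly = s.resample("W-MON").asfreq().index

# Keep the original observations while adding the weekly timestamps,
# then interpolate linearly in time across the combined index.
combined = s.reindex(s.index.union(weekly)).interpolate(method="time")

# Select only the weekly timestamps from the interpolated result.
on_grid = combined.reindex(weekly)

print(on_grid.head())
```

Because the original observations stay in the index during interpolation, the weekly values are computed from the true neighbouring data points instead of from whichever points happen to land on the grid.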
Agreed that it's a surprising output, unless you're familiar with how it's implemented. I'm not sure there's much we can do though... Potentially we could fill the endpoints of the upsampled DataFrame with the original endpoints?
# would have to handle DataFrames properly, but this is the main idea
In [34]: up.squeeze().fillna({up.index[0]: pdtest.iloc[0, 0]}).interpolate().head()
Out[34]:
1900-02-04 12.000000
1900-02-11 11.764706
1900-02-18 11.529412
1900-02-25 11.294118
1900-03-04 11.058824
Freq: W-SUN, Name: 0, dtype: float64
In [35]: w = pdtest.resample("W")
In [36]: up = w._upsample(None)
In [37]: up.squeeze().fillna({up.index[0]: pdtest.iloc[0, 0]}).interpolate().head()
Out[37]:
1900-02-04 12.000000
1900-02-11 11.764706
1900-02-18 11.529412
1900-02-25 11.294118
1900-03-04 11.058824
Freq: W-SUN, Name: 0, dtype: float64
We would want to check whether that breaks anything in upsampling. |
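The endpoint-filling idea can also be tried without the private `_upsample` method. The sketch below assumes a hypothetical month-end series `s` consistent with the numbers in this thread and uses `.asfreq()` on the resampler instead. Note that it assigns the first original value to the first weekly label, which is a few days later than the original timestamp, so the result is an approximation.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end series, consistent with the outputs quoted in this thread.
idx = pd.date_range("1900-01-31", periods=12, freq=pd.offsets.MonthEnd())
s = pd.Series(np.arange(12.0, 0.0, -1.0), index=idx)

# Upsample to the default weekly grid (W-SUN); the first label is 1900-02-04.
up = s.resample("W").asfreq()

# Seed the first weekly label with the first original observation so that
# interpolation has a left endpoint to start from.
filled = up.copy()
filled.iloc[0] = s.iloc[0]

result = filled.interpolate()
print(result.head())
```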
@TomAugspurger i think one problem here is that syntax with keyword for |
related? |
I ran into this problem too... when can it be fixed? |
@eromoe here is a workaround: |