resample().interpolate() should not fill pre-existing NaNs #17868

Open
shoyer opened this issue Oct 13, 2017 · 0 comments
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Resample resample method

shoyer commented Oct 13, 2017

Currently, interpolate() on a resample() fills in all existing NaN values, not just the ones introduced by upsampling:

import pandas as pd

all_times = pd.date_range('2016-01-01', '2016-01-08')
times = all_times[1:3].append(all_times[4:-2])  # time coord with some missing days
s = pd.Series(range(len(times)), index=times)
>>> s
2016-01-02    0
2016-01-03    1
2016-01-05    2
2016-01-06    3
dtype: int64

>>> s.reindex(all_times)
2016-01-01    NaN
2016-01-02    0.0
2016-01-03    1.0
2016-01-04    NaN
2016-01-05    2.0
2016-01-06    3.0
2016-01-07    NaN
2016-01-08    NaN
Freq: D, dtype: float64

>>> s.reindex(all_times).resample('12H').interpolate()
2016-01-01 00:00:00     NaN
2016-01-01 12:00:00     NaN
2016-01-02 00:00:00    0.00
2016-01-02 12:00:00    0.50
2016-01-03 00:00:00    1.00
2016-01-03 12:00:00    1.25
2016-01-04 00:00:00    1.50
2016-01-04 12:00:00    1.75
2016-01-05 00:00:00    2.00
2016-01-05 12:00:00    2.50
2016-01-06 00:00:00    3.00
2016-01-06 12:00:00    3.00
2016-01-07 00:00:00    3.00
2016-01-07 12:00:00    3.00
2016-01-08 00:00:00    3.00
Freq: 12H, dtype: float64

This is inconsistent with the other fill methods, which only fill in NaNs introduced by upsampling:

>>> s.reindex(all_times).resample('12H').ffill() 
2016-01-01 00:00:00    NaN
2016-01-01 12:00:00    NaN
2016-01-02 00:00:00    0.0
2016-01-02 12:00:00    0.0
2016-01-03 00:00:00    1.0
2016-01-03 12:00:00    1.0
2016-01-04 00:00:00    NaN
2016-01-04 12:00:00    NaN
2016-01-05 00:00:00    2.0
2016-01-05 12:00:00    2.0
2016-01-06 00:00:00    3.0
2016-01-06 12:00:00    3.0
2016-01-07 00:00:00    NaN
2016-01-07 12:00:00    NaN
2016-01-08 00:00:00    NaN
Freq: 12H, dtype: float64
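To make the distinction between the two kinds of NaN concrete, here is a minimal sketch that separates them on the example above. The names `s_full`, `is_original`, `pre_existing_nan` and `introduced_nan` are only illustrative, and the 12-hour rule is passed as a `pd.Timedelta` rather than the `'12H'` string so the snippet does not depend on a particular frequency-alias spelling.

```python
import pandas as pd

all_times = pd.date_range("2016-01-01", "2016-01-08")
times = all_times[1:3].append(all_times[4:-2])  # time coord with some missing days
s_full = pd.Series(range(len(times)), index=times).reindex(all_times)

# asfreq() upsamples without filling, so every newly created timestamp is NaN.
upsampled = s_full.resample(pd.Timedelta(hours=12)).asfreq()

# Timestamps that were already in the daily index before upsampling.
is_original = upsampled.index.isin(s_full.index)

# NaNs that were in the data before resampling (the days missing from s).
pre_existing_nan = upsampled.isna() & is_original
# NaNs created by upsampling (the new 12:00 timestamps).
introduced_nan = upsampled.isna() & ~is_original

print(upsampled[pre_existing_nan].index)
print(upsampled[introduced_nan].index)
```

Only the second group is what I would expect a resample fill method to touch.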

I'd like to see resample's interpolate() switch its behavior to no longer fill pre-existing NaNs. If those NaNs are not meaningful, it is straightforward to .dropna() them first. The current behavior violates the model that upsampling should depend only on which index values are present, not on the data values.
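A rough sketch of both options with today's API follows. The first drops the NaNs before resampling, for when they carry no meaning. The second is only one possible reading of "do not fill pre-existing NaNs": it keeps an interpolated value only where the nearest original rows on both sides are valid, using the ffill()/bfill() results as a mask. That masking rule and the variable names are my assumptions, not existing pandas behavior.

```python
import pandas as pd

all_times = pd.date_range("2016-01-01", "2016-01-08")
times = all_times[1:3].append(all_times[4:-2])  # time coord with some missing days
s_full = pd.Series(range(len(times)), index=times).reindex(all_times)
rule = pd.Timedelta(hours=12)

# Option 1: the NaNs are not meaningful, so drop them before upsampling.
# The result spans only 2016-01-02 .. 2016-01-06 and contains no NaN.
dropped = s_full.dropna().resample(rule).interpolate()

# Option 2 (assumed semantics): keep pre-existing NaNs.
# ffill()/bfill() on the resampler carry original NaNs forward/backward,
# so a row is kept only if both of its original neighbors are valid.
both_neighbors_valid = (
    s_full.resample(rule).ffill().notna()
    & s_full.resample(rule).bfill().notna()
)
preserved = s_full.resample(rule).interpolate().where(both_neighbors_valid)

print(dropped)
print(preserved)
```

With option 2, 2016-01-02 12:00 is still interpolated between its two valid neighbors, while 2016-01-04 and the timestamps on either side of it stay NaN.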

Variations of this have come up in several other issues.

@gfyoung gfyoung added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Oct 14, 2017
@mroeschke mroeschke added Bug Resample resample method labels Jun 12, 2021