BUG: interpolate with limit keyword partially fills gaps larger than limit #36352
Comments
Hey, thanks for your report. That is the documented behavior. |
Thanks! The documentation does indeed say consecutive NaNs, but I still think the behavior most readers would expect from the documentation is that gaps larger than the limit are not filled at all. Take, for example, a time series with several small and large gaps: it can make sense to fill the smaller gaps by interpolating across them, but it often does not make sense to interpolate larger gaps at all. The method currently leaves some strange values at the beginning/end of these large gaps. Anyway, I see that fillna(), which also has a limit keyword, does elaborate on the behavior in its documentation - perhaps include that in interpolate as well? I also see now that there was some effort to add a max_gap keyword (#25141); hopefully that continues. |
Would you like to contribute a PR to enhance the docs? We have an example showing that behavior; maybe add a comment to elaborate further? If you would like to discuss an enhancement of the function, you could open an enhancement issue and describe your wishes. |
@phofl I will look at the contributing guide when I have time and hopefully figure out how to do a PR - it will be a good learning experience! |
This is very surprising behavior, and I would argue that it should not be the default, especially for linear interpolation. I can see why some might want this behavior for Thus, I would argue, for the non-fill cases, this is a bug, not merely an enhancement. Example:
series = pd.Series([1,1,1,np.nan,np.nan,np.nan,np.nan,5,6,np.nan,7])
series.interpolate(method='linear', limit=2, limit_direction='forward')
output:
|
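For reference, the example from the comment above can be run as a self-contained script. This is only a sketch of the reported behavior: the comments describe what pandas is documented to do with `limit=2` and `limit_direction='forward'`, and the variable name `result` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Series from the comment above: one gap of four NaNs and one gap of one NaN.
series = pd.Series([1, 1, 1, np.nan, np.nan, np.nan, np.nan, 5, 6, np.nan, 7])

# limit=2 caps the number of consecutive NaNs filled, counted from the
# start of each gap when limit_direction='forward'.
result = series.interpolate(method="linear", limit=2, limit_direction="forward")

# Positions 3 and 4 (the first two NaNs of the four-NaN gap) are filled with
# values on the line between 1 and 5 (roughly 1.8 and 2.6), while positions
# 5 and 6 stay NaN. The single-NaN gap at position 9 is filled completely.
print(result)
```

The four-NaN gap is therefore partially filled rather than skipped, which is the behavior this issue is about.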
Sorry to bump this issue, but I find this rather frustrating too: either I misunderstand @phofl or it does not work as he said.
results in:
If I understand this correctly, the My recommendation is to either adjust the docs to reflect this behaviour or to prevent filling gaps larger than the limit. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
When a limit is passed, it is expected to be respected, and gaps larger than the limit should not be interpolated at all. Partially filling the beginning (or end, depending on limit_direction) of the gaps is not reasonable behavior.
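One possible workaround for the behavior described here, sketched under stated assumptions: interpolate everything first, then put NaN back into every gap that is longer than the allowed size. The helper name `interpolate_small_gaps` and its `max_gap` parameter are hypothetical and are not part of the pandas API; the sketch relies only on `isna`, `shift`, `cumsum`, `groupby(...).transform`, `interpolate`, and `where`.

```python
import numpy as np
import pandas as pd


def interpolate_small_gaps(s, max_gap, **interp_kwargs):
    """Interpolate only NaN runs of length <= max_gap (hypothetical helper)."""
    is_na = s.isna()
    # A new run starts wherever the NaN flag differs from the previous element.
    run_id = is_na.ne(is_na.shift()).cumsum()
    # Length of each NaN run, broadcast back to every element of that run
    # (runs of valid values get 0 because their flags sum to zero).
    run_len = is_na.groupby(run_id).transform("sum")
    # Interpolate without a limit, then restore NaN inside the oversized gaps.
    filled = s.interpolate(**interp_kwargs)
    too_long = is_na & (run_len > max_gap)
    return filled.where(~too_long)


series = pd.Series([1, 1, 1, np.nan, np.nan, np.nan, np.nan, 5, 6, np.nan, 7])
out = interpolate_small_gaps(series, max_gap=2, method="linear")

# The four-NaN gap is left untouched; the single-NaN gap is filled.
print(out)
```

Note that leading or trailing NaN runs are treated like any other run here; whether they should be filled at all depends on the `limit_direction` and `limit_area` arguments passed through to `interpolate`.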
Expected Output
See output in example
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2a7d332
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.1.2
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : None
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.19
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None