limit_area and limit_direction do not have an effect when interpolation method is 'pad' #26796

Closed
cchwala opened this issue Jun 11, 2019 · 4 comments · Fixed by #38106
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

@cchwala
Contributor

cchwala commented Jun 11, 2019

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: s = pd.Series([
   ...:     np.nan,
   ...:     1., np.nan,
   ...:     2., np.nan, np.nan,
   ...:     5., np.nan, np.nan, np.nan,
   ...:     -1., np.nan, np.nan
   ...: ])

In [4]: s.interpolate(method='pad', limit_area='inside')
Out[4]: 
0     NaN
1     1.0
2     1.0
3     2.0
4     2.0
5     2.0
6     5.0
7     5.0
8     5.0
9     5.0
10   -1.0
11   -1.0
12   -1.0
dtype: float64

# For method='linear' the `limit_area` kwarg works as expected

In [5]: s.interpolate(method='linear', limit_area='inside')
Out[5]: 
0     NaN
1     1.0
2     1.5
3     2.0
4     3.0
5     4.0
6     5.0
7     3.5
8     2.0
9     0.5
10   -1.0
11    NaN
12    NaN
dtype: float64

Problem description

The kwargs limit_area and limit_direction for interpolate(), introduced in #16513, do not seem to have any effect when using the method pad. They work with other interpolation methods, e.g. linear.

There are two different pathways for interpolate depending on the selected method.

if m is not None:
    r = check_int_bool(self, inplace)
    if r is not None:
        return r
    return self._interpolate_with_fill(method=m, axis=axis,
                                       inplace=inplace, limit=limit,
                                       fill_value=fill_value,
                                       coerce=coerce,
                                       downcast=downcast)
# validate the interp method
m = missing.clean_interp_method(method, **kwargs)
r = check_int_bool(self, inplace)
if r is not None:
    return r
return self._interpolate(method=m, index=index, values=values,
                         axis=axis, limit=limit,
                         limit_direction=limit_direction,
                         limit_area=limit_area,
                         fill_value=fill_value, inplace=inplace,
                         downcast=downcast, **kwargs)

For pad, ffill and bfill, _interpolate_with_fill() is used. It calls missing.interpolate_2d, which does not seem to recognize the keywords limit_direction and limit_area, so they are silently ignored.
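As long as the fill path ignores limit_area, the behaviour one would expect from method='pad' with limit_area='inside' can be approximated by hand. The following is a minimal sketch, assuming that "inside" means positions that have a valid value both before and after them. The helper name pad_inside is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def pad_inside(s: pd.Series) -> pd.Series:
    """Forward-fill only NaNs surrounded by valid values (hypothetical helper)."""
    # s.bfill() remains NaN only where no valid value follows, i.e. in the
    # trailing gap, so masking on it restores NaN at those positions.
    # Leading NaNs are never touched by a forward fill in the first place.
    return s.ffill().where(s.bfill().notna())


s = pd.Series([np.nan, 1.0, np.nan, 2.0, np.nan, np.nan,
               5.0, np.nan, np.nan, np.nan, -1.0, np.nan, np.nan])
result = pad_inside(s)
print(result)
```

With the series from the code sample above, the gaps between 1.0 and -1.0 are padded, while index 0 and indices 11 and 12 stay NaN.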

I might be able to fix that during #25141, but maybe a fix needs more fundamental changes.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: eaacefd
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.UTF-8

pandas: 0.25.0.dev0+725.geaacefd09
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.16
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

cc @WBare (haven't looked at the issue, not sure what the expected behavior is here).

@simonjayhawkins
Member

I might be able to fix that during #25141, but maybe a fix needs more fundamental changes.

I would recommend making those changes in a separate PR (as a precursor).

@simonjayhawkins simonjayhawkins added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jun 12, 2019
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 12, 2019
@cchwala
Contributor Author

cchwala commented Jun 12, 2019

Okay. A separate PR makes sense.

My problem is that I do not fully understand why the distinction between _interpolate_with_fill() (for 'pad', 'ffill' and 'bfill') and _interpolate() (for the other methods) is required in the code I referenced above.

I see that _interpolate() calls missing.interpolate_1d, which does not support the methods 'pad' and 'bfill'.

_interpolate_with_fill() calls missing.interpolate_2d, which only supports 'pad' and 'bfill'.

However, I do not understand why the 2D component is required here, since interpolation and padding are only meant to be done along a single axis in pandas, aren't they? Hence, I would say that the methods 'pad' and 'bfill' could also be integrated into missing.interpolate_1d. But I have not fully understood all the relevant code.

If somebody could clarify that, I could start thinking about a solution.
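To illustrate the point about axes: a 1-D pad routine can, in principle, be applied along one axis of a 2-D array. The sketch below only illustrates that idea and is not how pandas implements padding. The names pad_1d and pad_along_axis are hypothetical.

```python
import numpy as np


def pad_1d(values: np.ndarray) -> np.ndarray:
    """Forward-fill NaNs in a 1-D float array (hypothetical helper)."""
    out = values.astype(float)  # astype returns a copy, the input is untouched
    last = np.nan               # last valid value seen so far
    for i, v in enumerate(out):
        if np.isnan(v):
            out[i] = last       # stays NaN until a first valid value appears
        else:
            last = v
    return out


def pad_along_axis(arr: np.ndarray, axis: int = 0) -> np.ndarray:
    """Apply the 1-D pad to every 1-D slice of `arr` along `axis`."""
    return np.apply_along_axis(pad_1d, axis, arr)


arr = np.array([[np.nan, 1.0],
                [2.0, np.nan],
                [np.nan, np.nan]])
padded_rows = pad_along_axis(arr, axis=0)  # fill down each column
padded_cols = pad_along_axis(arr, axis=1)  # fill across each row
```

A pure-Python loop like this would be far slower than a vectorised 2-D routine, which may be one practical reason for a separate code path. That is a guess, not something the code referenced above confirms.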

cchwala added a commit to cchwala/pandas that referenced this issue Nov 20, 2019
@cchwala
Contributor Author

cchwala commented Nov 20, 2019

One note on pad and limit_direction:

Since pad, ffill, backfill and bfill are, according to their names, meant to work in only one direction, it does not make sense to provide a limit_direction when using them in .interpolate().

#25141 will handle that and throw an error in these cases.
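A check of this kind could look roughly like the sketch below. It is an illustration only, not the code from #25141. The function name check_limit_direction and the error message are hypothetical, and the sketch assumes that a limit_direction matching the method's own direction is still accepted.

```python
from typing import Optional

# Direction implied by the name of each fill method.
_FILL_DIRECTION = {
    "pad": "forward",
    "ffill": "forward",
    "backfill": "backward",
    "bfill": "backward",
}


def check_limit_direction(method: str, limit_direction: Optional[str]) -> None:
    """Raise ValueError if limit_direction contradicts a directional fill method."""
    implied = _FILL_DIRECTION.get(method)
    if implied is None or limit_direction is None:
        return  # not a fill method, or no direction was given: nothing to check
    if limit_direction != implied:
        raise ValueError(
            f"limit_direction must be {implied!r} for method {method!r}, "
            f"got {limit_direction!r}"
        )
```

For example, check_limit_direction("pad", "backward") raises a ValueError, while check_limit_direction("linear", "both") passes silently.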

cchwala added a commit to cchwala/pandas that referenced this issue Jan 15, 2020
This test currently only tests `limit_area`.

For `limit_direction` the implementation should later raise an error,
because `pad` and `bfill` both already define a direction. But let's
now first do the implementation of the `limit_area` for `pad`
and `bfill`.
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Mar 19, 2020
@jreback jreback added the Bug label Mar 19, 2020
@jreback jreback modified the milestones: 1.1, Contributions Welcome Jul 10, 2020
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.2 Sep 15, 2020
@jreback jreback modified the milestones: 1.2, Contributions Welcome Nov 24, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Nov 29, 2020
4 participants