-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill #31048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill #31048
Conversation
This test currently only test `limit_area`. For `limit_direction` the implementation should later raise an error, because `pad` and `bfill` both already define a direction. But let's now first do the implementation of the `limit_area` for `pad` and `bfill`.
Since methods `pad` and `bfill` in `blocks.interpolate` end up using `missing.interpolate_2d` which can not (easily) be extended to support `limit_area`, I introduce the new function `missing.interpolate_1d_fill`. It is a modified copy of `interpolate_2d` but only works for 1d data and uses newly introduced function `_derive_indices_of_nans_to_preserve`, which is now also used in `missing.interpolate_1d`. It works the same way as the 1D-interpolation functions which are based on scipy-interpolation which are applied via np.apply_along_axis.
@cchwala so IIUC you are doing a precursor to solve the issue with pad and |
…values` also was changed via appliying `func`
…e used Test for all forbidden combos of `pad` and `backfill` is included
…d `limit_direction`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. If you can annotate any new functions would be helpful. Added what I think should be
pandas/core/missing.py
Outdated
@@ -477,6 +496,65 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0): | |||
else: | |||
return [P(x, nu) for nu in der] | |||
|
|||
def interpolate_1d_fill( | |||
values, | |||
method="pad", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method="pad", | |
method: str = "pad", |
pandas/core/missing.py
Outdated
def interpolate_1d_fill( | ||
values, | ||
method="pad", | ||
axis=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=0, | |
axis: Axis = 0, |
(import from pandas._typing)
pandas/core/missing.py
Outdated
values, | ||
method="pad", | ||
axis=0, | ||
limit=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit=None, | |
limit: Optional[int] = None, |
pandas/core/missing.py
Outdated
method="pad", | ||
axis=0, | ||
limit=None, | ||
limit_area=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit_area=None, | |
limit_area: Optional[str] = None, |
pandas/core/missing.py
Outdated
limit=None, | ||
limit_area=None, | ||
fill_value=None, | ||
dtype=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype=None, | |
dtype: Optional[Dtype] = None, |
(import from pandas._typing)
pandas/core/missing.py
Outdated
axis=0, | ||
limit=None, | ||
limit_area=None, | ||
fill_value=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fill_value=None, | |
fill_value: Optional[Hashable] = None, |
Should be ready now. I am offline for some hours of sleep, though. |
@WillAyd Any further requests for changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good
pandas/core/generic.py
Outdated
@@ -6720,6 +6722,28 @@ def interpolate( | |||
"column to a numeric dtype." | |||
) | |||
|
|||
# Set `limit_direction` depending on `method` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason this logic does not simply belong in interpolate_2d?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or call something like clean_fill_method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit_direction
has no effect in interpolate_2d
.
My idea was to detect conflicting user input, e.g. pad
with limit_direction='backward'
as early as possible in the chain of calls to the many functions that are involved.
But I can move the logic to a function like clean_fill_method
, e.g. called set_limit_direction
. Should this function belong to generic.py or missing.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_2d is also called from fillna. only NDFrame.interpolate allows limit_direction so does make sense to validate here.
clean_fill_method checks method. probably want to avoid passing more parameters around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make a method similr to clean_fill_method then, e.g this logic should living pandas/core/missing.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what to do here. after merge with master, there is similar code at this location, but it is not mine...
pandas/core/missing.py
Outdated
@@ -478,6 +500,66 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0): | |||
return [P(x, nu) for nu in der] | |||
|
|||
|
|||
def interpolate_1d_fill( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this under interpolate_1d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_1d_fill
is a special case needed by blocks._interpolate_with_fill()
to support the limit kwargs with method pad
. As you requested in the comment above interpolate_1d_fill
is now called from within interpolate_2d
which is what blocks._interpolate_with_fill()
calls.
You request, moving it to missing.interpolate_1d
will, in my opinion, not work, because missing.interpolate_1d
is not reached in the call sequence for interpolating with method pad
since blocks.Block.interpolate()
splits up into blocks.Block._interpolate
(for scipy wrappers) and blocks.Block._interpolate_with_fill
(for pad methods). missing.interpolate_1d
is only called in the former case, i.e. for methods using the scipy wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cchwala interpolate_2d will work on a 1d array. did you investigate applying it along an axis (with masking logic) rather than creating a 1d version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #34749 for alternative implementation calling interpolate_2d instead. interpolate_2d already has the limit logic so no need to use the preserve_nans set based logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the nice solution from #34749 as suggested above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a consequence there is no interpolate_1d_fill
anymore
…ments Test on my local machine are not affected by removing the unncessery arguments `valid` and `invalid`, which are now derived within the function.
@cchwala can you address comments? |
I did break some bits off this PR. so does need a merge of master to see what's left (I did try this #31048 (comment)) |
Okay. I will do the merge with master today to see what is left. I guess, at least the tests will still be of value. |
…imit_area_and_limit_direction_with_pad I removed most of the code that I have added in this branch and will add it back in step by step in the next commits
Tests should be red now
I am now looking into the suggestion from @simonjayhawkins to use |
provided by pandas-dev#34749 Test are green now
@jreback Sorry to ping you again, but it would be great if I could finish this PR with (hopefully) final adjustments tomorrow (Friday). Otherwise things would have to wait again another two weeks because I will be offline. |
will have to wait as i am away |
Okay. No problem. |
pandas/core/internals/blocks.py
Outdated
limit=limit, | ||
fill_value=fill_value, | ||
dtype=self.dtype, | ||
) | ||
|
||
if limit_area is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this in a function in missing
and just call here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. But if I do this I could also directly integrate it into missing.interpolate_2d
, e.g. like in this branch. Advantage, no need to introduce yet another function that does some interpolation. Disadvantage, obfuscates missing.interpolate_2d
. For this, adding some comments would help for sure.
If you prefer the separate function in missing
, what function name would you choose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. like in this branch.
I would merge recursive solution here for review (so that ci has been run against it, i think needs changes to pandas/core/arrays/categorical.py see master...simonjayhawkins:fix_interpolate_limit_area_and_limit_direction_with_pad)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Merged and CI is running.
I have also pushed a branch with an example implementation using a separate function in missing
. The name of the new function is not good, but the implementation would probably look something like this.
For both solutions, tests for interpolation pass, but I did not run the full test suite. Will see what changes are required when CI is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test are green.
If you find the recursive solution appropriate, I would add a comment to the function that briefly explains the reasoning. Anything else that needs improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cchwala can you add the type annotations as requested by @WillAyd https://github.com/pandas-dev/pandas/pull/31048/files#r367123208
@cchwala can you resolve merge conflicts and can take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, some comments, can you merge master and we can get this in.
# and `limit_area` logic along a certain axis. | ||
if limit_area is not None: | ||
|
||
def func(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a module level function and just call it here
moving off 1.2 need merge master and update to comments. |
Closing in favor of #38106 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This separates a large part of the code changes from #25141 to keep things comprehendible.