Skip to content

fix bfill, ffill and pad when calling with df.interpolate with column… #33959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 2, 2020

Conversation

CloseChoice
Copy link
Member

@CloseChoice CloseChoice commented May 3, 2020

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CloseChoice for the PR.

not sure we should be adding tests for incorrect/undoumented behaviour. see #33956 (comment)

@simonjayhawkins
Copy link
Member

xref #12918 maybe the tests could test for equality with either DataFrame.ffill or DataFrame.fillna(method='ffill) and update the DataFrame.interpolate docstring, see #12918 (comment) and #12918 (comment)

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Bug labels May 12, 2020
@jreback jreback added this to the 1.1 milestone May 12, 2020
@jreback
Copy link
Contributor

jreback commented May 12, 2020

can you merge master

@CloseChoice
Copy link
Member Author

CloseChoice commented May 13, 2020

xref #12918 maybe the tests could test for equality with either DataFrame.ffill or DataFrame.fillna(method='ffill) and update the DataFrame.interpolate docstring, see #12918 (comment) and #12918 (comment)

I went to the interpolate function back and forth and I come to the conclusion that we totally should fix this first. When we want to ensure that fillna(method='ffill') and interpolate(method='ffill') behave the same way, why shouldn't we just call fillna inside of interpolate in this case?

The whole interpolate function looks pretty much like things have grown over time and the quick fix is prefered to the generic, clean solution (with transposing back and forth, and so on). I am working on a cleaner solution. Am curious for your feedback.

EDIT:
I found a quick solution which fixes the referenced tickets (#33956 and #12918 and #29146). I am not totally satisfied with this solution since it is adding complexity (though only a little) to an already complex code. I would still suggest that a refactoring of the total interpolate logic would make things much clearer.

@pep8speaks
Copy link

pep8speaks commented May 13, 2020

Hello @CloseChoice! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-29 16:26:02 UTC

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CloseChoice for the updates, is it feasible for the fixes for #12918 and #29146 be split off into a pre-cursor PR. smaller PRs with more atomic changes are easier to review/quicker to merge. also needs a whatsnew.

@@ -296,3 +297,45 @@ def test_interp_string_axis(self, axis_name, axis_number):
result = df.interpolate(method="linear", axis=axis_name)
expected = df.interpolate(method="linear", axis=axis_number)
tm.assert_frame_equal(result, expected)

@pytest.mark.parametrize("axis", [0, 1])
def test_interp_ffill(self, axis):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterise these tests @pytest.mark.parametrize("method", ["ffill", "bfill", "pad"])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"Only `method=linear` interpolation is supported on MultiIndexes."
)

if method in ["backfill", "bfill", "pad", "ffill"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here that for these methods, limit_direction and limit_area are being ignored and include a link to #26796

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@CloseChoice
Copy link
Member Author

Thanks @CloseChoice for the updates, is it feasible for the fixes for #12918 and #29146 be split off into a pre-cursor PR. smaller PRs with more atomic changes are easier to review/quicker to merge. also needs a whatsnew.

Thanks for your review. Actually fixing #12918 and #29146 has the side effect of fixing #33956, since the fillna-methods don't have the issue described in #33956. So I don't see possibility to split this into two PRs though I totally agree with you to have small, atomic PRs.


# todo: change interpolation so that no transposing is necessary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

@@ -202,7 +202,8 @@ def test_interp_leading_nans(self, check_scipy):
result = df.interpolate(method="polynomial", order=1)
tm.assert_frame_equal(result, expected)

def test_interp_raise_on_only_mixed(self):
@pytest.mark.parametrize("axis", [0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the axis fixture (e.g. just pass axis as the arg)

@@ -213,7 +214,7 @@ def test_interp_raise_on_only_mixed(self):
}
)
with pytest.raises(TypeError):
df.interpolate(axis=1)
df.astype("object").interpolate(axis=axis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the error message here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -296,3 +297,18 @@ def test_interp_string_axis(self, axis_name, axis_number):
result = df.interpolate(method="linear", axis=axis_name)
expected = df.interpolate(method="linear", axis=axis_number)
tm.assert_frame_equal(result, expected)

@pytest.mark.parametrize("method", ["ffill", "bfill", "pad"])
@pytest.mark.parametrize("axis", [0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis fixture

)

# for the methods backfill, bfill, pad, ffill limit_direction and limit_area
# are being ignored, see #26796 for more information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list as gh-26796

@CloseChoice CloseChoice requested a review from jreback May 27, 2020 22:18
@jreback jreback merged commit b405904 into pandas-dev:master Jun 2, 2020
@jreback
Copy link
Contributor

jreback commented Jun 2, 2020

thanks @CloseChoice very nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
4 participants