Skip to content

BUG: Fix for fillna ignoring axis=1 parameter (issues #17399 #17409) #28441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

jorvis
Copy link

@jorvis jorvis commented Sep 14, 2019

@pep8speaks
Copy link

pep8speaks commented Sep 14, 2019

Hello @jorvis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 675:9: E265 block comment should start with '# '
Line 677:20: E231 missing whitespace after ','
Line 677:22: E231 missing whitespace after ','
Line 677:24: E231 missing whitespace after ','
Line 678:20: E231 missing whitespace after ','
Line 678:27: E231 missing whitespace after ','
Line 678:29: E231 missing whitespace after ','
Line 679:20: E231 missing whitespace after ','
Line 679:23: E231 missing whitespace after ','
Line 679:26: E231 missing whitespace after ','
Line 682:20: E231 missing whitespace after ','
Line 682:22: E231 missing whitespace after ','
Line 682:24: E231 missing whitespace after ','
Line 683:20: E231 missing whitespace after ','
Line 683:22: E231 missing whitespace after ','
Line 683:24: E231 missing whitespace after ','
Line 684:20: E231 missing whitespace after ','
Line 684:23: E231 missing whitespace after ','
Line 684:26: E231 missing whitespace after ','

Comment last updated at 2019-09-19 21:56:54 UTC

@jorvis jorvis changed the title Fix for fillna ignoring axis=1 parameter (issues #17399 #17409) BUG: Fix for fillna ignoring axis=1 parameter (issues #17399 #17409) Sep 14, 2019
@jorvis
Copy link
Author

jorvis commented Sep 14, 2019

Here's an example of this issue in Jupyter:

https://imgur.com/a/3ZrXqSv

@gfyoung gfyoung added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Sep 15, 2019
@@ -386,7 +386,7 @@ def apply(self, func, **kwargs):

return result

def fillna(self, value, limit=None, inplace=False, downcast=None):
def fillna(self, value, axis=0, limit=None, inplace=False, downcast=None):
Copy link
Member

@gfyoung gfyoung Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, this is a breaking API change. To be safe, you would need to put axis at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an annotation for axis? Should be importable from pandas._typing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, this is a breaking API change. To be safe, you would need to put axis at the end.

I have just done this, thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an annotation for axis? Should be importable from pandas._typing

Sorry, I'm not sure what that means. Could you direct me to an example or docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea so a variable annotation. You'd essentially do:

from pandas._typing import Axis

def fillna(..., axis: Axis = 0):

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. I have pushed this.

@gfyoung gfyoung requested a review from jreback September 15, 2019 03:47
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a whatsnew for v1.0.0?

@@ -671,6 +671,14 @@ def test_fillna_columns(self):
expected = df.astype(float).fillna(method="ffill", axis=1)
assert_frame_equal(result, expected)

def test_fillna_columns(self):
df = DataFrame(np.random.randn(10, 4))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add GH 17399 as a comment here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also generally for test you can simplify by just constructing a DataFrame with literal values instead of random data - should make smaller and help readability

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the comment (and properly fixed the test function name). I'm afraid I found this bug on the first day of my learning to use pandas, so I'm not quite there yet to write a custom test. Rather, I used/modified the existing one from test_fillna_columns()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this test works on master for me without these changes - is this capturing the changed behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorvis can you respond here? Need to make sure we're actually fixing a real issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to make a more direct test case. This is the demo that made me find it:

https://imgur.com/a/3ZrXqSv

After that I found an existing PR which had grown stale, and @jreback asked me to update it, so I thought I'd give it a try.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it now to reflect a test similar to that Jupyter image above.

@@ -386,7 +386,7 @@ def apply(self, func, **kwargs):

return result

def fillna(self, value, limit=None, inplace=False, downcast=None):
def fillna(self, value, axis=0, limit=None, inplace=False, downcast=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an annotation for axis? Should be importable from pandas._typing

@@ -671,6 +671,14 @@ def test_fillna_columns(self):
expected = df.astype(float).fillna(method="ffill", axis=1)
assert_frame_equal(result, expected)

def test_fillna_columns(self):
df = DataFrame(np.random.randn(10, 4))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this test works on master for me without these changes - is this capturing the changed behavior?

@@ -233,7 +233,7 @@ Other
- Trying to set the ``display.precision``, ``display.max_rows`` or ``display.max_columns`` using :meth:`set_option` to anything but a ``None`` or a positive int will raise a ``ValueError`` (:issue:`23348`)
- Using :meth:`DataFrame.replace` with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (:issue:`27660`)
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now support dicts as ``compression`` argument with key ``'method'`` being the compression method and others as additional compression options when the compression method is ``'zip'``. (:issue:`26023`)
-
- :meth:`DataFrame.fillna` when using axis=1 previously failed to replace NaN values (:issue:`17399` and :issue:`17409`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true in general, or was it just under specific conditions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to make a more direct test case. This is the demo that made me find it:

https://imgur.com/a/3ZrXqSv

After that I found an existing PR which had grown stale, and @jreback asked me to update it, so I thought I'd give it a try.

@@ -671,6 +671,14 @@ def test_fillna_columns(self):
expected = df.astype(float).fillna(method="ffill", axis=1)
assert_frame_equal(result, expected)

def test_fillna_columns(self):
df = DataFrame(np.random.randn(10, 4))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorvis can you respond here? Need to make sure we're actually fixing a real issue.

"b": [5,6,7,8],
"c": [9,10,11,6]})

result = df.fillna(df.mean(axis=1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see axis passed to fillna here? You pass it to mean but not fillna.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should back up and make sure the behavior I'm seeing is expected then an not a bug. In this notebook I first try to replace NaNs with the column average, then the row average. Doing the first works, but the second silently does nothing.

Setting axis=1 on fillna when using the mean where axis=1 causes an error: Currently only can fill with dict/Series column by column.

It seems like calling fillna on a frame with NaNs which doesn't fill them is an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I also think this test might be missing the point. Can you try getting the issue directly from #17399 to work? I think that is relatively clearly laid out

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 20, 2019 via email

@jorvis
Copy link
Author

jorvis commented Sep 20, 2019

Can you give an example of the syntax to use to fill using the row means?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge master and address comment(s)?

"b": [5,6,7,8],
"c": [9,10,11,6]})

result = df.fillna(df.mean(axis=1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I also think this test might be missing the point. Can you try getting the issue directly from #17399 to work? I think that is relatively clearly laid out

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

Thanks for the PR @jorvis but I think this has gone stale. If not and you'd like to pick back up just ping and can continue

@WillAyd WillAyd closed this Nov 7, 2019
@scottboston
Copy link

scottboston commented Feb 3, 2020

Question. I reported this bug (#17399) back in 2017, I was going to work on fixing this back then but it seems that pratapvardhan worked on the problem. Again someone else found this issue and worked on it. I am not clear what needs to be done to close this issue. It appears that the last test case missed the mark to address the original problem.

What do we need to close this? Write a new test case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fillna ignoring axis=1 parameter
6 participants