-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix for fillna ignoring axis=1 parameter (issues #17399 #17409) #28441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jorvis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2019-09-19 21:56:54 UTC |
Here's an example of this issue in Jupyter: |
pandas/core/internals/blocks.py
Outdated
@@ -386,7 +386,7 @@ def apply(self, func, **kwargs): | |||
|
|||
return result | |||
|
|||
def fillna(self, value, limit=None, inplace=False, downcast=None): | |||
def fillna(self, value, axis=0, limit=None, inplace=False, downcast=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is, this is a breaking API change. To be safe, you would need to put axis
at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an annotation for axis
? Should be importable from pandas._typing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is, this is a breaking API change. To be safe, you would need to put
axis
at the end.
I have just done this, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an annotation for
axis
? Should be importable frompandas._typing
Sorry, I'm not sure what that means. Could you direct me to an example or docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea so a variable annotation. You'd essentially do:
from pandas._typing import Axis
def fillna(..., axis: Axis = 0):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks. I have pushed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a whatsnew for v1.0.0?
pandas/tests/frame/test_missing.py
Outdated
@@ -671,6 +671,14 @@ def test_fillna_columns(self): | |||
expected = df.astype(float).fillna(method="ffill", axis=1) | |||
assert_frame_equal(result, expected) | |||
|
|||
def test_fillna_columns(self): | |||
df = DataFrame(np.random.randn(10, 4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add GH 17399 as a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also generally for test you can simplify by just constructing a DataFrame with literal values instead of random data - should make smaller and help readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the comment (and properly fixed the test function name). I'm afraid I found this bug on the first day of my learning to use pandas, so I'm not quite there yet to write a custom test. Rather, I used/modified the existing one from test_fillna_columns()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this test works on master for me without these changes - is this capturing the changed behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorvis can you respond here? Need to make sure we're actually fixing a real issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to make a more direct test case. This is the demo that made me find it:
After that I found an existing PR which had grown stale, and @jreback asked me to update it, so I thought I'd give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it now to reflect a test similar to that Jupyter image above.
pandas/core/internals/blocks.py
Outdated
@@ -386,7 +386,7 @@ def apply(self, func, **kwargs): | |||
|
|||
return result | |||
|
|||
def fillna(self, value, limit=None, inplace=False, downcast=None): | |||
def fillna(self, value, axis=0, limit=None, inplace=False, downcast=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an annotation for axis
? Should be importable from pandas._typing
…test_fillna_rows(), added comment to reference bug
pandas/tests/frame/test_missing.py
Outdated
@@ -671,6 +671,14 @@ def test_fillna_columns(self): | |||
expected = df.astype(float).fillna(method="ffill", axis=1) | |||
assert_frame_equal(result, expected) | |||
|
|||
def test_fillna_columns(self): | |||
df = DataFrame(np.random.randn(10, 4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this test works on master for me without these changes - is this capturing the changed behavior?
@@ -233,7 +233,7 @@ Other | |||
- Trying to set the ``display.precision``, ``display.max_rows`` or ``display.max_columns`` using :meth:`set_option` to anything but a ``None`` or a positive int will raise a ``ValueError`` (:issue:`23348`) | |||
- Using :meth:`DataFrame.replace` with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (:issue:`27660`) | |||
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now support dicts as ``compression`` argument with key ``'method'`` being the compression method and others as additional compression options when the compression method is ``'zip'``. (:issue:`26023`) | |||
- | |||
- :meth:`DataFrame.fillna` when using axis=1 previously failed to replace NaN values (:issue:`17399` and :issue:`17409`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true in general, or was it just under specific conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to make a more direct test case. This is the demo that made me find it:
After that I found an existing PR which had grown stale, and @jreback asked me to update it, so I thought I'd give it a try.
pandas/tests/frame/test_missing.py
Outdated
@@ -671,6 +671,14 @@ def test_fillna_columns(self): | |||
expected = df.astype(float).fillna(method="ffill", axis=1) | |||
assert_frame_equal(result, expected) | |||
|
|||
def test_fillna_columns(self): | |||
df = DataFrame(np.random.randn(10, 4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorvis can you respond here? Need to make sure we're actually fixing a real issue.
"b": [5,6,7,8], | ||
"c": [9,10,11,6]}) | ||
|
||
result = df.fillna(df.mean(axis=1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see axis
passed to fillna here? You pass it to mean but not fillna.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should back up and make sure the behavior I'm seeing is expected then an not a bug. In this notebook I first try to replace NaNs with the column average, then the row average. Doing the first works, but the second silently does nothing.
Setting axis=1 on fillna when using the mean where axis=1 causes an error: Currently only can fill with dict/Series column by column.
It seems like calling fillna on a frame with NaNs which doesn't fill them is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I also think this test might be missing the point. Can you try getting the issue directly from #17399 to work? I think that is relatively clearly laid out
When you do `DataFrame.fillna(Series)` (with no `axis=1`) the indices are
aligned. In your test example,
nothing is filled because the index of the Series is the original columns
from the DataFrame.
…On Thu, Sep 19, 2019 at 5:52 PM Joshua Orvis ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/tests/frame/test_missing.py
<#28441 (comment)>:
> @@ -671,6 +671,21 @@ def test_fillna_columns(self):
expected = df.astype(float).fillna(method="ffill", axis=1)
assert_frame_equal(result, expected)
+ def test_fillna_rows(self):
+ #GH17399
+ df = pd.DataFrame({
+ "a": [1,2,3,4],
+ "b": [5,np.nan,7,8],
+ "c": [9,10,11,np.nan]})
+
+ expected = pd.DataFrame({
+ "a": [1,2,3,4],
+ "b": [5,6,7,8],
+ "c": [9,10,11,6]})
+
+ result = df.fillna(df.mean(axis=1))
Maybe we should back up and make sure the behavior I'm seeing is expected
then an not a bug. In this notebook <https://imgur.com/a/3ZrXqSv> I first
try to replace NaNs with the column average, then the row average. Doing
the first works, but the second silently does nothing.
Setting axis=1 on fillna when using the mean where axis=1 causes an error:
Currently only can fill with dict/Series column by column.
It seems like calling fillna on a frame with NaNs which doesn't fill them
is an issue.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28441?email_source=notifications&email_token=AAKAOIVGDMDSGU4CO7NQQBTQKP7CJA5CNFSM4IWVUUTKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCFLIMWA#discussion_r326414395>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIWKL2FSS47XJ5KLIELQKP7CJANCNFSM4IWVUUTA>
.
|
Can you give an example of the syntax to use to fill using the row means? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you merge master and address comment(s)?
"b": [5,6,7,8], | ||
"c": [9,10,11,6]}) | ||
|
||
result = df.fillna(df.mean(axis=1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I also think this test might be missing the point. Can you try getting the issue directly from #17399 to work? I think that is relatively clearly laid out
Thanks for the PR @jorvis but I think this has gone stale. If not and you'd like to pick back up just ping and can continue |
Question. I reported this bug (#17399) back in 2017, I was going to work on fixing this back then but it seems that pratapvardhan worked on the problem. Again someone else found this issue and worked on it. I am not clear what needs to be done to close this issue. It appears that the last test case missed the mark to address the original problem. What do we need to close this? Write a new test case? |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff