-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG/TST: Fillna limit Error Reporting #27074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/TST: Fillna limit Error Reporting #27074
Conversation
fillna method was not throwing an ValueError with a dataframe full of integers (no NAs) and limit keyword set to a negative number. Issue can be seen in GH27042. Begun with writing tests for checking the behaviour of fillna and values of limit in test/frame/test_missing.py for both negative and non-integer values of limit. Then wrote new test file tests/util/test_validate_fillna_kwargs.py, where the function would now also take limit as an argument and perform necassary checks in there. Made changes to util/_validators.py to correctly check values of limit as was already done in core/internal/blocks.py Updated calls to validate_fillna_kwargs in various files to include the limit argument.
Hello @Inevitable-Marzipan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-27 14:19:09 UTC |
Codecov Report
@@ Coverage Diff @@
## master #27074 +/- ##
===========================================
- Coverage 92.04% 41.86% -50.18%
===========================================
Files 180 180
Lines 50714 50715 +1
===========================================
- Hits 46679 21234 -25445
- Misses 4035 29481 +25446
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #27074 +/- ##
==========================================
- Coverage 92.04% 92.03% -0.01%
==========================================
Files 180 180
Lines 50714 50715 +1
==========================================
- Hits 46679 46677 -2
- Misses 4035 4038 +3
Continue to review full report at Codecov.
|
from pandas.util._validators import validate_fillna_kwargs | ||
|
||
|
||
def test_no_value_or_method(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this not tested before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any mention of this specific function in any test files after doing a search for it. There may be higher level tests that check the behaviour of the fillna method for this though.
@@ -322,7 +322,8 @@ def validate_axis_style_args(data, args, kwargs, arg_name, method_name): | |||
return out | |||
|
|||
|
|||
def validate_fillna_kwargs(value, method, validate_scalar_dict_value=True): | |||
def validate_fillna_kwargs(value, method, limit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add limit
to the parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change
@@ -352,10 +352,6 @@ def fillna(self, value, limit=None, inplace=False, downcast=None): | |||
|
|||
mask = isna(self.values) | |||
if limit is not None: | |||
if not is_integer(limit): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removing validation in this method right? Shouldn't we replace with a call to validate_fillna_kwargs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into the fillna callstack, it first calls super().fillna() in core/frame.py, which then goes to core/generic.py where validate_fillna_kwargs is called (along with validate_bool_kwarg on inplace
), before it then goes into core/internalls/blocks.py. Then in fillna in blocks.py it calls validate_bool_kwarg on inplace
again, but originally never did the validate_fillna_kwargs call again. I'm not sure if the checks are just being repeated needlessly or not, first in frame and then in blocks. But at the start of the function validate_fillna_kwargs(value, None, limit) could be called I guess.
@@ -516,6 +516,22 @@ def test_fillna_skip_certain_blocks(self): | |||
# it works! | |||
df.fillna(np.nan) | |||
|
|||
@pytest.mark.parametrize("type", [int, float]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these need to be parametrized - the dtype of the calling object is unrelated to the change here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtype was exactly what was causing the issue though, this test failed before my change. It has to do with IntBlock and FloatBlock having _can_hold_na set to different values, causing a return in blocks.py's fillna method before it checked the validity of limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe np.random.uniform should be used here instead of np.random.rand which returns all zeros when cast to int, although there isn't much need for the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we already have checks in algos.pyx, we should consildate these either there or higher level.
closing in favor of #27077 |
fillna method was not throwing an ValueError with a dataframe full of
integers (no NAs) and limit keyword set to a negative number. Issue can
be seen in GH27042.
Begun with writing tests for checking the behaviour of fillna
and values of limit in test/frame/test_missing.py for both negative and
non-integer values of limit.
Then wrote new test file tests/util/test_validate_fillna_kwargs.py,
where the function would now also take limit as an argument and
perform necassary checks in there.
Made changes to util/_validators.py to correctly check values of limit
as was already done in core/internal/blocks.py
Updated calls to validate_fillna_kwargs in various files
to include the limit argument.
git diff upstream/master -u -- "*.py" | flake8 --diff