Skip to content

BUG/TST: Fillna limit Error Reporting #27074

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Inevitable-Marzipan
Copy link
Contributor

fillna method was not throwing an ValueError with a dataframe full of
integers (no NAs) and limit keyword set to a negative number. Issue can
be seen in GH27042.

Begun with writing tests for checking the behaviour of fillna
and values of limit in test/frame/test_missing.py for both negative and
non-integer values of limit.

Then wrote new test file tests/util/test_validate_fillna_kwargs.py,
where the function would now also take limit as an argument and
perform necassary checks in there.

Made changes to util/_validators.py to correctly check values of limit
as was already done in core/internal/blocks.py

Updated calls to validate_fillna_kwargs in various files
to include the limit argument.

TimJ added 2 commits June 27, 2019 11:08
fillna method was not throwing an ValueError with a dataframe full of
integers (no NAs) and limit keyword set to a negative number. Issue can
be seen in GH27042.

Begun with writing tests for checking the behaviour of fillna
and values of limit in test/frame/test_missing.py for both negative and
non-integer values of limit.

Then wrote new test file tests/util/test_validate_fillna_kwargs.py,
where the function would now also take limit as an argument and
perform necassary checks in there.

Made changes to util/_validators.py to correctly check values of limit
as was already done in core/internal/blocks.py

Updated calls to validate_fillna_kwargs in various files
to include the limit argument.
@pep8speaks
Copy link

pep8speaks commented Jun 27, 2019

Hello @Inevitable-Marzipan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-27 14:19:09 UTC

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #27074 into master will decrease coverage by 50.17%.
The diff coverage is 38.46%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27074       +/-   ##
===========================================
- Coverage   92.04%   41.86%   -50.18%     
===========================================
  Files         180      180               
  Lines       50714    50715        +1     
===========================================
- Hits        46679    21234    -25445     
- Misses       4035    29481    +25446
Flag Coverage Δ
#multiple ?
#single 41.86% <38.46%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 52.36% <ø> (-42.79%) ⬇️
pandas/core/arrays/base.py 58.98% <0%> (-40.45%) ⬇️
pandas/core/arrays/datetimelike.py 41.79% <0%> (-56.14%) ⬇️
pandas/core/arrays/categorical.py 41.73% <0%> (-54.2%) ⬇️
pandas/core/arrays/numpy_.py 41.28% <0%> (-53.22%) ⬇️
pandas/core/generic.py 38.45% <100%> (-55.75%) ⬇️
pandas/util/_validators.py 35.35% <50%> (-61.46%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
... and 139 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e955515...ee72cd7. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #27074 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27074      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         180      180              
  Lines       50714    50715       +1     
==========================================
- Hits        46679    46677       -2     
- Misses       4035     4038       +3
Flag Coverage Δ
#multiple 90.67% <100%> (ø) ⬆️
#single 41.86% <38.46%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 95.13% <ø> (-0.02%) ⬇️
pandas/core/arrays/base.py 99.43% <100%> (ø) ⬆️
pandas/core/arrays/datetimelike.py 97.92% <100%> (ø) ⬆️
pandas/util/_validators.py 96.96% <100%> (+0.16%) ⬆️
pandas/core/arrays/categorical.py 95.92% <100%> (ø) ⬆️
pandas/core/arrays/numpy_.py 94.49% <100%> (ø) ⬆️
pandas/core/generic.py 94.2% <100%> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.94% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e955515...c9c39c4. Read the comment docs.

@Inevitable-Marzipan Inevitable-Marzipan changed the title Fillna limit er BUG/TST: Fillna limit Error Reporting Jun 27, 2019
from pandas.util._validators import validate_fillna_kwargs


def test_no_value_or_method():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this not tested before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any mention of this specific function in any test files after doing a search for it. There may be higher level tests that check the behaviour of the fillna method for this though.

@@ -322,7 +322,8 @@ def validate_axis_style_args(data, args, kwargs, arg_name, method_name):
return out


def validate_fillna_kwargs(value, method, validate_scalar_dict_value=True):
def validate_fillna_kwargs(value, method, limit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add limit to the parameters.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change

@@ -352,10 +352,6 @@ def fillna(self, value, limit=None, inplace=False, downcast=None):

mask = isna(self.values)
if limit is not None:
if not is_integer(limit):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removing validation in this method right? Shouldn't we replace with a call to validate_fillna_kwargs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into the fillna callstack, it first calls super().fillna() in core/frame.py, which then goes to core/generic.py where validate_fillna_kwargs is called (along with validate_bool_kwarg on inplace), before it then goes into core/internalls/blocks.py. Then in fillna in blocks.py it calls validate_bool_kwarg on inplace again, but originally never did the validate_fillna_kwargs call again. I'm not sure if the checks are just being repeated needlessly or not, first in frame and then in blocks. But at the start of the function validate_fillna_kwargs(value, None, limit) could be called I guess.

@@ -516,6 +516,22 @@ def test_fillna_skip_certain_blocks(self):
# it works!
df.fillna(np.nan)

@pytest.mark.parametrize("type", [int, float])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these need to be parametrized - the dtype of the calling object is unrelated to the change here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dtype was exactly what was causing the issue though, this test failed before my change. It has to do with IntBlock and FloatBlock having _can_hold_na set to different values, causing a return in blocks.py's fillna method before it checked the validity of limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe np.random.uniform should be used here instead of np.random.rand which returns all zeros when cast to int, although there isn't much need for the change.

@WillAyd WillAyd added the Error Reporting Incorrect or improved errors from pandas label Jun 27, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we already have checks in algos.pyx, we should consildate these either there or higher level.

@jreback
Copy link
Contributor

jreback commented Jun 27, 2019

closing in favor of #27077

@jreback jreback closed this Jun 27, 2019
@Inevitable-Marzipan Inevitable-Marzipan deleted the fillna_limit_er branch June 27, 2019 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fillna not throwing error when limit is not positive
5 participants