-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: DataFrame.dropna must not allow to set both how= and thresh= parameters #46644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 7 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
925c815
BUG attempt to fix GH46575
arnaudlegout b3200dc
fixed PEP8 formatting issues
arnaudlegout db2992e
fixed a typing issue
arnaudlegout a82000a
implemented a pytest and fixed a side effect of the bug fix
arnaudlegout ec4b812
Merge branch 'main' into fix-dropna-signature
arnaudlegout f7f44fd
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
arnaudlegout f84645b
Merge branch 'fix-dropna-signature' of https://github.com/arnaudlegou…
arnaudlegout 875b958
changed the raised exception after suggestion in code review
arnaudlegout f7cd5a6
added an entry for GH46575 in what's new
arnaudlegout c05b954
removed a comment as requested by code review
arnaudlegout b2e9a5f
Merge remote-tracking branch 'upstream/main' into fix-dropna-signature
arnaudlegout 950cdc7
used no_default instead of None as a default value
arnaudlegout 03f598d
Merge remote-tracking branch 'upstream/main' into fix-dropna-signature
arnaudlegout File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if users aren't supposed to pass
None
, this should probably belib.no_default
, e.g.pandas/pandas/core/generic.py
Line 7935 in 361021b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure
no_default
is appropriate? Most of its usage is for a method parameter that is deprecated. So if this parameteris not no_default
a warning could be raised. This is not the case here. Users aren't supposed to pass anything but "any" or "all", and the code correctly raises ifhow
is not "any" or "all".Using
no_default
seems semantically wrong. If you confirmhow
must default tono_default
I am fine with it, but I want to cross check with you.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check with @simonjayhawkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following use cases cover lib.no_default
None
whereNone
is a valid value, i.e. hashable index labelNone
default and we need to detect if a user passed any value if the keyword arg is depecated so that we can raise a warning.so at first glance it appears this issues falls into the third category.
so it really depends on the order of validation on whether we need to use lib.no_default.
I would have though that the exception for explicit passing of both
how
andthresh
regardless of the values passed should take precedence over the exception for passing anything but "any" or "all" tohow
?so the question is should
df.dropna(how=None, thresh=2)
raise? With this PR it gives an empty DataFrame (the same as 1.4.2). But alsodf.dropna(how=object(), thresh=2)
returns an empty DataFrame for 1.4.2 and now correctly raisesTypeError: You cannot set both the how and thresh arguments at the same time.
Also,
df.dropna(how=None)
returns the same asdf.dropna(how="any")
whereas on 1.4.2 this gaveTypeError: must specify how or thresh
. This was tested behavior and the test for this has been removed in this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I reworked this PR in light of the third use case (I also defaulted thresh to no_default and added typing)
I removed the test for
TypeError: must specify how or thresh
as it is not correct. We can call dropna without specifyinghow
orthresh
.df.dropna(how=None)
not raising was an "acceptable" side effect of my PR, but usingno_default
should fix it and make it raises, admittedly better.By the way, would it make sense to add your three cases directly as a comment in the source code defining no_default. I would personally add this comment as it helps a lot developers to understand the usage, but I noticed that the pandas code is not that much commented, most likely for legacy reasons. If you think it makes sense, I can make a separate bug report and make the corresponding PR.