ENH: Allow dropna to accept floats [0, 1] as thresh values #40676
Comments
This seems like a reasonably sensible and simple extension, although it might need careful coding around 1 (int) versus 1.00 (float); 0 (int) and 0.00 (float) mean the same thing. I don't necessarily recommend this, but while I'm looking at it, there may be a case for more advanced missing data pipelining for |
This is a duplicate issue, please search first (the prior one was closed because this is not a good API). |
Duplicate of #35299. If you want to propose a new API, go ahead, though I am loath to add any additional keywords. |
I don't see why the existing alternative mentioned here is not sufficient.
That shouldn't go in the |
@rhshadrach yes, probably right. I am not up to date with the discussions on this; mine is more a high-level view that more technical missing data routines are becoming more necessary (particularly in my field), and pandas might get requests for flexibility in this regard. As for where these should go, I can agree that dropna might serve a more basic purpose and that it is best to keep it basic. |
Is your feature request related to a problem?
I wish I could use pandas to drop NaN values using a threshold that is a fraction of the total column/row, not an absolute number. See detailed example below.
Describe the solution you'd like
thresh : int or float, optional
If int, require that many non-NA values; if float, require that fraction of non-NA values.
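A minimal sketch of the proposed semantics, written as a standalone wrapper around the existing dropna. The name dropna_frac is hypothetical and not part of pandas, and rounding the fraction up with ceil is an assumption, since the proposal does not say how to round. The type check is what keeps 1 (int, at least one non-NA value) apart from 1.0 (float, all values non-NA).

```python
import math

import numpy as np
import pandas as pd


def dropna_frac(df, thresh, axis=0):
    """Hypothetical helper (not a pandas API): dropna with an int or float thresh."""
    if isinstance(thresh, bool):
        raise TypeError("thresh must be an int or a float, not a bool")
    if isinstance(thresh, float):
        if not 0.0 <= thresh <= 1.0:
            raise ValueError("a float thresh must lie in [0, 1]")
        # Number of values inspected per row (axis=0) or per column (axis=1).
        n = df.shape[1] if axis == 0 else df.shape[0]
        # Assumption: round the required count up; round() guards against
        # floating-point noise such as 0.07 * 100 == 7.000000000000001.
        thresh = math.ceil(round(thresh * n, 9))
    return df.dropna(axis=axis, thresh=thresh)


df = pd.DataFrame(
    {
        "a": [1, np.nan, np.nan],
        "b": [1, 2, np.nan],
        "c": [1, 2, np.nan],
        "d": [1, np.nan, np.nan],
        "e": [1, np.nan, 5],
    }
)

print(dropna_frac(df, 0.4).index.tolist())  # rows with at least 40% non-NA values
print(dropna_frac(df, 1.0).index.tolist())  # float 1.0: rows with no NA at all
print(dropna_frac(df, 1).index.tolist())    # int 1: rows with at least one non-NA value
```

Here the three rows hold 5, 2 and 1 non-NA values out of 5, so the int and float forms of "one" give visibly different results.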
API breaking implications
The thresh parameter only needs to be extended to accept floats as well as ints.
Describe alternatives you've considered
None. IMHO this solution is too simple and effective to consider other options.
Additional context
Example
It should be obvious that this is very useful in cases where the size of the df axis is liable to change, and when using piped functionality (it saves the extra line of code calculating thresh_int = 0.4 * Xy.shape[1]).
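For comparison, this is roughly what the workaround looks like with the current int-only thresh. It is a sketch with made-up data; Xy stands in for the frame named in the issue, and truncating with int() is an assumption about the intended rounding. The second form uses DataFrame.pipe so the threshold is computed from whatever frame reaches that step of a method chain.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the original example frame is not shown in the issue.
Xy = pd.DataFrame(
    {
        "a": [1, np.nan, np.nan],
        "b": [1, 2, np.nan],
        "c": [1, 2, np.nan],
        "d": [1, np.nan, np.nan],
        "e": [1, np.nan, 5],
    }
)

# Two-step version: compute the absolute threshold, then drop rows.
thresh_int = int(0.4 * Xy.shape[1])
out_two_step = Xy.dropna(thresh=thresh_int)

# Chained version: the lambda sees the frame as it exists at this point
# in the pipeline, so the fraction tracks any earlier change in shape.
out_piped = Xy.pipe(lambda d: d.dropna(thresh=int(0.4 * d.shape[1])))

print(out_piped.index.tolist())
```

Both versions keep the rows with at least two non-NA values out of five.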