-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: check for string and convert to list in DataFrame.dropna subset argument #41022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: check for string and convert to list in DataFrame.dropna subset argument #41022
Conversation
erfannariman
commented
Apr 18, 2021
•
edited
Loading
edited
- closes ENH: dropna should accept single column as string in subset argument #41021
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
@phofl @mzeitlin11 the test edit: my branch is up to date with master, so that cannot be the problem. |
You can ignore the |
pandas/core/frame.py
Outdated
@@ -5803,7 +5805,7 @@ def dropna( | |||
axis: Axis = 0, | |||
how: str = "any", | |||
thresh=None, | |||
subset=None, | |||
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you type using new syntax | for Union and | None for Optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually saw we have an internal type for this if I'm not mistaken:IndexLabel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins would be the person to ask here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins any chance you can have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, small request
doc/source/whatsnew/v1.2.5.rst
Outdated
@@ -35,7 +35,6 @@ Bug fixes | |||
Other | |||
~~~~~ | |||
|
|||
- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning your change, sorry if not clear
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -219,7 +219,7 @@ Other enhancements | |||
- :meth:`pandas.read_csv` and :meth:`pandas.read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`) | |||
- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`) | |||
- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`) | |||
- | |||
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number (or this PR number if no issue)
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Could you please reopen, will finish this one. |
I get a typing error since subset is |
doc/source/whatsnew/v1.3.4.rst
Outdated
@@ -42,7 +42,7 @@ Bug fixes | |||
|
|||
Other | |||
~~~~~ | |||
- | |||
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like. (:issue:`41021`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be targeted for 1.4 since it's not a regression. Can go in enhancements there (we try to not use the Other section).
pandas/core/frame.py
Outdated
@@ -5990,6 +5990,8 @@ def dropna( | |||
|
|||
agg_obj = self | |||
if subset is not None: | |||
# subset needs to be list | |||
subset = com.maybe_make_list(subset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the implementation of maybe_make_list, this will only function correctly if the sequence is a list or tuple (all other sequences will be packed as a single element list). Seems to me we either need make the API more restrictive (lists and tuples only), or improve the implementation to handle generic sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g.
df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [1, np.nan, 1], 'c': [np.nan, 1, 1]})
print(df.dropna(subset=np.array(['b', 'c'])))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, reverted back to using is_list_like
, which passes the tests. Also added a new test with np.array
.
@jreback this means I am not using com.maybe_make_list
Hello @erfannariman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-10-18 14:37:09 UTC |
@rhshadrach @jreback think this one is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good - just two nit picks and a question.
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -129,6 +129,7 @@ Other enhancements | |||
- :meth:`DataFrame.__pos__`, :meth:`DataFrame.__neg__` now retain ``ExtensionDtype`` dtypes (:issue:`43883`) | |||
- The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (:issue:`43882`) | |||
- Added :meth:`.ExponentialMovingWindow.sum` (:issue:`13297`) | |||
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like. (:issue:`41021`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No period at the end
doc/source/whatsnew/v1.3.4.rst
Outdated
@@ -43,7 +43,6 @@ Bug fixes | |||
Other | |||
~~~~~ | |||
- | |||
- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
over to you @rhshadrach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @erfannariman! |