Skip to content

ENH: check for string and convert to list in DataFrame.dropna subset argument #41022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Oct 18, 2021
Merged

ENH: check for string and convert to list in DataFrame.dropna subset argument #41022

merged 26 commits into from
Oct 18, 2021

Conversation

erfannariman
Copy link
Member

@erfannariman erfannariman commented Apr 18, 2021

@erfannariman erfannariman changed the title check for string and convert to list ENH: check for string and convert to list in DataFrame.dropna subset argument Apr 18, 2021
@erfannariman
Copy link
Member Author

erfannariman commented Apr 19, 2021

@phofl @mzeitlin11 the test pandas/tests/reductions/test_reductions.py::TestSeriesReductions::test_all_any_params runs succesfully on my machine, but fails here in CI. So cannot reproduce the error to check what is wrong.

edit: my branch is up to date with master, so that cannot be the problem.

@phofl
Copy link
Member

phofl commented Apr 19, 2021

You can ignore the ResourceWarning tests. But pre commit and ci checks is real

@@ -5803,7 +5805,7 @@ def dropna(
axis: Axis = 0,
how: str = "any",
thresh=None,
subset=None,
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you type using new syntax | for Union and | None for Optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually saw we have an internal type for this if I'm not mistaken:IndexLabel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins would be the person to ask here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins any chance you can have a look?

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, small request

@rhshadrach rhshadrach added the API - Consistency Internal Consistency of API/Behavior label Apr 20, 2021
@rhshadrach rhshadrach added this to the 1.3 milestone Apr 20, 2021
@@ -35,7 +35,6 @@ Bug fixes
Other
~~~~~

-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning your change, sorry if not clear

@@ -219,7 +219,7 @@ Other enhancements
- :meth:`pandas.read_csv` and :meth:`pandas.read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
-
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number (or this PR number if no issue)

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 21, 2021
@erfannariman
Copy link
Member Author

@erfannariman closing as stale. ping when ready to continue.

Could you please reopen, will finish this one.

@jreback jreback reopened this Oct 9, 2021
@erfannariman
Copy link
Member Author

I get a typing error since subset is IndexLabel, while np.compress expects other type. The functionality should still work, since com.maybe_make_list returns a list.

@@ -42,7 +42,7 @@ Bug fixes

Other
~~~~~
-
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like. (:issue:`41021`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be targeted for 1.4 since it's not a regression. Can go in enhancements there (we try to not use the Other section).

@@ -5990,6 +5990,8 @@ def dropna(

agg_obj = self
if subset is not None:
# subset needs to be list
subset = com.maybe_make_list(subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation of maybe_make_list, this will only function correctly if the sequence is a list or tuple (all other sequences will be packed as a single element list). Seems to me we either need make the API more restrictive (lists and tuples only), or improve the implementation to handle generic sequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [1, np.nan, 1], 'c': [np.nan, 1, 1]})
print(df.dropna(subset=np.array(['b', 'c'])))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, reverted back to using is_list_like, which passes the tests. Also added a new test with np.array.

@jreback this means I am not using com.maybe_make_list

@pep8speaks
Copy link

pep8speaks commented Oct 10, 2021

Hello @erfannariman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-18 14:37:09 UTC

@erfannariman
Copy link
Member Author

@rhshadrach @jreback think this one is good to go.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good - just two nit picks and a question.

@@ -129,6 +129,7 @@ Other enhancements
- :meth:`DataFrame.__pos__`, :meth:`DataFrame.__neg__` now retain ``ExtensionDtype`` dtypes (:issue:`43883`)
- The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (:issue:`43882`)
- Added :meth:`.ExponentialMovingWindow.sum` (:issue:`13297`)
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like. (:issue:`41021`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No period at the end

@@ -43,7 +43,6 @@ Bug fixes
Other
~~~~~
-
-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert this change

@lithomas1 lithomas1 removed the Stale label Oct 16, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

over to you @rhshadrach

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach merged commit 58de58f into pandas-dev:master Oct 18, 2021
@rhshadrach
Copy link
Member

Thanks @erfannariman!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: dropna should accept single column as string in subset argument
8 participants