
ENH: check for string and convert to list in DataFrame.dropna subset argument #41022


Merged: 26 commits merged on Oct 18, 2021
Changes from 11 commits
Commits
26 commits
6afcbcd check for string and convert to list (erfannariman, Apr 18, 2021)
86d2cd8 add test for missing key in axis (erfannariman, Apr 18, 2021)
03d3e0d add match pytest (erfannariman, Apr 18, 2021)
f7a1847 add whatsnew entry (erfannariman, Apr 18, 2021)
0326c7a add correct typing and checks (erfannariman, Apr 18, 2021)
63a78ef rename typing in docstring: (erfannariman, Apr 18, 2021)
93fb0b3 fix typo whatsnew (erfannariman, Apr 18, 2021)
103bd35 use is list like (erfannariman, Apr 18, 2021)
365c6c0 fix tests (erfannariman, Apr 18, 2021)
cace369 fix typing (erfannariman, Apr 19, 2021)
66d447d use pandas typing (erfannariman, Apr 19, 2021)
7be5fe7 more clear whatsnew entry (erfannariman, Apr 20, 2021)
1dd0a64 move whatsnew entry to 1.3 (erfannariman, Apr 20, 2021)
56ef81e revert whatsnew 1.2.5 to master (erfannariman, Apr 20, 2021)
ae633cf merge master and fix conflicts, plus add gh issue to whatsnew (erfannariman, Oct 10, 2021)
4b905e5 use com.maybe_make_list (erfannariman, Oct 10, 2021)
c585c8f remove is list like (erfannariman, Oct 10, 2021)
1d5f208 remove list like (erfannariman, Oct 10, 2021)
2bf5ac3 move whatsnew entry to 1.4 (erfannariman, Oct 10, 2021)
9ea97ff revert to is_list_like and add test with np.array (erfannariman, Oct 10, 2021)
d09c6ba remove blank line (erfannariman, Oct 10, 2021)
c09d147 use basic indexing instead of np.compress (erfannariman, Oct 10, 2021)
e95200b merge master (erfannariman, Oct 16, 2021)
df5ca75 remove dot (erfannariman, Oct 16, 2021)
17fe9bc use list instead of tuple (erfannariman, Oct 16, 2021)
b471701 Merge branch 'master' into dropna-accept-string (erfannariman, Oct 18, 2021)
doc/source/whatsnew/v1.2.5.rst: 2 changes (1 addition, 1 deletion)
@@ -35,7 +35,7 @@ Bug fixes
 Other
 ~~~~~
 
--
Review comment (Member): Please remove this
Review comment (Member): meaning your change, sorry if not clear
+- :meth:`DataFrame.dropna` now accepts a single column / string as ``subset`` instead of array-like
 -
 
 .. ---------------------------------------------------------------------------
pandas/core/frame.py: 7 changes (5 additions, 2 deletions)
@@ -5803,7 +5803,7 @@ def dropna(
         axis: Axis = 0,
         how: str = "any",
         thresh=None,
-        subset=None,
+        subset: IndexLabel = None,
         inplace: bool = False,
     ):
         """
@@ -5835,7 +5835,7 @@ def dropna(
 
         thresh : int, optional
             Require that many non-NA values.
-        subset : array-like, optional
+        subset : column label or sequence of labels, optional
             Labels along other axis to consider, e.g. if you are dropping rows
             these would be a list of columns to include.
         inplace : bool, default False
@@ -5919,6 +5919,9 @@ def dropna(
 
         agg_obj = self
         if subset is not None:
+            if not is_list_like(subset):
+                # subset needs to be list like
+                subset = (subset,)
             ax = self._get_axis(agg_axis)
             indices = ax.get_indexer_for(subset)
             check = indices == -1
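The core of the `frame.py` change is: if `subset` is not list-like, wrap the single label so the rest of `dropna` can treat it as a sequence, then report any labels missing from the axis as a `KeyError`. The pure-Python sketch below mimics that flow without pandas. `is_list_like_sketch` and `normalize_subset` are hypothetical names, not pandas APIs; the real code uses pandas' `is_list_like` and `Index.get_indexer_for`. The `IndexLabel` alias is assumed to match the one in `pandas._typing` (a single hashable label or a sequence of them). The sketch wraps the scalar in a list rather than the tuple shown in the diff.

```python
from typing import Hashable, List, Sequence, Union

# Assumed to mirror pandas' IndexLabel alias: one label, or a sequence of labels.
IndexLabel = Union[Hashable, Sequence[Hashable]]


def is_list_like_sketch(obj: object) -> bool:
    """Simplified stand-in for is_list_like: iterable, but str/bytes count as scalars."""
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def normalize_subset(subset: IndexLabel, axis_labels: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical helper: wrap a scalar label, then check that every label exists."""
    if not is_list_like_sketch(subset):
        # "C" -> ["C"]. Calling list(subset) instead would split "AB" into ["A", "B"].
        subset = [subset]
    labels = list(subset)
    missing = [label for label in labels if label not in axis_labels]
    if missing:
        # Like the diff, report the unknown labels inside a KeyError.
        raise KeyError(missing)
    return labels
```

With columns `["A", "B", "C"]`, `normalize_subset("C", ...)` returns `["C"]`, the same as passing `["C"]`, while `normalize_subset("D", ...)` raises `KeyError(['D'])`. Treating strings as scalars is the key design choice: a string is iterable, so a naive "iterable means sequence" check would misread a multi-character column name as several one-character labels.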
pandas/tests/frame/methods/test_dropna.py: 17 changes (17 additions, 0 deletions)
@@ -231,3 +231,20 @@ def test_dropna_with_duplicate_columns(self):
 
         result = df.dropna(subset=["A", "C"], how="all")
         tm.assert_frame_equal(result, expected)
+
+    def test_set_single_column_subset(self):
+        # GH 41021
+        df = DataFrame({"A": [1, 2, 3], "B": list("abc"), "C": [4, np.nan, 5]})
+        expected = DataFrame(
+            {"A": [1, 3], "B": list("ac"), "C": [4.0, 5.0]}, index=[0, 2]
+        )
+        result = df.dropna(subset="C")
+        tm.assert_frame_equal(result, expected)
+
+    def test_single_column_not_present_in_axis(self):
+        # GH 41021
+        df = DataFrame({"A": [1, 2, 3]})
+
+        # Column not present; escape the brackets so match is a literal, not a character class
+        with pytest.raises(KeyError, match=r"\['D'\]"):
+            df.dropna(subset="D", axis=0)
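The missing-column test checks the `KeyError` text with `pytest.raises(..., match=...)`. Because pytest applies `match` as a regular expression via `re.search` against the exception's string form, the brackets in a label list such as `['D']` need escaping. Left unescaped, `"['D']"` is a character class that matches any single `'` or `D`. The stdlib-only sketch below (no pytest or pandas required; `message_d` and `message_e` are illustrative names) shows the difference.

```python
import re

# A KeyError holding a list renders as the list's repr, which is what match sees.
message_d = str(KeyError(["D"]))  # "['D']"
message_e = str(KeyError(["E"]))  # "['E']", a different missing column

# Unescaped, "['D']" is a character class matching ONE "'" or "D" character.
loose = "['D']"
# re.escape turns it into a pattern for the literal text ['D'].
strict = re.escape("['D']")

loose_hits = [bool(re.search(loose, m)) for m in (message_d, message_e)]
strict_hits = [bool(re.search(strict, m)) for m in (message_d, message_e)]
print(loose_hits)   # [True, True]: would also accept the wrong column
print(strict_hits)  # [True, False]: only accepts ['D']
```

In a test, either `match=re.escape("['D']")` or the raw string `r"\['D'\]"` pins the assertion to the label that is actually missing.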