Skip to content

BUG: nunique not ignoring both None and np.nan #37607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 10, 2020
Merged

BUG: nunique not ignoring both None and np.nan #37607

merged 11 commits into from
Nov 10, 2020

Conversation

GYHHAHA
Copy link
Contributor

@GYHHAHA GYHHAHA commented Nov 3, 2020

@pep8speaks
Copy link

pep8speaks commented Nov 3, 2020

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-10 01:20:07 UTC

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GYHHAHA for the PR!

Some comments

@@ -1032,10 +1032,11 @@ def nunique(self, dropna: bool = True) -> int:
>>> s.nunique()
4
"""
uniqs = self.unique()
if dropna:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do

obj = self.dropna() if dropna else self
return len(obj.unique())



def test_nunique_dropna():
# test for #37566
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do # GH37566

@@ -121,3 +121,11 @@ def test_unique_bad_unicode(idx_or_series_w_bad_unicode):
else:
expected = np.array(["\ud83d"], dtype=object)
tm.assert_numpy_array_equal(result, expected)


def test_nunique_dropna():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parametrize on dropna (True/False)

def test_nunique_dropna():
# test for #37566
s = pd.Series(['yes', 'yes', pd.NA, np.nan, None, pd.NaT])
res = s.nunique(dropna=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do assert res == 1

@@ -562,6 +562,7 @@ Other
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
- Bug in :meth:`nunique` with ``dropna = True`` returns wrong result when different kinds of NA-like values exist (:issue:`37566`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in :meth:`Series.nunique` with ``dropna = True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 4, 2020
def test_nunique_dropna(dropna):
# GH37566
s = pd.Series(["yes", "yes", pd.NA, np.nan, None, pd.NaT])
res = s.nunique(dropna=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res = s.nunique(dropna) here, no?

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment, o/w lgtm

@@ -562,6 +562,7 @@ Other
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
- Bug in :meth:`Series.nunique` with ``dropna = True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you delete spaces around =?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added this to the 1.2 milestone Nov 10, 2020
@jreback
Copy link
Contributor

jreback commented Nov 10, 2020

looks good, just @phofl comment i think is left, ping on green.

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Nov 10, 2020

cc @jreback

@jreback jreback merged commit c156b12 into pandas-dev:master Nov 10, 2020
@jreback
Copy link
Contributor

jreback commented Nov 10, 2020

thanks @GYHHAHA very nice

if dropna and isna(uniqs).any():
n -= 1
return n
obj = remove_na_arraylike(self) if dropna else self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late comment, but would it be more performant if we removed the NAs after calculating the uniques? (assume the uniques is typically a much smaller array, and not this does an extra scan and creates a copy of the full array)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: nunique not ignoring both None and np.nan
6 participants