BUG: nunique not ignoring both None and np.nan #37607
Thanks @GYHHAHA for the PR!
Some comments
pandas/core/base.py
@@ -1032,10 +1032,11 @@ def nunique(self, dropna: bool = True) -> int:
        >>> s.nunique()
        4
        """
        uniqs = self.unique()
        if dropna:
I'd do
obj = self.dropna() if dropna else self
return len(obj.unique())
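A minimal sketch of why the original logic undercounts and how the suggestion above fixes it. It is written as two standalone functions over the public `Series` API, not as the actual method in pandas/core/base.py; the names `nunique_old` and `nunique_suggested` are hypothetical. `nunique_old` mirrors the `if dropna and isna(uniqs).any(): n -= 1` lines that appear later in the diff, using the public `pd.isna` in place of the internal `isna` import.

```python
import numpy as np
import pandas as pd


def nunique_old(s: pd.Series, dropna: bool = True) -> int:
    # Hypothetical copy of the pre-fix logic: count the uniques, then
    # subtract at most ONE for missing values, however many distinct
    # NA-like objects survived unique().
    uniqs = s.unique()
    n = len(uniqs)
    if dropna and pd.isna(uniqs).any():
        n -= 1
    return n


def nunique_suggested(s: pd.Series, dropna: bool = True) -> int:
    # Reviewer's suggestion: drop every NA-like value first, then count.
    obj = s.dropna() if dropna else s
    return len(obj.unique())


s = pd.Series(["yes", "yes", pd.NA, np.nan, None, pd.NaT])
print(nunique_old(s), nunique_suggested(s))
```

When `unique()` keeps several NA-like objects (e.g. `None` and `np.nan`) as separate entries, the old count stays above 1, while dropping NAs first always leaves only `"yes"`.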
pandas/tests/base/test_unique.py
def test_nunique_dropna():
    # test for #37566
Write this comment as # GH37566 instead.
pandas/tests/base/test_unique.py
@@ -121,3 +121,11 @@ def test_unique_bad_unicode(idx_or_series_w_bad_unicode):
    else:
        expected = np.array(["\ud83d"], dtype=object)
        tm.assert_numpy_array_equal(result, expected)


def test_nunique_dropna():
Parametrize on dropna (True/False).
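A sketch of what the parametrized test could look like, assuming pytest is used as in the rest of pandas/tests. The `dropna=False` expectation is written as `len(s.unique())` rather than a hard-coded count, because this thread does not state how many distinct NA-like entries `unique()` keeps for this input.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("dropna", [True, False])
def test_nunique_dropna(dropna):
    # GH37566: all NA-like values must be ignored when dropna=True
    s = pd.Series(["yes", "yes", pd.NA, np.nan, None, pd.NaT])
    res = s.nunique(dropna=dropna)
    # With dropna=False, nunique counts whatever unique() returns.
    expected = 1 if dropna else len(s.unique())
    assert res == expected
```

Passing the parametrized `dropna` through (instead of a literal `dropna=True`) is what makes both cases actually exercise different code paths.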
pandas/tests/base/test_unique.py
def test_nunique_dropna():
    # test for #37566
    s = pd.Series(['yes', 'yes', pd.NA, np.nan, None, pd.NaT])
    res = s.nunique(dropna=True)
just do assert res == 1
doc/source/whatsnew/v1.2.0.rst
@@ -562,6 +562,7 @@ Other
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
- Bug in :meth:`nunique` with ``dropna = True`` returns wrong result when different kinds of NA-like values exist (:issue:`37566`)
Bug in :meth:`Series.nunique` with ``dropna = True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)
pandas/tests/base/test_unique.py
def test_nunique_dropna(dropna):
    # GH37566
    s = pd.Series(["yes", "yes", pd.NA, np.nan, None, pd.NaT])
    res = s.nunique(dropna=True)
res = s.nunique(dropna) here, no?
One more comment, o/w lgtm
doc/source/whatsnew/v1.2.0.rst
@@ -562,6 +562,7 @@ Other
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
- Bug in :meth:`Series.nunique` with ``dropna = True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)
Could you delete the spaces around =?
done
Looks good; I think just @phofl's comment is left. Ping on green.
cc @jreback
Thanks @GYHHAHA, very nice.
        if dropna and isna(uniqs).any():
            n -= 1
        return n
        obj = remove_na_arraylike(self) if dropna else self
Late comment, but would it be more performant to remove the NAs after calculating the uniques? (Assuming the uniques are typically a much smaller array; as written, this does an extra scan over the full array and creates a copy of it.)
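A minimal sketch of the ordering suggested in this comment: run `unique()` on the full data first, then mask out NA-like entries in the (usually much smaller) result. The function name `nunique_unique_first` is hypothetical, and the sketch assumes `unique()` returns an array that supports boolean masking, which holds for NumPy arrays and pandas extension arrays.

```python
import numpy as np
import pandas as pd


def nunique_unique_first(s: pd.Series, dropna: bool = True) -> int:
    # Single pass over the full data to find the distinct values.
    uniqs = s.unique()
    if dropna:
        # The NA check and the masked copy now touch only the distinct
        # values, not every element of the Series.
        uniqs = uniqs[~pd.isna(uniqs)]
    return len(uniqs)


s = pd.Series(["yes", "yes", pd.NA, np.nan, None, pd.NaT])
t = pd.Series([1.0, 2.0, np.nan, 2.0, np.nan, 3.0])
```

Both orderings count the same values; the difference is only which array the `isna` scan and the copy run over.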
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff