
pd.NA TypeError in drop_duplicates with object dtype #32992


Closed
WillAyd opened this issue Mar 24, 2020 · 5 comments · Fixed by #33751
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@WillAyd
Member

WillAyd commented Mar 24, 2020

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> pd.DataFrame([[1, pd.NA], [2, "a"]]).drop_duplicates()
Traceback (most recent call last):
   ...
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/frame.py", line 4859, in f
    labels, shape = algorithms.factorize(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 629, in factorize
    codes, uniques = _factorize_array(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 478, in _factorize_array
    uniques, codes = table.factorize(values, na_sentinel=na_sentinel, na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1806, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 1728, in pandas._libs.hashtable.PyObjectHashTable._unique
  File "pandas/_libs/missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
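
The last frame of the traceback is `NAType.__bool__`, so the error comes from truth-testing `pd.NA`. A minimal sketch of that behaviour in isolation (the variable name `message` is only for illustration, and this does not reproduce the hashtable code path itself):

```python
import pandas as pd

# Truth-testing pd.NA raises, because NA is neither True nor False.
try:
    bool(pd.NA)
except TypeError as exc:
    message = str(exc)

print(message)

# pd.isna is a safe way to check for the missing value.
print(pd.isna(pd.NA))
```

Any code that uses an object-dtype value in a boolean context can hit the same `TypeError` when that value is `pd.NA`.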

This same failure isn't present when using an extension type:

>>> df = pd.DataFrame([[1, pd.NA], [2, "a"]], columns=list("ab"))
>>> df["b"] = df["b"].astype("string")
>>> df.drop_duplicates()
   a     b
0  1  <NA>
1  2     a
@WillAyd WillAyd changed the title from "pd.NA raises in drop_duplicates with object dtype" to "pd.NA TypeError in drop_duplicates with object dtype" Mar 24, 2020
@WillAyd WillAyd added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Mar 24, 2020
@AnnaDaglis
Contributor

I cannot reproduce the error. Has it been fixed already?

@WillAyd
Member Author

WillAyd commented Mar 26, 2020

Thanks @AnnaDaglis and well spotted. @jbrockmendel any idea where this might have been fixed?

@jbrockmendel
Member

any idea where this might have been fixed?

no idea off the top of my head.

@jorisvandenbossche jorisvandenbossche added the Needs Tests Unit test(s) needed to prevent regressions label Mar 27, 2020
@JochenFromm

I cannot reproduce it either, but it could be related to #15752

@simonjayhawkins
Member

Thanks @AnnaDaglis and well spotted. @jbrockmendel any idea where this might have been fixed?

fixed in #31939 (i.e. 1.0.2)

41bc226 is the first new commit
commit 41bc226
Author: Daniel Saxton
Date: Sun Feb 23 08:57:07 2020 -0600

BUG: Fix construction of Categorical from pd.NA (#31939)
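
Since the issue carries the Needs Tests label, a regression test for the reported frame could look roughly like the sketch below. The test name is hypothetical and not taken from the pandas test suite, and the sketch assumes a pandas version that already contains the fix (1.0.2 or later, per the comment above).

```python
import pandas as pd
import pandas.testing as tm


def test_drop_duplicates_object_dtype_with_na():
    # Hypothetical test name; the frame is the one from the original report.
    df = pd.DataFrame([[1, pd.NA], [2, "a"]])

    # No row is duplicated, so drop_duplicates should return the frame
    # unchanged instead of raising TypeError.
    result = df.drop_duplicates()

    tm.assert_frame_equal(result, df)
```

Running it under pytest on a version without the fix should fail with the `TypeError` shown in the report.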

@simonjayhawkins simonjayhawkins added good first issue and removed Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 23, 2020
@jreback jreback added this to the 1.1 milestone Apr 24, 2020