
BooleanArray, StringArray value_counts na_value #30685

Closed
jbrockmendel opened this issue Jan 4, 2020 · 3 comments · Fixed by #30824
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Milestone

Comments

@jbrockmendel
Member

>>> import numpy as np
>>> import pandas as pd
>>> arr = pd.array([True, False, np.nan])

>>> arr
<BooleanArray>
[True, False, NA]
Length: 3, dtype: boolean

>>> arr.value_counts(dropna=False)
True     1
False    1
False    1
dtype: int64

Instead of two False entries, one of them should be pd.NA, right? cc @TomAugspurger
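For comparison, the result the report expects can be sketched with public API only. The helper `value_counts_keep_na` below is hypothetical (not a pandas function): it counts the non-missing elements and labels the missing count with the array's own `dtype.na_value`, which is `pd.NA` for the `boolean` dtype. Passing `dtype="boolean"` explicitly is an assumption; the report relies on inference.

```python
from collections import Counter

import numpy as np
import pandas as pd


def value_counts_keep_na(arr):
    """Hypothetical helper, not a pandas API: value counts that keep missing values.

    Non-missing elements are counted individually; all missing elements are
    counted under one label, arr.dtype.na_value (pd.NA for "boolean").
    """
    mask = np.asarray(arr.isna(), dtype=bool)
    counts = Counter(arr[~mask])  # iterate over the valid elements only
    labels = list(counts) + [arr.dtype.na_value]
    values = list(counts.values()) + [int(mask.sum())]
    return pd.Series(values, index=pd.Index(labels, dtype=object), dtype="int64")


# dtype="boolean" is passed explicitly here; the report relies on inference.
arr = pd.array([True, False, np.nan], dtype="boolean")
print(value_counts_keep_na(arr))
```

The same helper works for a `string` array, since its `dtype.na_value` is also `pd.NA`.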

>>> arr = pd.array(['foo', 'bar', None])

>>> arr.value_counts(dropna=False)
foo    1
bar    1
NaN    1
dtype: int64

The NaN should be pd.NA, right?
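The reasoning can be checked directly: a missing element of a `string` array is the `pd.NA` scalar, and the dtype advertises that same scalar as its `na_value`, so a `NaN` label in `value_counts` disagrees with what the array itself returns. The snippet passes `dtype="string"` explicitly (an assumption; the report relies on inference) so that the `pd.NA`-based string dtype is selected.

```python
import pandas as pd

arr = pd.array(["foo", "bar", None], dtype="string")

# The missing element comes back as the pd.NA singleton, not np.nan.
print(arr[2] is pd.NA)  # True

# StringDtype reports the same scalar as its missing-value marker.
print(arr.dtype.na_value is pd.NA)  # True
```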

Side note: the defaults for dropna are not consistent across our extension arrays (EAs).
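One way to check the side note against a given installation is to read the `dropna` default off each `value_counts` signature. This is a sketch, assuming the class names listed below are exposed under `pd.arrays`; which of them define `value_counts`, and with which default, depends on the pandas version, so missing classes or methods are simply skipped.

```python
import inspect

import pandas as pd

# Extension array classes exposed under pd.arrays. Which of them define
# value_counts, and with which default, depends on the installed pandas version.
CLASS_NAMES = ["BooleanArray", "IntegerArray", "StringArray", "Categorical",
               "IntervalArray", "DatetimeArray", "PeriodArray"]


def dropna_defaults(class_names=CLASS_NAMES):
    """Return {class name: default of the dropna parameter of value_counts}."""
    defaults = {}
    for name in class_names:
        method = getattr(getattr(pd.arrays, name, None), "value_counts", None)
        if method is None:
            continue  # class or method not available in this version
        try:
            params = inspect.signature(method).parameters
        except (TypeError, ValueError):
            continue  # signature not introspectable
        if "dropna" in params:
            defaults[name] = params["dropna"].default
    return defaults


for name, default in dropna_defaults().items():
    print(f"{name}.value_counts: dropna={default}")
```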

@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Jan 6, 2020
@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jan 6, 2020
@jorisvandenbossche
Member

That's clearly a bug in the BooleanArray.value_counts implementation. And indeed, we should use pd.NA instead of np.nan in the object index.
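Switching the label is low-risk because an object-dtype index still treats `pd.NA` as missing. The snippet below builds an index shaped like the one `value_counts(dropna=False)` returns (the values are illustrative) and checks that missing-value detection picks up the `pd.NA` label.

```python
import pandas as pd

# Object-dtype index like the one value_counts builds, with pd.NA as the
# label for the missing-value count instead of np.nan.
index = pd.Index([True, False, pd.NA], dtype=object)

print(index.isna())         # missing-value detection recognizes pd.NA
print(index[-1] is pd.NA)   # the label is the same singleton the array returns
print(index.hasnans)
```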

@jorisvandenbossche
Member

@jbrockmendel you have an open PR that relates to EA.value_counts. Does that also fix this in some way? (Otherwise I can have a look at this issue.)

@jbrockmendel
Member Author

> Does that also fix this in some way? (Otherwise I can have a look at this issue.)

No, that shouldn't change any behavior.

I think the fix is fairly straightforward for BooleanArray: on line 572, in np.concatenate([index, np.array([np.nan], dtype=object)]), dtype=object), changing the np.nan to pd.NA should do the trick. Alternatively, use self.dtype.na_value, as IntegerArray does.
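The proposed change can be sketched outside pandas. `append_missing_label` is a hypothetical stand-in for the code around line 572, not the actual pandas source: it appends one missing-value label to an object array of observed values, and the fix amounts to passing the dtype's `na_value` instead of hard-coding `np.nan`.

```python
import numpy as np
import pandas as pd


def append_missing_label(index, na_value):
    """Hypothetical stand-in for the pandas code under discussion (not the real source).

    Appends one missing-value label to an object array of observed values.
    """
    # Both inputs are object arrays, so the concatenated result is an object
    # array and the label is stored as-is (pd.NA stays pd.NA, np.nan stays a float).
    return np.concatenate([index, np.array([na_value], dtype=object)])


observed = np.array([True, False], dtype=object)

before = append_missing_label(observed, np.nan)                     # hard-coded label
after = append_missing_label(observed, pd.BooleanDtype().na_value)  # dtype's na_value

print(before)
print(after)
```

Both `pd.BooleanDtype().na_value` and `pd.Int64Dtype().na_value` are `pd.NA`, so reusing the dtype's `na_value` (the IntegerArray approach) and writing `pd.NA` directly produce the same label here. The `na_value` route has the advantage of not hard-coding the scalar per array type.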

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 8, 2020
TomAugspurger added a commit that referenced this issue Jan 9, 2020
* BUG: BooleanArray.value_counts dropna

Closes #30685
2 participants