ENH: Add dropna argument to pd.DataFrame.value_counts() #41325

Closed
connesy opened this issue May 5, 2021 · 0 comments · Fixed by #41334
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff); API - Consistency (Internal Consistency of API/Behavior); Enhancement
connesy (Contributor) commented May 5, 2021

Is your feature request related to a problem?

pd.Series.value_counts() accepts dropna=False, but pd.DataFrame.value_counts() has no such argument. As a consequence, every row that contains at least one NA element is dropped when calling df.value_counts().

Describe the solution you'd like

It should be possible to call df.value_counts() with dropna=False and get a count for each unique row, including rows that have NAs in them.
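
A minimal sketch of the requested call, assuming pandas 1.3 or later (the milestone this issue was assigned to), where DataFrame.value_counts() accepts dropna. The data mirrors the example further down, except that float NaN stands in for pd.NA; that substitution is an assumption made only to keep the dtypes simple.

```python
import numpy as np
import pandas as pd

# Illustrative data modelled on the example in this issue; float NaN is
# used instead of pd.NA (an assumption, to keep both columns float64).
df = pd.DataFrame(
    {
        "s1": [1, 2, 3, np.nan, 3],
        "s2": [np.nan, 1, np.nan, 4, 2],
    }
)

# Default (dropna=True): rows containing any NA are left out of the counts.
default_counts = df.value_counts()

# Requested behaviour: every unique row is counted, NA rows included.
all_counts = df.value_counts(dropna=False)

print(default_counts)
print(all_counts)
```

With dropna=False the counts should add up to the number of rows in df, which is a quick way to check that nothing was silently discarded.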

API breaking implications

As with pd.Series.value_counts(), the default should be dropna=True. This keeps the two implementations consistent and leaves current behavior unchanged.

Describe alternatives you've considered

Additional context

>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3, pd.NA, 3])
>>> s2 = pd.Series([pd.NA, 1, pd.NA, 4, 2])
>>> s1.value_counts(dropna=False)
3.0    2
NaN    1
1.0    1
2.0    1
dtype: int64
>>> df = pd.DataFrame(zip(s1, s2), columns=['s1', 's2'])
>>> df
     s1    s2
0     1  <NA>
1     2     1
2     3  <NA>
3  <NA>     4
4     3     2
>>> df.value_counts()
s1  s2
2   1     1
3   2     1
dtype: int64
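
For pandas versions without the argument, one possible workaround (not proposed in the issue itself) is to count rows through groupby, assuming pandas 1.1 or later, where groupby accepts dropna=False. This is a sketch using float NaN in place of pd.NA, not a description of how the eventual fix is implemented.

```python
import numpy as np
import pandas as pd

# Same illustrative data as in the issue, with float NaN instead of pd.NA
# (an assumption made to keep the example simple).
df = pd.DataFrame(
    {
        "s1": [1, 2, 3, np.nan, 3],
        "s2": [np.nan, 1, np.nan, 4, 2],
    }
)

# groupby(..., dropna=False) keeps NA keys as their own groups;
# size() then counts the rows in each unique (s1, s2) combination.
row_counts = (
    df.groupby(list(df.columns), dropna=False)
    .size()
    .sort_values(ascending=False)
)

print(row_counts)
```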

connesy added the Enhancement and Needs Triage labels on May 5, 2021
lithomas1 added the API - Consistency and Algos labels and removed the Needs Triage label on May 5, 2021
jreback added this to the 1.3 milestone on May 10, 2021