ENH: DataFrameGroupby.value_counts #43564
Comments
Hi @corriebar, I think this is a good request. Thanks for including clear input, output, and a workaround.
@rhshadrach, any thoughts on this?
Thanks for reporting this, and agree with @MarcoGorelli - well written.
#6540 was asking for counts of the grouping columns, rather than the rest of the columns in the DataFrame. This is very much different from the request here. For #39938, while size was mentioned in the comments, this issue was closed because it lacked a reproducible example demonstrating the request.
In general I agree with having a consistent API between Series / DataFrame / SeriesGroupBy / DataFrameGroupBy when the operation makes sense, and that seems to me to be the case here.
The oddity is that value_counts doesn't fit into one of the agg / transform / filter buckets that most groupby ops do. The index should be the groups for agg, the original index for transform, and a subset of the original index for a filter - but because value_counts doesn't fit into one of these, I think it's okay that the resulting index be the grouping columns combined with the other columns in the DataFrame. This then agrees with Looking at the code for
Timing:
take
@rhshadrach I have implemented #44267 loosely based on your suggestion.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
Since value_counts() is defined both for a Series and a DataFrame, I also expect it to work on both a SeriesGroupBy and a DataFrameGroupBy.
There are some related issues: #39938 and #6540. These have been dismissed so far with the argument that size() already does that, but size() alone is not enough to get the proportions per group.
Expected Behavior
I managed to get the desired behaviour by applying value_counts() to each group:
Installed Versions
INSTALLED VERSIONS
commit : 73c6825
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.3.3
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.27.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None