DOC: value_counts description doesn't match code logic #48635
Comments
Let me work on this issue.
Cool. I am reporting this as I have seen quite a few people confused by the behaviour. Let me know if I can be of any help (review, discuss).
@maciejskorski
I don't understand the issue here. Why do you think that value_counts does not count unique rows? That this is done via groupby is an implementation detail that should not be visible in the docs.
Thanks for the report! I think we should stick to a description that doesn't rely on other parts of pandas (e.g. groupby) as much as possible. Otherwise it may be difficult for new users to understand what this method should do.
This is not how I read the docstring, but I can see that it can be interpreted that way! To me, What about something like:
Better to my taste! This wording better separates "unique row" from "occurrences". Maybe even "unique row" -> "unique row value", to emphasize that row values (tuples) are counted rather than the rows themselves (a row can be seen as unique merely by its index). Also, "row value" is a well-known term (from the SQL standard).
But I don't believe it is used much in pandas - the two occurrences I see in the docs both explain the term. It seems better to me to indicate the index is ignored rather than using this terminology.
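A small sketch of the "index is ignored" point, assuming a made-up frame (the labels r1 and r2 and the column names are purely illustrative): two rows with different index labels but identical values end up in a single entry.

```python
import pandas as pd

# Two rows that differ only in their index label.
df = pd.DataFrame({"a": [1, 1], "b": ["x", "x"]}, index=["r1", "r2"])

# value_counts looks at the row values only, so both rows fall
# into the same entry, keyed by the tuple (1, "x").
counts = df.value_counts()

print(counts)
```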
I don't insist. I acknowledge that So please re-read my request as asking for decoupling "unique" and "count", like proposed above, that would do the job. Sorry for that verbose communication, but I come with research background where the choice of words means a lot :-)
I would like to take on this issue
Take, if still available
Take
Pandas version checks
main
here
Location of the documentation
dev docs
last release docs
Documentation problem
The value_counts utility incorrectly claims to count "unique rows", respectively "unique combinations". In reality, it only does groupby followed by size, no nunique:
pandas/pandas/core/frame.py
Lines 6468 to 6592 in ca60aab
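To make the distinction concrete, here is a minimal sketch on a made-up frame (the column names a and b and the data are assumptions for illustration, not taken from the pandas source). It compares value_counts with groupby followed by size, and contrasts both with the number of distinct rows, which is a different quantity:

```python
import pandas as pd

# Toy data: the row (1, "x") occurs twice.
df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "z"]})

# Occurrences per distinct row value, as reported by value_counts.
counts = df.value_counts(sort=False)

# The same numbers obtained via groupby over all columns, then size.
sizes = df.groupby(["a", "b"]).size()

# How many distinct row values exist at all (a single number).
n_distinct = len(df.drop_duplicates())

print(counts.to_dict() == sizes.to_dict())
print(n_distinct)
```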
Suggested fix for documentation
Align the doc string to match the code logic.
I suggest not to mention "unique rows" or "unique combinations" and maybe mention the equivalence to groupby + size with the optional normalization.
Will be happy to submit a PR for that, or share my two cents in the discussion.
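The suggested equivalence, including normalization, could be checked roughly like this. This is a sketch on toy data and assumes the frame has no missing values (with NaNs present, the denominators of the two computations may differ); the frame and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "z"]})

# Relative frequencies as returned by value_counts.
freqs = df.value_counts(normalize=True, sort=False)

# Group sizes over all columns, divided by the total number of rows.
manual = df.groupby(list(df.columns)).size() / len(df)

print(freqs.to_dict() == manual.to_dict())
```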