Solving the Top k most frequent words problem using a max-heap #8125
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
- @algorithms-keeper review to trigger the checks for only added pull request files
- @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.
NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
Very well written algorithm, great stuff!
Hello @cclauss, I'm tagging you as I saw your involvement in several PRs.
All of this seems like it could be done in just a few lines using only a
Thanks for the feedback. This is using There is already a Thank you
from collections import Counter

def top_k_frequent_words(words, k_value):
    # most_common returns (word, count) pairs; keep only the words
    return [x[0] for x in Counter(words).most_common(k_value)]
Please put the text of our last two messages into the file’s docstring and then we can merge this. That will show why this work was done and also that the Python standard library provides a far more straightforward way to solve the same problem.
Thanks for the feedback.
Nice!
Describe your change:
This PR adds an algorithm that identifies the top k most frequent strings in a given list of strings. To do this, the algorithm uses a max-heap implementation that already exists in this repository (a generic type was introduced to allow this usage).
Time complexity is O(n + k log n), where n is the number of words:
- O(n) for building the max-heap
- k * O(log n) for extracting the k most frequent strings
Checklist:
Fixes: #{$ISSUE_NO}.
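The heap-based approach described in this PR can be sketched roughly as follows. This is only an illustration, not the code submitted in the PR: it swaps in the standard library's `heapq` module (a min-heap, used here with negated counts) in place of the repository's own max-heap class, and the function name `top_k_frequent_words_heap` is a hypothetical one chosen for this example.

```python
import heapq
from collections import Counter


def top_k_frequent_words_heap(words: list[str], k_value: int) -> list[str]:
    """Return up to k_value most frequent words using a heap (illustrative sketch)."""
    # Count occurrences of each distinct word: O(n) for n input words.
    counts = Counter(words)
    # heapq is a min-heap, so negate the counts to pop the largest count first.
    heap = [(-count, word) for word, count in counts.items()]
    heapq.heapify(heap)  # linear in the number of distinct words
    result = []
    # Pop at most k_value entries; each pop is logarithmic in the heap size.
    for _ in range(min(k_value, len(heap))):
        _, word = heapq.heappop(heap)
        result.append(word)
    return result


print(top_k_frequent_words_heap(["a", "b", "c", "a", "c", "c"], 2))  # ['c', 'a']
```

Because the heap entries are `(negated count, word)` tuples, words with equal counts come out in alphabetical order in this sketch. `Counter.most_common` orders ties by first occurrence instead, so the two approaches can differ on ties.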