Solving the `Top k most frequent words` problem using a max-heap #8125

aparibocci · 2023-02-07T19:44:15Z

Describe your change:

This PR adds an algorithm that identifies the top k most frequent strings in a given list of strings.
The algorithm uses a max-heap implementation that already exists in this repository (a generic type was introduced to make this possible).

Time complexity is O(n + k log n), where n is the number of words:

O(n) for building the max-heap
O(k log n) for extracting the k most frequent strings
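
As a rough illustration of the two steps listed above (build a heap, then extract k entries), here is a minimal sketch. It does not use the repository's Heap class from the PR. Instead it assumes the standard library's `heapq`, which is a min-heap, and negates the counts to get max-heap behaviour. The function name `top_k_frequent_words_heap` is hypothetical, and ties are broken alphabetically only because of how the tuples compare.

```python
import heapq
from collections import Counter


def top_k_frequent_words_heap(words: list[str], k_value: int) -> list[str]:
    """Return up to k_value words, most frequent first (illustrative sketch)."""
    counts = Counter(words)  # one O(n) pass to count occurrences
    # heapq is a min-heap, so negate the counts to simulate a max-heap.
    heap = [(-count, word) for word, count in counts.items()]
    heapq.heapify(heap)  # linear in the number of distinct words
    # Pop at most k_value entries; each pop is logarithmic in the heap size.
    return [heapq.heappop(heap)[1] for _ in range(min(k_value, len(heap)))]


print(top_k_frequent_words_heap(["a", "b", "c", "a", "c", "c"], 2))  # ['c', 'a']
```

Capping the number of pops at the heap size means that asking for more words than exist returns every distinct word rather than raising an error.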

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:

algorithms-keeper

🔗 Relevant Links

Repository:

Contributing guidelines

Project Euler solution guidelines

Python:

Formatted string literals (f-strings)

Type hints

doctest

unittest

pytest

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

@algorithms-keeper review to trigger the checks for only added pull request files

@algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

strings/top_k_frequent_words.py

CaedenPH

Very well written algorithm, great stuff!

aparibocci · 2023-04-23T15:09:25Z

Hello @cclauss, I'm tagging you as I saw your involvement in several PRs.
Do you know whom I can ask for a review? Is there anything I should still change in the PR?
Thank you!

cclauss · 2023-04-23T15:24:48Z

All of this seems like it could be done in just a few lines using only a collections.Counter so what are the advantages of this approach?

aparibocci · 2023-04-23T15:39:57Z

> All of this seems like it could be done in just a few lines using only a collections.Counter so what are the advantages of this approach?

Thanks for the feedback.
I had a look at collections.Counter and noticed that, indeed, most_common solves exactly this problem: https://github.com/python/cpython/blob/3.11/Lib/collections/__init__.py#L608

This is using heapq.nlargest behind the scenes: https://github.com/python/cpython/blob/3.11/Lib/heapq.py#L523

There is already a Heap class in this repository, so I thought it would be useful to show a typical use of heaps (i.e. finding order statistics). Do you think I should close this PR entirely, or do you have other suggestions?

Thank you
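
To make the point about `heapq.nlargest` concrete, here is a small sketch of selecting the top k words with it directly. This is a simplified illustration, not a copy of the CPython source, and the function name `top_k_with_nlargest` is hypothetical.

```python
import heapq
from collections import Counter


def top_k_with_nlargest(words: list[str], k_value: int) -> list[str]:
    """Return up to k_value words, most frequent first (illustrative sketch)."""
    counts = Counter(words)
    # Select the k_value (word, count) pairs with the largest counts.
    pairs = heapq.nlargest(k_value, counts.items(), key=lambda item: item[1])
    return [word for word, _ in pairs]


words = ["a", "b", "c", "a", "c", "c"]
# Compare against Counter.most_common on the same input.
assert top_k_with_nlargest(words, 2) == [w for w, _ in Counter(words).most_common(2)]
```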

cclauss · 2023-04-23T16:38:38Z

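from collections import Counter

def top_k_frequent_words(words: list[str], k_value: int) -> list[str]:
    # most_common returns (word, count) pairs ordered by descending count
    return [word for word, _ in Counter(words).most_common(k_value)]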

cclauss · 2023-04-23T17:19:54Z

Please put the text of our last two messages into the file’s docstring and then we can merge this. That will show why this work was done and also that the Python standard library provides a far more straightforward way to solve the same problem.

aparibocci · 2023-04-23T17:32:48Z

> Please put the text of our last two messages into the file’s docstring and then we can merge this. That will show why this work was done and also that the Python standard library provides a far more straightforward way to solve the same problem.

Thanks for the feedback.
I have updated the docstring to mention the (preferable) Python standard library solution.

cclauss

Nice!

algorithms-keeper bot reviewed Feb 7, 2023

aparibocci force-pushed the master branch from 059a7c0 to 26815ce on February 7, 2023 19:56

algorithms-keeper bot removed the `require descriptive names`, `require tests`, and `require type hints` labels Feb 7, 2023

CaedenPH approved these changes Feb 7, 2023

algorithms-keeper bot added the `tests are failing` label (do not merge until tests pass) Apr 23, 2023

cclauss approved these changes Apr 23, 2023

cclauss closed this Apr 23, 2023

cclauss force-pushed the master branch from c643127 to 1158294 on April 23, 2023 22:07

cclauss mentioned this pull request Apr 23, 2023

Solving the Top k most frequent words problem using a max-heap #8685

Merged

14 tasks
