Suffix Tree Data Structure Implementation #11555

Closed
wants to merge 42 commits into from
42 commits
0d6985c
Implemented KD-Tree Data Structure
Ramy-Badr-Ahmed Aug 28, 2024
6665d23
Implemented KD-Tree Data Structure. updated DIRECTORY.md.
Ramy-Badr-Ahmed Aug 28, 2024
6b3d47e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
4203cda
Create __init__.py
Ramy-Badr-Ahmed Aug 28, 2024
3222bd3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
a41ae5b
Replaced legacy `np.random.rand` call with `np.random.Generator` in k…
Ramy-Badr-Ahmed Aug 28, 2024
1668d73
Replaced legacy `np.random.rand` call with `np.random.Generator` in k…
Ramy-Badr-Ahmed Aug 28, 2024
81d6917
added typehints and docstrings
Ramy-Badr-Ahmed Aug 28, 2024
6cddcbd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
8b238d1
docstring for search()
Ramy-Badr-Ahmed Aug 28, 2024
cd1dd9f
Merge remote-tracking branch 'origin/feature/kd-tree-implementation' …
Ramy-Badr-Ahmed Aug 28, 2024
ead2838
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
543584c
Added tests. Updated docstrings/typehints
Ramy-Badr-Ahmed Aug 28, 2024
ad31f83
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
1322921
updated tests and used | for type annotations
Ramy-Badr-Ahmed Aug 28, 2024
4608a9f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2024
7c1aa7e
E501 for build_kdtree.py, hypercube_points.py, nearest_neighbour_sear…
Ramy-Badr-Ahmed Aug 29, 2024
ba24e75
I001 for example_usage.py and test_kdtree.py
Ramy-Badr-Ahmed Aug 29, 2024
05975a3
I001 for example_usage.py and test_kdtree.py
Ramy-Badr-Ahmed Aug 29, 2024
31782d1
Update data_structures/kd_tree/build_kdtree.py
Ramy-Badr-Ahmed Sep 3, 2024
6a9b3e1
Update data_structures/kd_tree/example/hypercube_points.py
Ramy-Badr-Ahmed Sep 3, 2024
2fd24d4
Update data_structures/kd_tree/example/hypercube_points.py
Ramy-Badr-Ahmed Sep 3, 2024
2cf9d92
Added new test cases requested in Review. Refactored the test_build_k…
Ramy-Badr-Ahmed Sep 3, 2024
a3803ee
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2024
f1f5862
Considered ruff errors
Ramy-Badr-Ahmed Sep 3, 2024
ec6559d
Merge remote-tracking branch 'origin/feature/kd-tree-implementation' …
Ramy-Badr-Ahmed Sep 3, 2024
5c07a1a
Considered ruff errors
Ramy-Badr-Ahmed Sep 3, 2024
3c09ac1
Apply suggestions from code review
cclauss Sep 3, 2024
bab43e7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2024
a10ff15
Update kd_node.py
cclauss Sep 3, 2024
d77a285
imported annotations from __future__
Ramy-Badr-Ahmed Sep 3, 2024
0426806
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2024
9c4cbd4
Implementation of the suffix tree data structure
Ramy-Badr-Ahmed Sep 7, 2024
95ae328
Adding data to DIRECTORY.md
Ramy-Badr-Ahmed Sep 7, 2024
1454bb2
Minor file renaming
Ramy-Badr-Ahmed Sep 7, 2024
c559de9
Merge branch 'TheAlgorithms:master' into master
Ramy-Badr-Ahmed Sep 7, 2024
a2b3a86
minor correction
Ramy-Badr-Ahmed Sep 7, 2024
323d53a
Merge branch 'feature/suffix-tree-implementation'
Ramy-Badr-Ahmed Sep 7, 2024
4fd5c3e
Merge branch 'TheAlgorithms:master' into feature/kd-tree-implementation
Ramy-Badr-Ahmed Sep 7, 2024
51832af
renaming in DIRECTORY.md
Ramy-Badr-Ahmed Sep 7, 2024
5097923
Merge branch 'feature/suffix-tree-implementation'
Ramy-Badr-Ahmed Sep 7, 2024
283a9b7
Merge branch 'feature/kd-tree-implementation'
Ramy-Badr-Ahmed Sep 7, 2024
4 changes: 4 additions & 0 deletions DIRECTORY.md
@@ -291,6 +291,10 @@
* [Nearest Neighbour Search](data_structures/kd_tree/nearest_neighbour_search.py)
* [Hypercube Points](data_structures/kd_tree/example/hypercube_points.py)
* [Example Usage](data_structures/kd_tree/example/example_usage.py)
* Suffix Tree
* [Suffix Tree Node](data_structures/suffix_tree/suffix_tree_node.py)
* [Suffix Tree](data_structures/suffix_tree/suffix_tree.py)
* [Example Usage](data_structures/suffix_tree/example/example_usage.py)

## Digital Image Processing
* [Change Brightness](digital_image_processing/change_brightness.py)
Empty file.
Empty file.
29 changes: 29 additions & 0 deletions data_structures/suffix_tree/example/example_usage.py
@@ -0,0 +1,29 @@
from data_structures.suffix_tree.suffix_tree import SuffixTree


def main() -> None:
    """
    Demonstrate the usage of the SuffixTree class.

    - Initializes a SuffixTree with a predefined text.
    - Defines a list of patterns to search for within the suffix tree.
    - Searches for each pattern in the suffix tree.

    Patterns tested:
        - "ana" (found) --> True
        - "ban" (found) --> True
        - "na" (found) --> True
        - "xyz" (not found) --> False
        - "mon" (found) --> True
    """
    text = "monkey banana"
    suffix_tree = SuffixTree(text)

    patterns = ["ana", "ban", "na", "xyz", "mon"]
    for pattern in patterns:
        found = suffix_tree.search(pattern)
        print(f"Pattern '{pattern}' found: {found}")


if __name__ == "__main__":
    main()
58 changes: 58 additions & 0 deletions data_structures/suffix_tree/suffix_tree.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
from data_structures.suffix_tree.suffix_tree_node import SuffixTreeNode


class SuffixTree:
    def __init__(self, text: str) -> None:
        """
        Initializes the suffix tree with the given text.

        Args:
            text (str): The text for which the suffix tree is to be built.
        """
        self.text: str = text
        self.root: SuffixTreeNode = SuffixTreeNode()
        self.build_suffix_tree()

    def build_suffix_tree(self) -> None:
        """
        Builds the suffix tree for the given text by adding all suffixes.
        """
        text = self.text
        n = len(text)
        for i in range(n):
            suffix = text[i:]
            self._add_suffix(suffix, i)

    def _add_suffix(self, suffix: str, index: int) -> None:
        """
        Adds a suffix to the suffix tree.

        Args:
            suffix (str): The suffix to add.
            index (int): The starting index of the suffix in the original text.
        """
        node = self.root
        for char in suffix:
            if char not in node.children:
                node.children[char] = SuffixTreeNode()
            node = node.children[char]
        node.is_end_of_string = True
        node.start = index
        node.end = index + len(suffix) - 1

    def search(self, pattern: str) -> bool:
        """
        Searches for a pattern in the suffix tree.

        Args:
            pattern (str): The pattern to search for.

        Returns:
            bool: True if the pattern is found, False otherwise.
        """
        node = self.root
        for char in pattern:
            if char not in node.children:
                return False
            node = node.children[char]
        return True
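Because `_add_suffix` walks each suffix one character at a time, the structure built above behaves like an uncompressed suffix trie: a text of length n can create up to n(n+1)/2 nodes. The following is a minimal, self-contained sketch, not the PR's code, that uses nested dicts to show this growth and how recording start indices along each path would let a search report match positions rather than only a boolean. The names `build_suffix_trie`, `find_positions`, `count_nodes`, and the `"$idx"` key are hypothetical.

```python
from __future__ import annotations

# Hypothetical key under which each node records the start indices of the
# suffixes passing through it. It is longer than one character, so it cannot
# collide with a single-character edge label.
POSITIONS_KEY = "$idx"


def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of ``text`` character by character."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for char in text[start:]:
            node = node.setdefault(char, {})
            node.setdefault(POSITIONS_KEY, []).append(start)
    return root


def find_positions(root: dict, pattern: str) -> list[int]:
    """Return the start indices at which ``pattern`` occurs.

    An absent pattern yields []. In this sketch an empty pattern also
    yields [], because the root stores no indices.
    """
    node = root
    for char in pattern:
        if char not in node:
            return []
        node = node[char]
    return list(node.get(POSITIONS_KEY, []))


def count_nodes(node: dict) -> int:
    """Count the trie nodes below ``node`` (the root itself is not counted)."""
    return sum(
        1 + count_nodes(child)
        for key, child in node.items()
        if key != POSITIONS_KEY
    )


trie = build_suffix_trie("banana")
positions = find_positions(trie, "ana")  # start indices of "ana" in "banana"
node_total = count_nodes(trie)  # one node per distinct non-empty substring
```

A compressed suffix tree (for example one built with Ukkonen's algorithm) would merge single-child chains into labelled edges and keep the node count linear in the text length; the sketch above makes no attempt at that.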
26 changes: 26 additions & 0 deletions data_structures/suffix_tree/suffix_tree_node.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
from __future__ import annotations
from typing import Dict, Optional


class SuffixTreeNode:
    def __init__(self,
                 children: Dict[str, 'SuffixTreeNode'] = None,
                 is_end_of_string: bool = False,
                 start: int | None = None,
                 end: int | None = None,
                 suffix_link: SuffixTreeNode | None = None) -> None:
        """
        Initializes a suffix tree node.

        Parameters:
            children (Dict[str, SuffixTreeNode], optional): The children of this node. Defaults to an empty dictionary.
            is_end_of_string (bool, optional): Indicates if this node represents the end of a string. Defaults to False.
            start (int | None, optional): The start index of the suffix in the text. Defaults to None.
            end (int | None, optional): The end index of the suffix in the text. Defaults to None.
            suffix_link (SuffixTreeNode | None, optional): Link to another suffix tree node. Defaults to None.
        """
        self.children = children or {}
        self.is_end_of_string = is_end_of_string
        self.start = start
        self.end = end
        self.suffix_link = suffix_link

Check failures in data_structures/suffix_tree/suffix_tree_node.py (GitHub Actions / ruff):
Line 2: Ruff (UP035) data_structures/suffix_tree/suffix_tree_node.py:2:1: UP035 `typing.Dict` is deprecated, use `dict` instead
Line 2: Ruff (F401) data_structures/suffix_tree/suffix_tree_node.py:2:26: F401 `typing.Optional` imported but unused
Line 5: Ruff (I001) data_structures/suffix_tree/suffix_tree_node.py:1:1: I001 Import block is un-sorted or un-formatted
Line 7: Ruff (RUF013) data_structures/suffix_tree/suffix_tree_node.py:7:28: RUF013 PEP 484 prohibits implicit `Optional`
Line 7: Ruff (UP006) data_structures/suffix_tree/suffix_tree_node.py:7:28: UP006 Use `dict` instead of `Dict` for type annotation
Line 7: Ruff (UP037) data_structures/suffix_tree/suffix_tree_node.py:7:38: UP037 Remove quotes from type annotation
Line 16: Ruff (E501) data_structures/suffix_tree/suffix_tree_node.py:16:89: E501 Line too long (119 > 88)
Line 17: Ruff (E501) data_structures/suffix_tree/suffix_tree_node.py:17:89: E501 Line too long (120 > 88)
Line 18: Ruff (E501) data_structures/suffix_tree/suffix_tree_node.py:18:89: E501 Line too long (102 > 88)
Line 19: Ruff (E501) data_structures/suffix_tree/suffix_tree_node.py:19:89: E501 Line too long (98 > 88)
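The ruff findings reported for this file (UP035, F401, I001, RUF013, UP006, UP037, E501) could be resolved along the following lines. This is a sketch of one possible cleanup, not necessarily the revision the author pushed: it drops the `typing` import, uses the built-in `dict` with an explicit `| None`, removes the quoted annotation, and wraps the docstring to 88 columns. Runtime behavior is intended to match the submitted class.

```python
from __future__ import annotations


class SuffixTreeNode:
    def __init__(
        self,
        children: dict[str, SuffixTreeNode] | None = None,
        is_end_of_string: bool = False,
        start: int | None = None,
        end: int | None = None,
        suffix_link: SuffixTreeNode | None = None,
    ) -> None:
        """
        Initializes a suffix tree node.

        Parameters:
            children (dict[str, SuffixTreeNode] | None): The children of this
                node. Defaults to an empty dictionary.
            is_end_of_string (bool): Indicates if this node represents the end
                of a string. Defaults to False.
            start (int | None): The start index of the suffix in the text.
                Defaults to None.
            end (int | None): The end index of the suffix in the text.
                Defaults to None.
            suffix_link (SuffixTreeNode | None): Link to another suffix tree
                node. Defaults to None.
        """
        # Same fallback as the submitted code: a missing (or empty) mapping is
        # replaced by a fresh dict, so nodes never share a default container.
        self.children = children or {}
        self.is_end_of_string = is_end_of_string
        self.start = start
        self.end = end
        self.suffix_link = suffix_link


# Small usage check: attach one child to a fresh root.
root = SuffixTreeNode()
root.children["a"] = SuffixTreeNode(start=0, end=0)
```

With `from __future__ import annotations` in place, the unquoted self-reference `SuffixTreeNode` inside the class body is not evaluated at definition time, which is why the quotes flagged by UP037 can be dropped.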
Empty file.
42 changes: 42 additions & 0 deletions data_structures/suffix_tree/tests/test_suffix_tree.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
import unittest
from data_structures.suffix_tree.suffix_tree import SuffixTree


class TestSuffixTree(unittest.TestCase):
    def setUp(self) -> None:
        """Set up the initial conditions for each test."""
        self.text = "banana"
        self.suffix_tree = SuffixTree(self.text)

    def test_search_existing_patterns(self):
        """Test searching for patterns that exist in the suffix tree."""
        patterns = ["ana", "ban", "na"]
        for pattern in patterns:
            with self.subTest(pattern = pattern):
                self.assertTrue(self.suffix_tree.search(pattern), f"Pattern '{pattern}' should be found.")

    def test_search_non_existing_patterns(self):
        """Test searching for patterns that do not exist in the suffix tree."""
        patterns = ["xyz", "apple", "cat"]
        for pattern in patterns:
            with self.subTest(pattern = pattern):
                self.assertFalse(self.suffix_tree.search(pattern), f"Pattern '{pattern}' should not be found.")

    def test_search_empty_pattern(self):
        """Test searching for an empty pattern."""
        self.assertTrue(self.suffix_tree.search(""), "An empty pattern should be found.")

    def test_search_full_text(self):
        """Test searching for the full text."""
        self.assertTrue(self.suffix_tree.search(self.text), "The full text should be found in the suffix tree.")

    def test_search_substrings(self):
        """Test searching for substrings of the full text."""
        substrings = ["ban", "ana", "a", "na"]
        for substring in substrings:
            with self.subTest(substring = substring):
                self.assertTrue(self.suffix_tree.search(substring), f"Substring '{substring}' should be found.")


if __name__ == "__main__":
    unittest.main()
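The tests above check a handful of hand-picked patterns. A complementary approach, sketched below under the assumption that `search` should agree with Python's `in` operator for every pattern, is to enumerate all short strings over a small alphabet and compare the two answers. To stay self-contained the sketch re-implements the trie with nested dicts rather than importing the PR's `SuffixTree`; `build_trie`, `trie_search`, and `cross_check` are hypothetical names.

```python
from itertools import product


def build_trie(text: str) -> dict:
    """Insert every suffix of ``text`` into a nested-dict trie."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for char in text[start:]:
            node = node.setdefault(char, {})
    return root


def trie_search(root: dict, pattern: str) -> bool:
    """Return True if ``pattern`` labels a path starting at the root."""
    node = root
    for char in pattern:
        if char not in node:
            return False
        node = node[char]
    return True


def cross_check(text: str, alphabet: str, max_len: int) -> int:
    """Compare trie_search with ``in`` for every pattern up to ``max_len``.

    Returns the number of patterns checked; raises AssertionError (with the
    offending pattern as message) on the first disagreement.
    """
    root = build_trie(text)
    checked = 0
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            pattern = "".join(letters)
            assert trie_search(root, pattern) == (pattern in text), pattern
            checked += 1
    return checked


# "x" does not occur in the text, so absent patterns are exercised as well.
checked = cross_check("banana", alphabet="abnx", max_len=4)
```

The empty pattern is covered by the `length == 0` iteration, which mirrors `test_search_empty_pattern`.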