Add AhoCorasick #4465

Merged
merged 45 commits into from
Oct 8, 2023
Changes from 6 commits
Commits
8bacb96
Added code to find Articulation Points and Bridges
Prabhat-Kumar-42 Sep 30, 2023
478bf04
tried to solve clang-format test
Prabhat-Kumar-42 Sep 30, 2023
8e88928
removed new line at EOF to get lint to pass
Prabhat-Kumar-42 Sep 30, 2023
db5d447
feature: Added Ahocorasick Algorithm
Prabhat-Kumar-42 Sep 30, 2023
2cc9e3f
fixed lint using clang-format
Prabhat-Kumar-42 Sep 30, 2023
d2acaf6
removed datastructures/graphs/ArticulationPointsAndBridge.java from t…
Prabhat-Kumar-42 Oct 1, 2023
37c92ad
removed main, since test-file is added. Also modified and renamed few…
Prabhat-Kumar-42 Oct 1, 2023
efd912d
Added test-file for AhoCorasick Algorithm
Prabhat-Kumar-42 Oct 1, 2023
194a37d
Modified some comments in test-file
Prabhat-Kumar-42 Oct 1, 2023
288168c
Modified some comments in AhoCorasick.java
Prabhat-Kumar-42 Oct 1, 2023
6c4e2c2
lint fix
Prabhat-Kumar-42 Oct 1, 2023
ab22511
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 1, 2023
af71647
added few more test cases
Prabhat-Kumar-42 Oct 1, 2023
96f6231
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 1, 2023
59dfa0e
Modified some comments
Prabhat-Kumar-42 Oct 1, 2023
b135163
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 1, 2023
6a75149
Change all class fields to private, added initializeSuffixLinksForChi…
Prabhat-Kumar-42 Oct 2, 2023
dd50c78
Added Missing Test-Cases and more
Prabhat-Kumar-42 Oct 2, 2023
f464bf8
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 2, 2023
23b63cb
minor text changes
Prabhat-Kumar-42 Oct 2, 2023
42b21da
added direct test check i.e. defining a variable expected and just ch…
Prabhat-Kumar-42 Oct 2, 2023
106dfac
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 5, 2023
704b8e1
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 5, 2023
a28df8c
Created New Class Trie, merged 'buildTrie and buildSuffixAndOutputLin…
Prabhat-Kumar-42 Oct 6, 2023
b7cc61a
Updated TestFile according to the updated AhoCorasick Class. Added Fe…
Prabhat-Kumar-42 Oct 6, 2023
a164aab
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 6, 2023
f5defac
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 7, 2023
a1cbda6
updated - broken down constructor to relevant parts, made string fina…
Prabhat-Kumar-42 Oct 7, 2023
f60392b
lint fix clang
Prabhat-Kumar-42 Oct 7, 2023
62dfb86
Updated Tests Files
Prabhat-Kumar-42 Oct 7, 2023
1528cbf
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 7, 2023
8b9a831
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 7, 2023
69f3aa8
Added final field to Node class setters and Trie Constructor argument…
Prabhat-Kumar-42 Oct 8, 2023
fc9ca11
updated test file
Prabhat-Kumar-42 Oct 8, 2023
846bbda
lint fix clang
Prabhat-Kumar-42 Oct 8, 2023
82586b4
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 8, 2023
6f81843
minor change - 'removed a comment'
Prabhat-Kumar-42 Oct 8, 2023
4fa8df1
added final fields to some arguments, class and variables, added a me…
Prabhat-Kumar-42 Oct 8, 2023
4a53103
updated to remove * inclusion and added the required modules only
Prabhat-Kumar-42 Oct 8, 2023
653db0b
Implemented a new class PatternPositionRecorder to wrap up the positi…
Prabhat-Kumar-42 Oct 8, 2023
a2ec697
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 Oct 8, 2023
b409f7a
Added final fields to PatternPositionRecorder Class
Prabhat-Kumar-42 Oct 8, 2023
eb1c369
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 Oct 8, 2023
b38f067
style: mark default constructor of `AhoCorasick` as `private`
vil02 Oct 8, 2023
c7c743b
style: remove redundant `public`
vil02 Oct 8, 2023
187 changes: 187 additions & 0 deletions src/main/java/com/thealgorithms/strings/AhoCorasick.java
@@ -0,0 +1,187 @@
package com.thealgorithms.strings;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Queue;

/**
 * Aho-Corasick string matching algorithm implementation.
 *
 * The Aho-Corasick algorithm finds all occurrences of multiple patterns in a
 * text in a single pass over the text. This implementation records, for each
 * pattern, the positions at which it matches.
 *
 * Author: Prabhat-Kumar-42
 * GitHub: https://github.com/Prabhat-Kumar-42
 */
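public class AhoCorasick {

    // A trie node; suffix and output links are filled in by buildSuffixAndOutputLinks()
    private static class Node {
        HashMap<Character, Node> child = new HashMap<>();
        Node suffix_link; // Node for the longest proper suffix of this node's path that is also in the trie
        Node output_link; // Nearest node on the suffix-link chain that ends a pattern, or null
        int pattern_ind; // Index of the pattern ending at this node, or -1 if none

        Node() {
            this.suffix_link = null;
            this.output_link = null;
            this.pattern_ind = -1;
        }
    }

    private Node root = null; // The root node of the Aho-Corasick trie
    private ArrayList<ArrayList<Integer>> res; // For each pattern index, the end indices of its matches in the text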

    // Clears the trie and all recorded positions; buildTrie() and
    // buildSuffixAndOutputLinks() must be called again before the next search
    public void clear() {
        root = null;
        if (res != null) {
            res.clear();
        }
    }

    // Builds the Aho-Corasick trie from a set of input patterns
    public void buildTrie(String[] patterns) {
        root = new Node(); // Initialize the root of the trie
        res = new ArrayList<>(patterns.length); // Initialize the result data structure

        // Loop through each input pattern
        for (int i = 0; i < patterns.length; i++) {
            res.add(new ArrayList<>()); // Initialize a list to store positions of the current pattern

            Node curr = root; // Start at the root of the trie for each pattern

            // Loop through each character in the current pattern
            for (int j = 0; j < patterns[i].length(); j++) {
                char c = patterns[i].charAt(j); // Get the current character

                // Move to the child for this character, creating it if it does not exist yet
                curr = curr.child.computeIfAbsent(c, k -> new Node());
            }
            // Mark the node where the pattern ends; it need not be a leaf,
            // since one pattern can be a prefix of another (e.g. "C" and "CAT")
            curr.pattern_ind = i;
        }
    }

    // Builds the suffix links and output links in the Aho-Corasick trie using a breadth-first traversal
    public void buildSuffixAndOutputLinks() {
        root.suffix_link = root; // Initialize the suffix link of the root to itself
        Queue<Node> q = new LinkedList<>(); // Initialize a queue for BFS traversal

        // Initialize suffix links for child nodes of the root
        for (Node rootChild : root.child.values()) {
            rootChild.suffix_link = root; // Set suffix link to the root
            q.add(rootChild); // Add child node to the queue
        }

        while (!q.isEmpty()) {
            Node currentState = q.poll(); // Get the current node for processing

            // Iterate through child nodes of the current node
            for (char cc : currentState.child.keySet()) {
                Node currentChild = currentState.child.get(cc); // Get the child node
                Node parentSuffix = currentState.suffix_link; // Get the parent's suffix link

                // Walk up the suffix links until a node with a transition on cc is found or the root is reached
                while (!parentSuffix.child.containsKey(cc) && parentSuffix != root) {
                    parentSuffix = parentSuffix.suffix_link;
                }

                // Set the calculated suffix link or default to root
                if (parentSuffix.child.containsKey(cc)) {
                    currentChild.suffix_link = parentSuffix.child.get(cc);
                } else {
                    currentChild.suffix_link = root;
                }

                q.add(currentChild); // Add the child node to the queue for further processing
            }

            // Establish the output link, which lets search() report patterns that end inside other patterns.
            // The suffix link target is shallower than currentState, so its output link is already set.
            if (currentState.suffix_link.pattern_ind >= 0) {
                currentState.output_link = currentState.suffix_link;
            } else {
                currentState.output_link = currentState.suffix_link.output_link;
            }
        }
    }

    // Searches the text for all patterns and records, for each pattern,
    // the index of the last character of every match
    public ArrayList<ArrayList<Integer>> search(String text) {
        Node curr = root; // Start searching from the root node
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i); // Get the current character in the text

            // If the current node has no child for the character, fall back along suffix links
            while (curr != root && !curr.child.containsKey(ch)) {
                curr = curr.suffix_link;
            }
            if (!curr.child.containsKey(ch)) {
                continue; // No pattern starts with this character; stay at the root
            }
            curr = curr.child.get(ch); // Update the current node to the child node

            // If the current node represents a pattern, record its position in res
            if (curr.pattern_ind > -1) {
                res.get(curr.pattern_ind).add(i);
            }

            // Follow output links to find and record positions of other patterns ending here
            Node output_link = curr.output_link;
            while (output_link != null) {
                res.get(output_link.pattern_ind).add(i);
                output_link = output_link.output_link;
            }
        }
        return res; // Return the end positions of patterns found in the text
    }

    // Returns the count of occurrences of each pattern in the text
    public ArrayList<Integer> getRepeatCountOfWords() {
        ArrayList<Integer> countOfWords = new ArrayList<>(res.size());
        for (ArrayList<Integer> positions : res) {
            countOfWords.add(positions.size());
        }
        return countOfWords;
    }

    public static void main(String[] args) {
        String[] patterns = {"ACC", "ATC", "CAT", "GCG", "C", "T"};
        String text = "GCATCG";

        AhoCorasick obj = new AhoCorasick();
        obj.buildTrie(patterns);
        obj.buildSuffixAndOutputLinks();

        ArrayList<ArrayList<Integer>> res = obj.search(text);
        ArrayList<Integer> countOfWords = obj.getRepeatCountOfWords();

        System.out.println("Using zero-based indexing");
        System.out.println("Dictionary is : ");
        for (int i = 0; i < patterns.length; i++) {
            System.out.println(i + ". " + patterns[i]);
        }
        System.out.println();
        System.out.println("Given text is : " + text);
        System.out.println();
        System.out.println("-1 represents word is not in the given string");
        for (int i = 0; i < patterns.length; i++) {
            System.out.print(patterns[i] + " appeared " + countOfWords.get(i) + " times at indices : ");
            if (res.get(i).isEmpty()) {
                System.out.print(-1 + " ");
            } else {
                // search() stores the index of the last matched character; convert it to the start index
                for (int endpoint : res.get(i)) {
                    System.out.print((endpoint - patterns[i].length() + 1) + " ");
                }
            }
            System.out.println();
        }
    }
}
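
To sanity-check the start indices that `main` prints, a brute-force reference can be useful. The sketch below is not part of this PR: the class name `NaiveMultiPatternSearch` and the method `findStarts` are hypothetical, and it relies only on `String.indexOf`. It scans the text once per pattern, which is exactly the repeated work Aho-Corasick avoids, so it is meant as a test oracle rather than an alternative implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the PR: a brute-force oracle for multi-pattern search.
final class NaiveMultiPatternSearch {

    private NaiveMultiPatternSearch() {
    }

    // Returns, for each pattern, the zero-based start indices of all (possibly overlapping) matches.
    static List<List<Integer>> findStarts(String text, String[] patterns) {
        List<List<Integer>> starts = new ArrayList<>();
        for (String pattern : patterns) {
            List<Integer> hits = new ArrayList<>();
            // Assumption of this sketch: empty patterns are treated as having no matches
            if (!pattern.isEmpty()) {
                // Restart the scan one position after each hit so overlapping matches are found
                for (int from = text.indexOf(pattern); from >= 0; from = text.indexOf(pattern, from + 1)) {
                    hits.add(from);
                }
            }
            starts.add(hits);
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] patterns = {"ACC", "ATC", "CAT", "GCG", "C", "T"};
        // Prints [[], [2], [1], [], [1, 4], [3]]
        System.out.println(findStarts("GCATCG", patterns));
    }
}
```

For the dictionary and text used in `main` above, these start indices should agree with `endpoint - patterns[i].length() + 1` computed from the `search` result, with an empty list wherever `main` prints -1.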