Add AhoCorasick
#4465
Merged
Changes from 6 commits
45 commits
8bacb96
Added code to find Articulation Points and Bridges
Prabhat-Kumar-42 478bf04
tried to solve clang-formant test
Prabhat-Kumar-42 8e88928
removed new line at EOF to get lint to pass
Prabhat-Kumar-42 db5d447
feature: Added Ahocorasick Algorithm
Prabhat-Kumar-42 2cc9e3f
fixed lint using clang-format
Prabhat-Kumar-42 d2acaf6
removed datastructures/graphs/ArticulationPointsAndBridge.java from t…
Prabhat-Kumar-42 37c92ad
removed main, since test-file is added. Also modified and renamed few…
Prabhat-Kumar-42 efd912d
Added test-file for AhoCorasick Algorithm
Prabhat-Kumar-42 194a37d
Modified some comments in test-file
Prabhat-Kumar-42 288168c
Modified some comments in AhoCorasick.java
Prabhat-Kumar-42 6c4e2c2
lint fix
Prabhat-Kumar-42 ab22511
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 af71647
added few more test cases
Prabhat-Kumar-42 96f6231
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 59dfa0e
Modified some comments
Prabhat-Kumar-42 b135163
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 6a75149
Change all class fields to private, added initializeSuffixLinksForChi…
Prabhat-Kumar-42 dd50c78
Added Missing Test-Cases and more
Prabhat-Kumar-42 f464bf8
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 23b63cb
minor text changes
Prabhat-Kumar-42 42b21da
added direct test check i.e. defining a variable expected and just ch…
Prabhat-Kumar-42 106dfac
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 704b8e1
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 a28df8c
Created New Class Trie, merged 'buildTrie and buildSuffixAndOutputLin…
Prabhat-Kumar-42 b7cc61a
Updated TestFile according to the updated AhoCorasick Class. Added Fe…
Prabhat-Kumar-42 a164aab
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 f5defac
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 a1cbda6
updated - broken down constructor to relavent parts, made string fina…
Prabhat-Kumar-42 f60392b
lint fix clang
Prabhat-Kumar-42 62dfb86
Updated Tests Files
Prabhat-Kumar-42 1528cbf
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 8b9a831
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 69f3aa8
Added final field to Node class setters and Trie Constructor argument…
Prabhat-Kumar-42 fc9ca11
updated test file
Prabhat-Kumar-42 846bbda
lint fix clang
Prabhat-Kumar-42 82586b4
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 6f81843
minor chage - 'removed a comment'
Prabhat-Kumar-42 4fa8df1
added final fields to some arguments, class and variables, added a me…
Prabhat-Kumar-42 4a53103
updated to remove * inclusion and added the required modules only
Prabhat-Kumar-42 653db0b
Implemented a new class PatternPositionRecorder to wrap up the positi…
Prabhat-Kumar-42 a2ec697
Merge branch 'master' into AhoCorasick
Prabhat-Kumar-42 b409f7a
Added final fields to PatternPositionRecorder Class
Prabhat-Kumar-42 eb1c369
Merge branch 'AhoCorasick' of https://github.com/Prabhat-Kumar-42/Jav…
Prabhat-Kumar-42 b38f067
style: mark default constructor of `AhoCorasick` as `private`
vil02 c7c743b
style: remoce redundant `public`
vil02
187 changes: 187 additions & 0 deletions
src/main/java/com/thealgorithms/strings/AhoCorasick.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
package com.thealgorithms.strings; | ||
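import java.util.ArrayList; | ||
import java.util.HashMap; | ||
import java.util.LinkedList; | ||
import java.util.Queue; | ||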
|
||
/** | ||
* Aho-Corasick String Matching Algorithm Implementation | ||
* | ||
* This code implements the Aho-Corasick algorithm, which is used for efficient | ||
* string matching in a given text. It can find multiple patterns simultaneously | ||
* and records their positions in the text. | ||
* | ||
* Author: Prabhat-Kumar-42 | ||
* GitHub: https://github.com/Prabhat-Kumar-42 | ||
*/ | ||
public class AhoCorasick { | ||
|
||
class Node { | ||
HashMap<Character, Node> child = new HashMap<>(); | ||
Node suffix_link; | ||
Node output_link; | ||
int pattern_ind; | ||
|
||
Node() { | ||
this.suffix_link = null; | ||
this.output_link = null; | ||
this.pattern_ind = -1; | ||
} | ||
} | ||
|
||
Node root = null; // The root node of the Aho-Corasick trie | ||
private ArrayList<ArrayList<Integer>> res; // Stores the positions where patterns are found in the text | ||
|
||
// Clears the Aho-Corasick data structures | ||
public void clear() { | ||
root = null; | ||
if (res != null) { | ||
res.clear(); | ||
} | ||
} | ||
|
||
// Builds the Aho-Corasick trie from a set of input patterns | ||
public void buildTrie(String[] patterns) { | ||
root = new Node(); // Initialize the root of the trie | ||
res = new ArrayList<>(patterns.length); // Initialize the result data structure | ||
|
||
// Loop through each input pattern | ||
for (int i = 0; i < patterns.length; i++) { | ||
res.add(new ArrayList<>()); // Initialize a list to store positions of the current pattern | ||
|
||
Node curr = root; // Start at the root of the trie for each pattern | ||
|
||
// Loop through each character in the current pattern | ||
for (int j = 0; j < patterns[i].length(); j++) { | ||
char c = patterns[i].charAt(j); // Get the current character | ||
|
||
// Check if the current node has a child for the current character | ||
if (curr.child.containsKey(c)) { | ||
curr = curr.child.get(c); // Update the current node to the child node | ||
} else { | ||
// If no child node exists, create a new one and add it to the current node's children | ||
Node nn = new Node(); | ||
curr.child.put(c, nn); | ||
curr = nn; // Update the current node to the new child node | ||
} | ||
} | ||
curr.pattern_ind = i; // Store the pattern's index in the node where the pattern ends (not necessarily a leaf) | ||
} | ||
} | ||
|
||
// Builds the suffix links and output links in the Aho-Corasick trie | ||
public void buildSuffixAndOutputLinks() { | ||
root.suffix_link = root; // Initialize the suffix link of the root to itself | ||
Queue<Node> q = new LinkedList<>(); // Initialize a queue for BFS traversal | ||
|
||
// Initialize suffix links for child nodes of the root | ||
for (char rc : root.child.keySet()) { | ||
q.add(root.child.get(rc)); // Add child node to the queue | ||
root.child.get(rc).suffix_link = root; // Set suffix link to the root | ||
} | ||
|
||
while (!q.isEmpty()) { | ||
Node currentState = q.poll(); // Get the current node for processing | ||
|
||
// Iterate through child nodes of the current node | ||
for (char cc : currentState.child.keySet()) { | ||
Node currentChild = currentState.child.get(cc); // Get the child node | ||
Node parentSuffix = currentState.suffix_link; // Get the parent's suffix link | ||
|
||
// Calculate the suffix link for the child based on parent's suffix link | ||
while (!parentSuffix.child.containsKey(cc) && parentSuffix != root) { | ||
parentSuffix = parentSuffix.suffix_link; | ||
} | ||
|
||
// Set the calculated suffix link or default to root | ||
if (parentSuffix.child.containsKey(cc)) { | ||
currentChild.suffix_link = parentSuffix.child.get(cc); | ||
} else { | ||
currentChild.suffix_link = root; | ||
} | ||
|
||
q.add(currentChild); // Add the child node to the queue for further processing | ||
} | ||
|
||
// Establish output links for nodes to efficiently identify patterns within patterns | ||
if (currentState.suffix_link.pattern_ind >= 0) { | ||
currentState.output_link = currentState.suffix_link; | ||
} else { | ||
currentState.output_link = currentState.suffix_link.output_link; | ||
} | ||
} | ||
} | ||
|
||
// Searches for patterns in the input text and records the end index of each match | ||
public ArrayList<ArrayList<Integer>> search(String text) { | ||
Node parent = root; // Start searching from the root node | ||
for (int i = 0; i < text.length(); i++) { | ||
char ch = text.charAt(i); // Get the current character in the text | ||
|
||
// Check if the current node has a child for the current character | ||
if (parent.child.containsKey(ch)) { | ||
parent = parent.child.get(ch); // Update the current node to the child node | ||
|
||
// If the current node ends a pattern, record the index of the match's last character in res | ||
if (parent.pattern_ind > -1) { | ||
res.get(parent.pattern_ind).add(i); | ||
} | ||
|
||
Node output_link = parent.output_link; | ||
// Follow output links to find and record positions of other patterns | ||
while (output_link != null) { | ||
res.get(output_link.pattern_ind).add(i); | ||
output_link = output_link.output_link; | ||
} | ||
} else { | ||
// If no child node exists for the character, backtrack using suffix links | ||
while (parent != root && !parent.child.containsKey(ch)) { | ||
parent = parent.suffix_link; | ||
} | ||
if (parent.child.containsKey(ch)) { | ||
i--; // Decrement i to reprocess the same character | ||
} | ||
} | ||
} | ||
return res; // Return the positions of patterns found in the text | ||
} | ||
|
||
// Returns the count of occurrences of each pattern in the text | ||
public ArrayList<Integer> getRepeatCountOfWords() { | ||
ArrayList<Integer> countOfWords = new ArrayList<>(); | ||
for (int i = 0; i < res.size(); i++) { | ||
countOfWords.add(res.get(i).size()); | ||
} | ||
return countOfWords; | ||
} | ||
|
||
public static void main(String[] args) { | ||
String[] patterns = {"ACC", "ATC", "CAT", "GCG", "C", "T"}; | ||
String text = "GCATCG"; | ||
|
||
AhoCorasick obj = new AhoCorasick(); | ||
obj.buildTrie(patterns); | ||
obj.buildSuffixAndOutputLinks(); | ||
|
||
ArrayList<ArrayList<Integer>> res = obj.search(text); | ||
ArrayList<Integer> countOfWords = obj.getRepeatCountOfWords(); | ||
|
||
System.out.println("Using Zero Based Indexing"); | ||
System.out.println("Dictionary is : "); | ||
for (int i = 0; i < patterns.length; i++) { | ||
System.out.println(i + ". " + patterns[i]); | ||
} | ||
System.out.println(); | ||
System.out.println("Given text is : " + text); | ||
System.out.println(); | ||
System.out.println("-1 means the word does not occur in the given text"); | ||
for (int i = 0; i < patterns.length; i++) { | ||
System.out.print(patterns[i] + " appeared " + countOfWords.get(i) + " times at indices : "); | ||
if (res.get(i).isEmpty()) { | ||
System.out.print(-1 + " "); | ||
} else { | ||
for (int endpoint : res.get(i)) { | ||
System.out.print((endpoint - patterns[i].length() + 1) + " "); | ||
} | ||
} | ||
System.out.println(); | ||
} | ||
} | ||
} |
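
For readers who want to try the technique outside the diff view, the following is a minimal, self-contained sketch of the same idea: build a trie, add suffix and output links with a BFS, then scan the text once. It is not the code merged in this pull request. The class name `AhoCorasickSketch`, the static `search(text, patterns)` signature, and the choice to return start indices instead of end indices are assumptions made for illustration. It also assumes the patterns are non-empty and distinct.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class; not part of the pull request.
final class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> child = new HashMap<>();
        Node suffixLink;       // longest proper suffix of this node's string that is also in the trie
        Node outputLink;       // nearest node on the suffix-link chain that ends a pattern
        int patternIndex = -1; // index of the pattern ending here, or -1 if none
    }

    private AhoCorasickSketch() {
    }

    // Returns, for each pattern, the zero-based start indices of its matches in text.
    static List<List<Integer>> search(String text, String[] patterns) {
        Node root = buildAutomaton(patterns);
        List<List<Integer>> starts = new ArrayList<>();
        for (int i = 0; i < patterns.length; i++) {
            starts.add(new ArrayList<>());
        }
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            // Fall back along suffix links until a transition on ch exists or we reach the root.
            while (state != root && !state.child.containsKey(ch)) {
                state = state.suffixLink;
            }
            state = state.child.getOrDefault(ch, root);
            // Report the pattern ending at this node, then every pattern reachable via output links.
            Node hit = state.patternIndex >= 0 ? state : state.outputLink;
            while (hit != null) {
                int p = hit.patternIndex;
                starts.get(p).add(i - patterns[p].length() + 1);
                hit = hit.outputLink;
            }
        }
        return starts;
    }

    private static Node buildAutomaton(String[] patterns) {
        Node root = new Node();
        root.suffixLink = root;
        // Insert every pattern into the trie.
        for (int i = 0; i < patterns.length; i++) {
            Node curr = root;
            for (char c : patterns[i].toCharArray()) {
                curr = curr.child.computeIfAbsent(c, k -> new Node());
            }
            curr.patternIndex = i;
        }
        // BFS: shallower nodes are finished before deeper ones, so their links are ready when needed.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.child.values()) {
            child.suffixLink = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node state = queue.poll();
            for (Map.Entry<Character, Node> e : state.child.entrySet()) {
                Node fallback = state.suffixLink;
                while (fallback != root && !fallback.child.containsKey(e.getKey())) {
                    fallback = fallback.suffixLink;
                }
                Node child = e.getValue();
                child.suffixLink = fallback.child.getOrDefault(e.getKey(), root);
                queue.add(child);
            }
            Node link = state.suffixLink;
            state.outputLink = link.patternIndex >= 0 ? link : link.outputLink;
        }
        return root;
    }
}
```

With the dictionary and text from the `main` method in the diff, `AhoCorasickSketch.search("GCATCG", new String[] {"ACC", "ATC", "CAT", "GCG", "C", "T"})` yields `[[], [2], [1], [], [1, 4], [3]]`. Because the sketch builds a fresh result list on every call, repeated searches do not accumulate positions from earlier calls, which is one design difference from keeping the result list as a field.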