Construct a String from another String using Suffix Trie
source link: https://www.geeksforgeeks.org/construct-a-string-from-another-string-using-suffix-trie/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
A suffix tree is a data structure based on trie compression that stores all the suffixes of a given string. Apart from detecting patterns in strings, it has a range of applications in fields like bioinformatics, string algorithms, and data compression.
Features of Suffix Trie:
- A Suffix Trie, commonly referred to as a suffix tree, is a data structure that resembles a tree and is used to store and look for patterns in strings.
- Each route in a suffix trie represents a particular suffix, and it keeps all the suffixes of a given string as pathways in a tree.
- We commence with a blank tree and add each suffix of the string to a tree to generate a suffix trie for a text sequence.
- The empty string would serve as the root node of the output tree, and then each leaf node will symbolize a suffix of the input string.
- A frequent substring that appears in at least two of the suffixes is represented by each internal node.
- The ability to quickly find substrings inside a text is one of the key benefits of employing a suffix trie.
- By moving down the tree along the route that the pattern specifies, we may search for a pattern in the suffix trie given a pattern.
- We shall arrive at a leaf node that represents the suffix that begins with the pattern if the pattern is found in the string.
Examples:
Input: str1 = “programming” str2 = “gaming”
Output: [(3, 3), (5, 6), (8, 10)]Explanation: In this solution, we construct the suffix trie for the string “str1”. Then, for each substring of “str2”, we check if it exists in the suffix trie of “str1”. If it exists, we record the starting and ending indices of the substring in “str1” that form the given substring of “str2”.
In the example given, the first substring of “str2” is “g”. We search for this substring in the suffix trie of “str1” and find it at index 3. Therefore, we record the starting and ending indices of this substring in “str1” as (3, 3). Similarly, we find that the substring “am” in “str2” can be constructed from the suffix trie of “str1” using indices (5, 6) in “str1”. Finally, we find that the substring “ing” in “str2” can be constructed from the suffix trie of “str1” using indices (8, 10) in “str1”.
Therefore, the output of the program is [(3, 3), (5, 6), (8, 10)], which represents the starting and ending indices of each substring of “str1” that can be used to construct the corresponding substring in “str2”.
Input: str1 = “banana” str2 = “ana”
Output: [(1, 3)]Explanation: A suffix trie is a data structure that stores all the suffixes of a given string in a tree-like structure. To construct str2 from str1 using a suffix trie, we first build a suffix trie for str1. Then, we search for str2 in the suffix trie by traversing down the tree, following the edges labeled with the characters of str2.
To construct “ana” from “banana”, we start at the root of the suffix trie and follow the edges labeled “a”, “n”, and “a”, respectively, until we reach the end of the string. The indices of the characters we traverse are (1, 3), which correspond to the substring “ana” in str1.
Approach: This can be solved with the following idea:
Step 1: Create a Suffix Trie for the Original String
- The first step is to construct a trie data structure that represents all the suffixes of the original string. This data structure is called a suffix trie and can be constructed using any standard algorithm.
Step 2: Identify Suffixes Beginning with the Initial Substring
- After constructing the suffix trie, the next step is to locate all the suffixes that start with the initial substring of interest. This can be achieved by traversing the trie from the root to the leaf node that corresponds to the initial substring. By following the edges that match the characters of the initial substring, we can identify all the suffixes that begin with it.
Step 3: Determine the Longest Common Prefix (LCP) of the Suffixes
- Once we have identified all the suffixes that begin with the initial substring, we need to determine their LCP. To accomplish this, we must identify the lowest common ancestor of the leaf nodes that correspond to the suffixes. The LCA represents the longest common prefix of the suffixes.
Step 4: Add the LCP to the Output String
- After determining the LCP of the suffixes, we can add it to the output string.
Step 5: Repeat for Additional Substrings
- To find the LCP for every additional substring, we repeat steps 2-4, beginning at the end of the previous substring. We identify all the suffixes that begin with the additional substring, determine their LCP, and add it to the output string.
Below is the code for the above approach:
- Python3
# Implementing Trie using Trie and # TrieNode classes """ The Trie_Node class is defined with a __init__ method that creates a list of None values with length 26 to represent children nodes and a Boolean variable isEndOfWord to mark the end of a word in the trie.""" class Trie_Node: # Trie node class def __init__(self): self.children = [None]*26 # Property to represent end # of a word in trie self.isEndOfWord = False """The Trie class is defined with a __init__ method that creates a root node using the getNode method and a _charToIndex private helper method to convert a character to its index in the children list.""" class Trie(Trie_Node): # Trie data structure class def __init__(self): self.root = self.getNode() def getNode(self): # Returns new trie node with # Null values return Trie_Node() def _charToIndex(self, ch): # Private helper function return ord(ch)-ord('a') """The insert method is defined to insert a new key (a string) into the trie. It iterates over each character of the key and checks if the character is already present in the trie. If it's not present, it creates a new node and adds it to the children list of the current node. The method marks the last node as the end of the word.""" def insert(self, key): # When word is already # present in trie word = self.root length = len(key) for level in range(length): index = self._charToIndex(key[level]) # If character is not present # in trie if not word.children[index]: word.children[index] = self.getNode() word = word.children[index] word.isEndOfWord = True """The search method is defined to search for a key (a string) in the trie. It iterates over each character of the key and checks if the character is present in the trie. If it's not present, the method returns False. if the method reaches the end of the key and the last node is marked as the end of the word, the method returns True.""" def search(self, key): # Search substring in the trie word = self.root length = len(key) level = 0 while level < length: index = self._charToIndex(key[level]) if not word.children[index]: return False word = word.children[index] level += 1 if level == length: return True else: return False """The build_via_substrings method is defined to build a suffix trie for a given input string P and search for all substrings of another input string Q in the trie.""" def build_via_substrings(P, Q): # handling when length of S is 1 if len(P) == 1: for i in range(len(Q)): if Q[i] != P: return False return [(0, 0)]*len(Q) else: # creating suffix trie x = Trie() for i in range(len(P)): x.insert(P[i:]) start_pos = 0 substrings = [] y = True k = 1 # Search substrings in trie while k <= len(Q): y = x.search(Q[start_pos:k]) if y == False: # Unsuccessful search # for a single lettered # substring. if k == start_pos + 1: return False elif k != start_pos + 1: # When search fails # for a substring # greater than # length = 1 sub = Q[start_pos:k-1] lt = len(sub) m = P.find(sub) substrings.append((m, m + lt-1)) start_pos = k-1 k = k-1 y = True elif y == True and k == len(Q): # We check whether we # have reached the # last letter sub = Q[start_pos:] lt = len(sub) m = P.find(sub) substrings.append((m, m + lt-1)) k = k + 1 if y == True and substrings == []: return [(P.find(Q), len(Q)-1)] else: return substrings # Driver code if __name__ == "__main__": str1 = "ssrtssr" str2 = "rsstsr" # Function call ans = Trie.build_via_substrings(str1, str2) print(ans)
[(2, 2), (0, 1), (3, 4), (2, 2)]
Time Complexity: O(n2 + m)
Auxiliary Space: O(n*26)
Applications of Suffix Trie:
- Suffix trie is used to find all occurrences of a pattern in a given text by searching for all substrings of the pattern in the text in pattern matching algorithms.
- It is also used to assemble genome sequences from short DNA sequences by matching and aligning the short reads to the reference genome in bioinformatics.
- Widely used to check whether a word is spelled correctly by searching for all possible substrings of the input word in spell-checking software.
- It is preferably used to identify and optimize frequently used code patterns in compilers and code optimization tools.
- Suffix trie is also used in natural language processing applications to properly match and categorize words and phrases based on their morphological and syntactical properties.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK