Topic 8: Tries
Table of Contents
Canonical Question Types #
Trie Implementation & Basic Operations #
- This is just about basic trie data structure operations
- Insert, search, and delete words in trie
Problems #
- Implement Trie (208)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34class TrieNode: def __init__(self): self.children = {} # remember it's an edge-check we aren't really interested in storing a collection of nodes. self.terminal = False # or we can directly keep the word here to avoid needing to make a root to node walk for the cumulative value. class Trie: def __init__(self): self.root = TrieNode() def insert(self, word): node = self.root for ch in word: if ch not in node.children: node.children[ch] = TrieNode() node = node.children[ch] node.terminal = True def search(self, word): node = self._traverse(word) return node is not None and node.terminal def startsWith(self, prefix): return self._traverse(prefix) is not None def _traverse(self, string): """ This is a root to node path-walking. """ node = self.root for ch in string: if ch not in node.children: return None node = node.children[ch] return node
Wildcard Matching / Add & Search Word #
- In general, this will affect how we do searching but the same prefix tree is what gets used.
- Search with wildcards (’.’) and pattern matching
Problems #
We can see this as a DFS with wildcard matching helping us out for the candidate selection. It’s not that complicated if we calm down about it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30from collections import defaultdict class WordDictionary: def __init__(self): self.root = defaultdict(dict) self.terminal = "#" def addWord(self, word: str) -> None: node = self.root for ch in word: if ch not in node: node[ch] = defaultdict(dict) node = node[ch] node[self.terminal] = True def search(self, word: str) -> bool: def dfs(i, node): if i == len(word): return self.terminal in node ch = word[i] if ch == ".": return any( dfs(i + 1, next_node) for key, next_node in node.items() if key != self.terminal ) if ch not in node: return False return dfs(i + 1, node[ch]) return dfs(0, self.root)Alternatively, I had this approach, though I think the DFS approach is better because there’s no need to declare a custom TrieNode class of our own here.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43class Node: def __init__(self, val=None, terminal=False): self.children = dict() self.val = val self.terminal = terminal class WordDictionary: def __init__(self): self.root = Node() self.wild = "." def addWord(self, word: str) -> None: ptr = self.root for char in word: if char not in ptr.children: ptr.children[char] = Node(val=char) ptr = ptr.children[char] # terminate: ptr.terminal = True return def search(self, word: str) -> bool: options = [self.root] for idx, char in enumerate(word): next_options = [] for node in options: if char == self.wild: # Try all possible children next_options.extend(node.children.values()) else: if char in node.children: next_options.append(node.children[char]) # No valid paths if not next_options: return False options = next_options # After processing all characters, check terminal flag return any(node.terminal for node in options)
Multiword Search (Word Search with Trie) #
here the objective is to use a try for a better representation of the search space. We can use tries as supporting, aux datastructures to optimise other problems.
In these cases, we should remember to consider “turning off” a terminal node once found to avoid infinite recursion / double counting cases.
- Use trie to speed up multiple word searches in 2D grids
Problems #
This requires us to find so many words, we want to prune our actions by realising that we can avoid the repeated searches for the common prefixes over and over. So instead of a list, we preprocess as a trie and direct our efforts better. The rest is just DFS, guided by the existence in the trie.
We also see another optimisation where we can “prune the trie” after finding out a word, we need to effectively off it so that we don’t double count it again.
Also, we can ignore more dead-ends by removing the node if the child has no children.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54from collections import defaultdict class Node: def __init__(self): self.children = defaultdict(Node) self.word = None class Solution: def findWords(self, board: List[List[str]], words: List[str]) -> List[str]: ROWS, COLS = len(board), len(board[0]) DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)] VISITED = "*" res = [] # we build the trie: root = Node() for w in words: ptr = root for c in w: ptr = ptr.children[c] ptr.word = w def backtrack(r, c, node): # this is more of a DFS rather than "backtracking" cell = board[r][c] if cell not in node.children: return child = node.children[cell] if child.word: res.append(child.word) child.word = None # set it off # OPTIMISATION: remove if the child has no children, which prunes dead-ends: if not child.children: node.children.pop(cell) board[r][c] = VISITED for row, col in ((r + dr, c + dc) for dr, dc in DIRS if ( 0 <= r + dr < ROWS and 0 <= c + dc < COLS and board[r + dr][c + dc] != VISITED )): # choose it: backtrack(row, col, child) board[r][c] = cell return for row, col in ((r, c) for r in range(ROWS) for c in range(COLS)): backtrack(row, col, root) return res
Autocomplete & Prefix Counting #
- Count words with given prefix, autocomplete suggestions (extension)
Problems #
- Prefix counting and autocomplete implementations
- TODO Search suggestion systems (1268)
- TODO Design Search Autocomplete System (642) (paid qns)
Longest Common Prefix & Suffix Queries #
- Use tries (possibly reversed) to find longest common prefixes or suffixes among a set of strings efficiently.
Problems #
Longest Common Prefix (classic variations)
My first-reach solution was to do a try-except as part of the main logic flow, which is kind of unideal.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15class Solution: def longestCommonPrefix(self, strs: List[str]) -> str: if any(not s for s in strs): return "" idx = 0 while True: try: if len(set(s[idx] for s in strs)) == 1: idx += 1 else: return strs[0][:idx] except: return strs[0][:idx]Should try not to use try-excepts as part of the primary logic.
In this case, we could have implemented a vertical scanning approach like so:
1 2 3 4 5 6 7 8 9 10class Solution: def longestCommonPrefix(self, strs: List[str]) -> str: if not strs: return "" for i in range(len(strs[0])): char = strs[0][i] for s in strs[1:]: if i == len(s) or s[i] != char: return strs[0][:i] return strs[0]As trie-based solution is a little overkill, but here’s the solution for didactic purposes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38class TrieNode: def __init__(self): self.children = {} self.is_end_of_word = False class Trie: def __init__(self): self.root = TrieNode() def insert(self, word): node = self.root for char in word: if char not in node.children: node.children[char] = TrieNode() node = node.children[char] node.is_end_of_word = True def longest_common_prefix(self): prefix = [] node = self.root # Traverse while node has exactly one child and is not end of word while node and not node.is_end_of_word and len(node.children) == 1: char = next(iter(node.children)) # get the single child key prefix.append(char) node = node.children[char] return "".join(prefix) class Solution: def longestCommonPrefix(self, strs: list[str]) -> str: if not strs: return "" trie = Trie() for word in strs: if word == "": return "" trie.insert(word) return trie.longest_common_prefix()
TODO short encoding of words (820) Shortest Unique Prefix for every word 820
Distinct Substrings / Subsequence Count #
- Use tries (or suffix tries/trees) to count distinct substrings or perform substring query-related tasks.
Problems #
- Count distinct substrings of a string
- Number of different sub-strings (variations in suffix trie)
IP Routing / Prefix Matching (Specialized Tries) #
- Tries are used to implement longest prefix matching in networking (e.g., Routing tables).
Problems #
- Not common on Leetcode, but classic applications in system design.
Compression & Dictionary Encoding #
- Tries can be used for compression tasks like dictionary compression or encoding sets of strings.
Problems #
- Remove Duplicate Letters (lex order variant that uses trie conceptually)
- Optimal prefix codes (Huffman coding, conceptually related)
- TODO String Compression (443)
⭐️ Bitwise or Binary Tries (for Integers) #
- Binary tries store integers bitwise to answer max XOR queries, or minimum/maximum XOR computations.
Problems #
- TODO Maximum XOR of Two Numbers in an Array (421)
- TODO HARD Maximum XOR With an Element From Array (1707)
Dynamic Updates & Modification in Trie #
- Insertions and deletions in dynamic sets of strings with efficient query support.
Problems #
- Online dictionary updates with prefix queries
- Implement data structure supporting add and remove of strings with prefix queries
SOE #
- Mental Model issues:
- seeing tries as a collection of children nodes instead of a map of maps that holds edges (correct)
- Implementation:
- NOT making the most out of using map of maps for this \(\implies\) easiest to just use nested maps here
- Forgetting to mark end-of-word in nodes
- Overusing memory leading to blowup (high branching factor)
- Poor handling of edge cases such as empty strings or duplicate inserts
Decision Flow #
- Need prefix query or inserts? → Trie basic operations
- Wildcard or pattern search? → DFS combined with trie
- Multiple word queries over grids? → Trie + backtracking
- Large dictionaries? → Optimize with compressed tries or suffix trees
Style, Tricks, Boilerplate #
Style #
- I prefer using a dict instead of children array for this, the key itself can be the edge label and the value can be the TrieNode.
- encode the terminus correctly, a flag or the actual work works well. If actual word, we can just get it directly without having to do the whole root to node walk.
Trick #
- for tries being used as a input space, remember to prune when necessary e.g. in the Word Search II problem.
Boilerplate #
Classic trie implementation. Remember that it’s going to end up
| |
- Also realise that if we don’t need to have a dedicated class for this, we could have chosen to just use a deeply-nested dictionary for this instead of a dedicate TrieNode class.