Topic 11: Graphs
Table of Contents
Graph algorithms are largely categorized into traversal, shortest paths, connectivity components, and cycle/dependency detection.
Relations, when well represented as graphs can be exploited to solve some common problems.
Canonical Question Types #
Graph Traversal (DFS & BFS) #
- Explore graph components, find paths, count connected or island areas.
Traversal could require some basic statistics / counting or something.
- Flood filling is a great mental model to keep in mind for this.
- Keep in mind which direction to traverse, sometimes the easier way is to do a reverse-reachability approach.
Problems #
Number of Islands (200) ;; DONE
It’s a flood fill from all possible islands and trying to reach the others. As a memory hack, just reuse the input for visited.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53# BFS approach to flood filling from collections import deque class Solution: def numIslands(self, grid: List[List[str]]) -> int: ROWS, COLS = len(grid), len(grid[0]) DIRS = [(-1, 0), (1, 0), (0, 1), (0, -1)] if not ROWS or not COLS: return 0 count = 0 # space trick: reuse input grid for visited tracking # helper BFS: def bfs(r, c): queue = deque([(r,c)]) grid[r][c] = '0' # mark first as visited while queue: row, col = queue.popleft() for dr, dc in DIRS: nr, nc = row + dr, col + dc if is_valid_nei:=(0 <= nr < ROWS and 0 <= nc < COLS and grid[nr][nc] == '1'): grid[nr][nc] = '0' # preemptively mark to reduce search spaces, before adding to queue queue.append((nr, nc)) return for r in range(ROWS): for c in range(COLS): if is_island:=(grid[r][c] == '1'): bfs(r, c) count += 1 return count # DFS approach to flood filling: this is a recursive DFS that we've done here class Solution: def numIslands(self, grid: List[List[str]]) -> int: ROWS, COLS = len(grid), len(grid[0]) def dfs(r, c): if not (0 <= r < ROWS and 0 <= c < COLS) or grid[r][c] == '0': return grid[r][c] = '0' for dr, dc in [ (0,1), (0,-1), (1,0), (-1,0) ]: dfs(r+dr, c+dc) count = 0 for r in range(ROWS): for c in range(COLS): if grid[r][c] == '1': count += 1 dfs(r, c) return countThere’s a few viable alternatives to solving this, such as using a union find approach (note the use of a dict because we wanna track tuples (for the cell coordinate)). For this, note how we only move RIGHT/DOWN for each land cell because we want to process each pair of adjacent land cells (basically each edge) once. By restricting it to only right and down neighbours, each pair is processed exactly once from one side, preventing redundant union operations and improving efficiency.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40class Solution: def numIslands(self, grid: List[List[str]]) -> int: rows, cols = len(grid), len(grid[0]) parent = {} rank = {} def find(x): if parent[x] != x: parent[x] = find(parent[x]) # path compression return parent[x] def union(x, y): rootX, rootY = find(x), find(y) if rootX != rootY: if rank[rootX] > rank[rootY]: parent[rootY] = rootX elif rank[rootX] < rank[rootY]: parent[rootX] = rootY else: parent[rootY] = rootX rank[rootX] += 1 # Initialize parent and rank for land cells for r in range(rows): for c in range(cols): if grid[r][c] == '1': parent[(r, c)] = (r, c) rank[(r, c)] = 0 # Union adjacent land cells (right and down to avoid duplicates) for r in range(rows): for c in range(cols): if grid[r][c] == '1': for dr, dc in [(0, 1), (1, 0)]: nr, nc = r + dr, c + dc if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1': union((r, c), (nr, nc)) # Count distinct roots representing islands return len(set(find(x) for x in parent))We can choose to solve this in 2 ways: a union find approach or a bfs-based flood fill approach for connectivity tracking.
Here’s the union find approach:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37class Solution: def findCircleNum(self, isConnected: List[List[int]]) -> int: """ We have to just do union find and get the total number of unique parents / roots. isConnected is about direct connections. This means direct edges between them. We can get started. """ n = len(isConnected) parent = list(range(n)) rank = [0] * n def find(x): while parent[x] != x: parent[x] = parent[parent[x]] x = parent[x] return x def union(x, y): """ Only side-effects matter for us here. """ rx, ry = find(x), find(y) if rank[rx] < rank[ry]: parent[rx] = ry elif rank[ry] < rank[rx]: parent[ry] = rx else: parent[ry] = rx rank[rx] += 1 for a in range(n): for b in range(a + 1, n): if isConnected[a][b]: union(a, b) return len({find(x) for x in range(n)})Here’s a BFS approach, which is more performant:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22from collections import deque class Solution: def findCircleNum(self, isConnected: List[List[int]]) -> int: """ We can do a connectivity traversal using a BFS flood fill approach, then we can just count the number of islands / provinces. """ n = len(isConnected) provinces = 0 visited = [0] * n # 0: univisited, 1: visited # now we run the BFS originating from any unvisited: for i in range(n): if not visited[i]: provinces += 1 queue = deque([i]) visited[i] = 1 while queue: node = queue.popleft() for nei in (nei for nei in range(n) if isConnected[node][nei] and not visited[nei]): queue.append(nei) visited[nei] = 1 return provincesMax Area of Island (695) ;; DONE This is just accumulating statistics on a flood fill approach to traversing islands. There’s a bunch of ways we can do this: flood fill (BFS/DFS) or via a union find. BFS is my favourite in general.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83# M1: FLOOD FILL BFS from collections import deque class Solution: def maxAreaOfIsland(self, grid: List[List[int]]) -> int: if not grid or not grid[0]: return 0 area = 0 ROWS, COLS = len(grid), len(grid[0]) directions = [(0, 1), (0, -1), (1, 0), (-1, 0)] for r in range(ROWS): for c in range(COLS): if grid[r][c] == 1: # we start a BFS for more land-cells. curr_area = 0 # use as root and carry on: queue = deque([(r, c)]) grid[r][c] = 0 # space hack: use the input and mark as visited curr_area += 1 while queue: r, c = queue.popleft() valid_children = ((r + dr, c + dc) for dr, dc in directions if 0 <= r + dr < ROWS and 0 <= c + dc < COLS and grid[r + dr][c + dc] == 1 ) for child_row, child_col in valid_children: # update accumulators (curr_area, visited) before adding to the queue grid[child_row][child_col] = 0 curr_area += 1 # enqueue queue.append((child_row, child_col)) area = max(area, curr_area) # helps us find the biggest island in the grid return area # M2: Flood fill DFS class Solution: def maxAreaOfIsland(self, grid: List[List[int]]) -> int: max_area = 0 rows, cols = len(grid), len(grid[0]) for r in range(rows): for c in range(cols): if grid[r][c] == 1: stack = [(r, c)] grid[r][c] = 0 area = 1 # local area, for this island's measure while stack: cr, cc = stack.pop() for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]: nr, nc = cr+dr, cc+dc if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1: grid[nr][nc] = 0 stack.append((nr, nc)) area += 1 max_area = max(max_area, area) return max_area # M3: union find but it's complex and ill-fitted for this solution class DSU: def __init__(self): self.parent = {} self.size = {} # same concept as rank, we use a dict here because the keys will be coordinate tuples def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x, y): """ Return value here doesn't matter, we care about the side-effect only. """ xr, yr = self.find(x), self.find(y) if xr != yr: self.parent[xr] = yr self.size[yr] += self.size[xr] def add(self, x): if x not in self.parent: self.parent[x] = x self.size[x] = 1 # Inside solution, union adjacent '1' cells as you read the grid, # then return max(dsu.size.values()) as the answer if any, else 0.This is pretty straightforward easy question, just need to know the early return case when the source is already coloured and there’s no need to colour anything (including neis)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21from collections import deque class Solution: def floodFill(self, image: List[List[int]], sr: int, sc: int, color: int) -> List[List[int]]: ROWS, COLS = len(image), len(image[0]) DIRS = [(0, 1), (0, -1), (-1, 0), (1, 0)] start_color = image[sr][sc] if start_color == color: return image queue = deque([(sr, sc)]) while queue: r, c = queue.popleft() image[r][c] = color for nr, nc in ((r + dr, c + dc) for dr, dc in DIRS if (0 <= (r + dr) < ROWS) and (0 <= (c + dc) < COLS)): if image[nr][nc] != start_color: continue image[nr][nc] = color queue.append((nr, nc)) return imageWalls and Gates (??) (neetcode version) ;; DONE
This is just distance tracking, we can go layer by layer and get things done. The aux ds (queue / stack) can hold tuples to keep track of parent height. Oh and it’s a multi-source BFS that happens at the same time.
This is a case of multi-source BFS really. Nothing special here, we just need to write it in-place and have nothing to return.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23from collections import deque class Solution: def islandsAndTreasure(self, grid: List[List[int]]) -> None: if not grid: return queue = deque() ROWS, COLS = len(grid), len(grid[0]) DIRS = [(0, -1), (0, 1), (1, 0), (-1, 0)] INF = 2147483647 # 1: add all gates to queue at distance 0 (multiple source BFS, simultaneously run them) for r, c in ((r, c) for r in range(ROWS) for c in range(COLS) if grid[r][c] == 0): # is treasure: queue.append((r, c)) while queue: r, c = queue.popleft() candidates = ((r + dr, c + dc) for dr, dc in DIRS if (0 <= r + dr < ROWS) and (0 <= c + dc < COLS) and grid[r + dr][c + dc] == INF) for row, col in candidates: grid[row][col] = grid[r][c] + 1 queue.append((row, col))Extension: if the cost of traversing an edge from node to its neighbors had a costs (positive) then we would have to consider doing some kinda of Dijkstra’s approach. In this case a simple BFS works perfectly.
Rotting Oranges (994) ;; DONE
This is a time-simulation of a flood-fill, where the current hop is the time elapsed.
This is straightforward, we want to do multisource traversal (from each rotten orange) as a way of doing time-simulation and so we just use a simple BFS traversal. This also just keeps an accumulator aux value (fresh) collected via a pre-processing step for our ease.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24from collections import deque class Solution: def orangesRotting(self, grid: List[List[int]]) -> int: ROWS, COLS = len(grid), len(grid[0]) queue = deque() fresh = 0 # accumulate starts points for the BFS start as well as the aux variable, fresh for r in range(ROWS): for c in range(COLS): if grid[r][c] == 2: queue.append((r, c, 0)) elif grid[r][c] == 1: fresh += 1 time = 0 while queue: r, c, t = queue.popleft() for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]: nr, nc = r+dr, c+dc if 0 <= nr < ROWS and 0 <= nc < COLS and grid[nr][nc] == 1: grid[nr][nc] = 2 fresh -= 1 queue.append((nr, nc, t+1)) time = t+1 return -1 if fresh > 0 else timePacific Atlantic Water Flow (417) ;; DONE
This is a good reminder that traversal direction should be something that we should take a moment to think about before we make a conclusion. Here, we realise that the better approach is to go from outwards to in (instead of the actual flow of the water). This is similar to the point where we should consider simpler, complementary approaches to the problems.
This is a reverse-reachability type. It’s also a multisource traversal!
Also remember to mark things as visited ASAP so that we don’t introduce redundancies in the search space.
from collections import deque class Solution: def pacificAtlantic(self, heights: List[List[int]]) -> List[List[int]]: # early returns for trivial edge case: if not heights or not heights[0]: return [] ROWS, COLS = len(heights), len(heights[0]) DIRS = [(-1,0), (1,0), (0,-1), (0,1)] def bfs(starts): visited = set(starts) queue = deque(starts) while queue: r, c = queue.popleft() # direct for loop is better (easier to read) than generator then for loop: for dr, dc in DIRS: nr, nc = r + dr, c + dc # is reachable coordinate: if ( 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in visited and heights[nr][nc] >= heights[r][c] ): # mark as visited before enqueing! visited.add((nr, nc)) queue.append((nr, nc)) return visited pacific_starts = [(0, c) for c in range(COLS)] + [(r, 0) for r in range(ROWS)] atlantic_starts = [(ROWS-1, c) for c in range(COLS)] + [(r, COLS-1) for r in range(ROWS)] pacific_reach = bfs(pacific_starts) atlantic_reach = bfs(atlantic_starts) return [ [r, c] for r, c in (pacific_reach & atlantic_reach) ]Surrounded Regions (130) ;; DONE
This one is about creating temporary marks as we navigate then restoring thing as part of a formatting step.
This is also a reverse-reachability type of question because we actually find the inverse (safe cells) that we can reach from the border cells since these safe cells won’t be converted. This will help us find the actual unsafe cells that we will convert.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43from collections import deque from typing import List class Solution: def solve(self, board: List[List[str]]) -> None: if not board or not board[0]: return ROWS, COLS = len(board), len(board[0]) DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)] def get_border_o_positions(): """ Returns a set of coordinates for all the Os that are at the borders """ # All 'O's on the border (first/last row or column) top_bottom = ((row, col) for col in range(COLS) for row in (0, ROWS-1) if board[row][col] == 'O') left_right = ((row, col) for row in range(ROWS) for col in (0, COLS-1) if board[row][col] == 'O') return set(top_bottom) | set(left_right) def mark_border_connected_region_as_safe(): border_o_positions = get_border_o_positions() queue = deque() for r, c in border_o_positions: # mark as visited before enqueing the start coordinates: board[r][c] = '*' queue.append((r, c)) while queue: r, c = queue.popleft() for dr, dc in DIRECTIONS: nr, nc = r + dr, c + dc if (0 <= nr < ROWS and 0 <= nc < COLS and board[nr][nc] == 'O'): board[nr][nc] = '*' queue.append((nr, nc)) mark_border_connected_region_as_safe() # List comprehensions for postprocessing for r, c in ((r, c) for r in range(ROWS) for c in range(COLS)): if board[r][c] == 'O': board[r][c] = 'X' elif board[r][c] == '*': board[r][c] = 'O'Minimum Jumps to Reach End via Prime Teleportation (3629)
Main idea is that we need to do a bunch of preprocessing to make the BFS efficient. The idea therefore is to do both prime sieving AND keeping track of prime factors.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83from math import sqrt class Solution: def minJumps(self, nums: List[int]) -> int: n = len(nums) max_val = max(nums) def sieve(limit): """ Returns a list of prime numbers within limit (inclusive) """ is_prime = [True] * (limit + 1) # effectively makes this 1-indexed is_prime[0], is_prime[1] = False, False # remember we only need to do till sqrt of the limit for num in range(2, int(sqrt(limit) + 1)): if is_prime[num]: # negate its multiples: for j in range(num * num, limit + 1, num): is_prime[j] = False return is_prime is_prime = sieve(max_val) # is the reference prime-sieve def prime_factors(num): """ Returns a set of prime factors for a number: """ factors = set() temp = num # prime factorisation for i in range(2, int(sqrt(num)) + 1): if not is_prime[i]: continue while temp % i == 0: factors.add(i) temp //= i if temp == 1: break if temp > 1: # then temp is a prime too factors.add(temp) return factors # creates a map from prime factor to list of indices in nums with number divisible by that prime, these are the teleport indices prime_to_indices = defaultdict(list) for idx, num in enumerate(nums): factors = prime_factors(num) for f in factors: prime_to_indices[f].append(idx) # now we do the BFS with hop counting: visited = [False] * n visited[0] = True visited_prime = set() # track prime whose teleportations are visited q = deque([(0, 0)]) # (idx, hops) while q: idx, hops = q.popleft() # reached the end: if idx == n - 1: return hops # adjacent moves: for nei in [idx - 1, idx + 1]: if 0 <= nei < n and not visited[nei]: visited[nei] = True q.append((nei, hops + 1)) # teleports: num = nums[idx] if is_prime[num] and num not in visited_prime: visited_prime.add(num) for nei in prime_to_indices[num]: if not visited[nei]: visited[nei] = True q.append((nei, hops + 1)) return -1The main part is that if we have the possibilities lookup via something like this, then we can use it for our BFS and then the least hop reach to the final cell is our answer that we seek.
1 2 3 4 5 6# creates a map from prime factor to list of indices in nums with number divisible by that prime, these are the teleport indices prime_to_indices = defaultdict(list) for idx, num in enumerate(nums): factors = prime_factors(num) for f in factors: prime_to_indices[f].append(idx)Find if Path Exists in Graph (1971)
ended up being a basic traversal to check for reachability from source to destination, given edges.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23from collections import defaultdict, deque class Solution: def validPath(self, n: int, edges: List[List[int]], source: int, destination: int) -> bool: """ We could directly do a graph traversal. """ adj = defaultdict(list) for u, v in edges: adj[u].append(v) adj[v].append(u) visited = set() q = deque([source]) while q: node = q.popleft() if node == destination: return True for nei in adj[node]: if nei not in visited: visited.add(nei) q.append(nei) return False
Shortest Path (Unweighted Graph) #
- BFS-based shortest path finding in unweighted graphs (fixed cost). If there were costs involved then we could explore a Bellman-Ford / Dijkstra’s approach.
Problems #
This is a shortest path problem that feels like an edit distance kind, but it’s a graph traversal. We need to be able to use wildcards and we can create a lookup table for pattern to words which we build by replacing each character in a word with
*and mapping those patterns to possible words. Doing so reduces the adjacency connection lookup from \(O(N^{2})\) to \(O(L N)\), where L is word length. Also take note that we only want to reach each node (pattern) once in the overall procedure, so we silence it after we reach it.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34from collections import defaultdict, deque class Solution: def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int: if endWord not in wordList: return 0 L = len(beginWord) # better mapping: pattern_to_words = defaultdict(list) for word in wordList: for i in range(L): pattern = word[:i] + "*" + word[i + 1:] pattern_to_words[pattern].append(word) # now we run BFS: queue = deque([(beginWord, 1)]) # word, level visited = {beginWord} while queue: word, level = queue.popleft() for i in range(L): pattern = word[:i] + "*" + word[i + 1:] for nei in pattern_to_words[pattern]: if nei == endWord: return level + 1 if nei not in visited: visited.add(nei) queue.append((nei, level + 1)) # NB: prevents repeat lookups, we silence that pattern check so that each pattern is used only once per search. pattern_to_words[pattern] = [] return 0In some ways, this question involves a graph expansion / modification as well. This is implicit in our use of
pattern_to_words.
Topological Sort & Cycle Detection #
- Detect cycles in directed or undirected graphs and generate valid node orderings in DAGs
Problems #
Course Schedule (207) ;; DONE
This is a simpler version of the problem that just expects a boolean return on whether we can do a topo sort (i.e. whether it’s an acyclic DAG). There’s 2 ways to do this: graph colouring (to ensure no cycles) or just straight up doing kahn’s algo:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65# M1: GRAPH COLORING DFS APPROACH (3-colour graph (0: unvisited: 1: visiting 2: visited)) class Solution: def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool: # list of lists, access using relative idx adj = [[] for _ in range(numCourses)] # careful: we are ultimately trying to clear the topo sort, so we are building edge from a -> b if a is a prereq of b for course, pre in prerequisites: adj[pre].append(course) # single visited with 3 states, space optimisation! visited = [0] * numCourses # 0 = unvisited, 1 = visiting, 2 = visited def dfs(node): if visited[node] == 1: # Cycle! return False if visited[node] == 2: # done state, can return early return True visited[node] = 1 for nei in adj[node]: if not dfs(nei): return False visited[node] = 2 return True for i in range(numCourses): if not dfs(i): return False return True # M2: Kahn's algo used to do cycle-checks: from collections import defaultdict, deque class Solution: def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool: # Build adjacency list and in-degree array adj = defaultdict(list) # in-degree: For each node, count how many edges point to it = how many prerequisites it has. in_degree = [0] * numCourses for course, pre in prerequisites: adj[pre].append(course) # prereq → course (edge creation) in_degree[course] += 1 # course needs one more prereq # 1. Find all courses with no prerequisites. Any element in the queue is "ready to be taken" queue = deque(i for i in range(numCourses) if in_degree[i] == 0) visited = 0 # How many courses we've "finished" # 2. Process all courses "ready" to be taken while queue: node = queue.popleft() # count as finished / visited visited += 1 for neighbor in adj[node]: # All courses that depend on this one in_degree[neighbor] -= 1 # One prereq is now done! if in_degree[neighbor] == 0: # this neighbor is not ready to be taken queue.append(neighbor) # 3. If we managed to "take" all courses, there is no cycle! return visited == numCoursesCourse Schedule II (210) ;; DONE
This is a natural extension where we need to generate the actual topologically sorted array of nodes. Similarly, we can use Kahn’s Algo or do a DFS-based graph colouring for our purpose (just do a post-order inclusion). For the DFS approach, remember that the way we choose to mark visited will end up affecting whether or not we might need to reverse the schedule. I just prefer that Kahn’s Algo.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72# M1: Using Kahn's Algo from collections import defaultdict class Solution: def findOrder(self, numCourses: int, prerequisites: List[List[int]]) -> List[int]: adj = defaultdict(list) indegree = [0] * numCourses # calculate the indegrees: for course, pre in prerequisites: adj[pre].append(course) indegree[course]+= 1 # find no prereqs: # ready-queue queue = deque([course for course in range(numCourses) if indegree[course] == 0]) schedule = [] # NB: we can use the length of this as "num visited" to check for cycle, no aux counter variable necessary. while queue: node = queue.popleft() schedule.append(node) for neighbor in adj[node]: indegree[neighbor] -= 1 # neighbour ready to be taken: if indegree[neighbor] == 0: queue.append(neighbor) # if cycle found: if not len(schedule) == numCourses: return [] return schedule # M2: here's the DFS graph colouring approach: class Solution: def findOrder(self, numCourses, prerequisites): # Preprep: # NOTE: maps prereq -> course adj = [[] for _ in range(numCourses)] for course, pre in prerequisites: adj[pre].append(course) state = [0] * numCourses # 0=unvisited, 1=visiting, 2=visited order = [] def dfs(node): # detected cycle in the same round! if state[node] == 1: return False if state[node] == 2: return True # mark new as visiting: state[node] = 1 for nei in adj[node]: # explore depth in neighbours: if not dfs(nei): return False # finally, mark as visited -- this node is ready to be taken state[node] = 2 # mark this course as taken order.append(node) return True for i in range(numCourses): # if unvisited: if state[i] == 0: if not dfs(i): return [] return order[::-1] # remember to reverse because it's a postorder accumulation that we have done here.Graph Valid Tree (??) (neetcode link) ;; DONE (connected + no cycle)]
This is a traversal on an undirected graph to check if it’s a tree. We can use the basic math properties for early returns (n - 1 edges in a tree for a tree of n nodes). In this case, we also handle the undirected nature by splitting it into two directed edges (we also need to do the parent tracking for this!). This is really just a cycle check.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27from collections import defaultdict class Solution: def validTree(self, n: int, edges: List[List[int]]) -> bool: # early return based on math: there should be n - 1 edges for a tree with n nodes if len(edges) != n - 1: return False # must be exact number of edges adj = defaultdict(list) for a, b in edges: adj[a].append(b) adj[b].append(a) visited = set() stack = [(0, -1)] # (current, parent) while stack: node, parent = stack.pop() if node in visited: return False # cycle detected visited.add(node) for nei in adj[node]: if nei == parent: continue # don't backtrack stack.append((nei, node)) return len(visited) == nSort Items by Groups Respecting Dependencies (1203)
This is a higher order question whereby we need to topo sort by 2 dimensions: topo sorting within items, topo sorting of groups. This gives us a 2-phased approach.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63from collections import defaultdict, deque from typing import List class Solution: def sortItems(self, n: int, m: int, group: List[int], beforeItems: List[List[int]]) -> List[int]: # Step 0: Assign unique group IDs to ungrouped items, keep them in unique groups new_gid = m for i in range(n): if group[i] == -1: group[i] = new_gid new_gid += 1 # Step 1: Prepare adjacency lists + indegrees group_adj, group_indeg = defaultdict(list), defaultdict(int) item_adj, item_indeg = defaultdict(list), defaultdict(int) # Step 2: Build both graphs for item in range(n): for prev_item in beforeItems[item]: # Build item dependency graph item_adj[prev_item].append(item) item_indeg[item] += 1 # Build group dependency graph, only if they belong to different groups if group[prev_item] != group[item]: src, dest = group[prev_item], group[item] group_adj[src].append(dest) group_indeg[dest] += 1 # Step 3: Topological sort helper def topo(nodes, adj, indeg): q = deque(x for x in nodes if indeg[x] == 0) res = [] while q: u = q.popleft() res.append(u) for nei in adj[u]: indeg[nei] -= 1 if indeg[nei] == 0: q.append(nei) return res if len(res) == len(nodes) else [] # Step 4: Get group order & item order group_nodes = list(range(new_gid)) group_order = topo(group_nodes, group_adj, group_indeg.copy()) # GOTCHA: remember to copy over else the reference indegree may be mutated if not group_order: return [] item_nodes = list(range(n)) items_order = topo(item_nodes, item_adj, item_indeg.copy()) if not items_order: return [] # Step 5: Arrange items by group according to items_order group_to_items = defaultdict(list) for item in items_order: group_to_items[group[item]].append(item) # Step 6: Build the final result res = [] for g in group_order: res.extend(group_to_items[g]) return resAnother big-brain approach is to use an augmented graph for this (so we have a supergraph) but this hurts my head.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83from collections import defaultdict, deque from typing import List class Solution: def sortItems(self, n: int, m: int, group: List[int], beforeItems: List[List[int]]) -> List[int]: """ Single-graph approach for 1203. Instead of 2 separate topo sorts (items, then groups), we build one unified DAG containing: - virtual group nodes - item nodes and run ONE topological sort on all of them. """ # --- STEP 1: Assign unique group IDs to ungrouped items --- # Problem states group[i] == -1 if no group. We give these items # their own 'singleton' pseudo-group to enforce adjacency. new_group_id = m for i in range(n): if group[i] == -1: group[i] = new_group_id new_group_id += 1 total_groups = new_group_id # updated count after assignment # --- STEP 2: Node indexing scheme --- # 0..n-1 → item nodes # n..n+total_groups-1 → virtual group nodes def group_node(g_id: int) -> int: """ Map a group ID to its virtual node index in the graph """ return n + g_id # --- Data structures for graph + indegree --- G = defaultdict(list) # adjacency list indeg = defaultdict(int) # indegree dictionary # Initialize indegrees to zero for all nodes all_nodes = list(range(n)) + [group_node(g) for g in range(total_groups)] for node in all_nodes: indeg[node] = 0 # --- STEP 3: Connect each group node to the items it contains --- # This ensures items of the same group appear *after* the group's # node in topo order, making them contiguous in the final output. for item in range(n): gnode = group_node(group[item]) G[gnode].append(item) indeg[item] += 1 # --- STEP 4: Add edges for dependencies --- # For every beforeItems[i], item i must appear AFTER prev_item. for item in range(n): for prev_item in beforeItems[item]: if group[item] == group[prev_item]: # Same group → direct dependency between items G[prev_item].append(item) indeg[item] += 1 else: # Different groups → dependency between group nodes G[group_node(group[prev_item])].append(group_node(group[item])) indeg[group_node(group[item])] += 1 # --- STEP 5: Kahn's Algorithm (BFS Topological Sort) --- q = deque([node for node in all_nodes if indeg[node] == 0]) topo_order = [] while q: u = q.popleft() topo_order.append(u) for v in G[u]: indeg[v] -= 1 if indeg[v] == 0: q.append(v) # --- STEP 6: Check for cycles --- # A cycle anywhere (items or groups) means no valid order exists. if len(topo_order) != len(all_nodes): return [] # --- STEP 7: Extract only items from topo_order --- # Group nodes were just helpers to enforce contiguous grouping. result = [node for node in topo_order if node < n] return result
Union-Find Disjoint Sets #
- Manage dynamic connectivity, detect cycles, count components efficiently
Problems #
Number of Connected Components in an Undirected Graph (323) ;; DONE
This is traditionally a union find question but we can also use a DFS approach for this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57# M1: Ideal, Union Find approach class Solution: def countComponents(self, n: int, edges: List[List[int]]) -> int: parent = list(range(n)) rank = [0] * len(n) # should init to 0 def find(x): while parent[x] != x: parent[x] = parent[parent[x]] # path compression x = parent[x] return x def union(x, y): rootX, rootY = find(x), find(y) if rank[rootX] < rank[rootY]: parent[rootX] = rootY return if rank[rootY] < rank[rootX]: parent[rootY] = rootX return parent[rootX] = rootY rank[rootY] += 1 for a, b in edges: union(a, b) return len({find(x) for x in range(n)}) # M2: BFS approach: from collections import defaultdict, deque class Solution: def countComponents(self, n: int, edges: List[List[int]]) -> int: adj = defaultdict(list) for a, b in edges: adj[a].append(b) adj[b].append(a) visited = [False] * n def bfs(start): queue = deque([start]) # FIX 1 : remember to mark the first entry node as visited visited[start] = True while queue: node = queue.popleft() for nei in adj[node]: if not visited[nei]: visited[nei] = True queue.append(nei) comps = 0 for node in range(n): if not visited[node]: comps += 1 bfs(node) return compsI think it’s alright to put the bfs as a helper or within the function directly, either way works or default to what looks prettier.
Redundant Connection (684) ;; DONE
We traverse through the edges (since these are the connections) and try to union the nodes together. If they’re already joined then that edge we are currently entertaining is an extra edge.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25def findRedundantConnection(edges): n = len(edges) parent = [i for i in range(n+1)] def find(x): while parent[x] != x: # set grandparent as parent parent[x] = parent[parent[x]] # path compression # check grandparent x = parent[x] return x def union(x, y): rootX, rootY = find(x), find(y) if rootX == rootY: return False # x and y are already connected parent[rootY] = rootX return True for u, v in edges: if not union(u, v): return [u, v]This version is an improvement that considers ranking:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34class Solution: def findRedundantConnection(self, edges: List[List[int]]) -> List[int]: # we do a union find n = len(edges) parent = list(range(n + 1)) # since the nodes are 1-indexed rank = [0] * (n + 1) def find(u): # compress parent while parent[u] != u: # path compress to its own parent parent[u] = parent[parent[u]] u = parent[u] return u def union(x, y): rx, ry = find(x), find(y) if rx == ry: return False # have been connected before, this is redundant if rank[rx] < rank[ry]: parent[rx] = ry elif rank[ry] < rank[rx]: parent[ry] = rx else: parent[ry] = rx rank[rx] += 1 return True for u,v in edges: if not union(u, v): return [u, v]We can choose to solve this in 2 ways: a union find approach or a bfs-based flood fill approach for connectivity tracking.
Here’s the union find approach:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37class Solution: def findCircleNum(self, isConnected: List[List[int]]) -> int: """ We have to just do union find and get the total number of unique parents / roots. isConnected is about direct connections. This means direct edges between them. We can get started. """ n = len(isConnected) parent = list(range(n)) rank = [0] * n def find(x): while parent[x] != x: parent[x] = parent[parent[x]] x = parent[x] return x def union(x, y): """ Only side-effects matter for us here. """ rx, ry = find(x), find(y) if rank[rx] < rank[ry]: parent[rx] = ry elif rank[ry] < rank[rx]: parent[ry] = rx else: parent[ry] = rx rank[rx] += 1 for a in range(n): for b in range(a + 1, n): if isConnected[a][b]: union(a, b) return len({find(x) for x in range(n)})Here’s a BFS approach, which is more performant:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22from collections import deque class Solution: def findCircleNum(self, isConnected: List[List[int]]) -> int: """ We can do a connectivity traversal using a BFS flood fill approach, then we can just count the number of islands / provinces. """ n = len(isConnected) provinces = 0 visited = [0] * n # 0: univisited, 1: visited # now we run the BFS originating from any unvisited: for i in range(n): if not visited[i]: provinces += 1 queue = deque([i]) visited[i] = 1 while queue: node = queue.popleft() for nei in (nei for nei in range(n) if isConnected[node][nei] and not visited[nei]): queue.append(nei) visited[nei] = 1 return provinces
Graph Construction & Cloning #
- Build or reconstruct graph structure from input data (edge lists, matrices)
Problems #
This is just a traversal (any kind will do) and keep track of a mapping between old and new nodes, that way we build a whole new graph with the correct node values. Also remember that this cloning only pertains to the nodes, the edges are just a happy side-effect.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36from collections import deque from typing import Optional class Solution: def cloneGraph(self, node: Optional['Node']) -> Optional['Node']: if not node: return None queue = deque([node]) old_to_new = {node: Node(val=node.val)} while queue: cur = queue.popleft() for nei in cur.neighbors: if nei not in old_to_new: old_to_new[nei] = Node(val=nei.val) queue.append(nei) old_to_new[cur].neighbors.append(old_to_new[nei]) return old_to_new[node] # Recursive DFS solution for the kicks and giggles: class Solution: def cloneGraph(self, node: 'Node') -> 'Node': if not node: return None old_to_new = {} def dfs(n): if n in old_to_new: return old_to_new[n] copy = Node(n.val) old_to_new[n] = copy for nei in n.neighbors: copy.neighbors.append(dfs(nei)) return copy return dfs(node)
Grid as Graph Modeling #
- Represent problems on grids as graph problems focusing on connectivity, traversal, or shortest path.
Problems #
Rotting Oranges (994) ;; DONE
This is a time-simulation of a flood-fill, where the current hop is the time elapsed.
This is straightforward, we want to do multisource traversal (from each rotten orange) as a way of doing time-simulation and so we just use a simple BFS traversal. This also just keeps an accumulator aux value (fresh) collected via a pre-processing step for our ease.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24from collections import deque class Solution: def orangesRotting(self, grid: List[List[int]]) -> int: ROWS, COLS = len(grid), len(grid[0]) queue = deque() fresh = 0 # accumulate starts points for the BFS start as well as the aux variable, fresh for r in range(ROWS): for c in range(COLS): if grid[r][c] == 2: queue.append((r, c, 0)) elif grid[r][c] == 1: fresh += 1 time = 0 while queue: r, c, t = queue.popleft() for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]: nr, nc = r+dr, c+dc if 0 <= nr < ROWS and 0 <= nc < COLS and grid[nr][nc] == 1: grid[nr][nc] = 2 fresh -= 1 queue.append((nr, nc, t+1)) time = t+1 return -1 if fresh > 0 else timePacific Atlantic Water Flow (417) ;; DONE
This is a good reminder that traversal direction should be something that we should take a moment to think about before we make a conclusion. Here, we realise that the better approach is to go from outwards to in (instead of the actual flow of the water). This is similar to the point where we should consider simpler, complementary approaches to the problems.
This is a reverse-reachability type. It’s also a multisource traversal!
Also remember to mark things as visited ASAP so that we don’t introduce redundancies in the search space.
from collections import deque class Solution: def pacificAtlantic(self, heights: List[List[int]]) -> List[List[int]]: # early returns for trivial edge case: if not heights or not heights[0]: return [] ROWS, COLS = len(heights), len(heights[0]) DIRS = [(-1,0), (1,0), (0,-1), (0,1)] def bfs(starts): visited = set(starts) queue = deque(starts) while queue: r, c = queue.popleft() # direct for loop is better (easier to read) than generator then for loop: for dr, dc in DIRS: nr, nc = r + dr, c + dc # is reachable coordinate: if ( 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in visited and heights[nr][nc] >= heights[r][c] ): # mark as visited before enqueing! visited.add((nr, nc)) queue.append((nr, nc)) return visited pacific_starts = [(0, c) for c in range(COLS)] + [(r, 0) for r in range(ROWS)] atlantic_starts = [(ROWS-1, c) for c in range(COLS)] + [(r, COLS-1) for r in range(ROWS)] pacific_reach = bfs(pacific_starts) atlantic_reach = bfs(atlantic_starts) return [ [r, c] for r, c in (pacific_reach & atlantic_reach) ]
Bipartite Graph Testing and Coloring #
- Check if a graph can be colored with two colors with no adjacent vertices sharing the same color
- Critical for identifying bipartite graphs and detecting odd cycles
- When we want to track many:many relations between two dimensions (e.g. actors and movies )
- Core algorithms: BFS or DFS based 2-coloring validation
Problems #
We are given an adjacency matrix and asked to find out if it’s a bipartite graph, so we just colour it and attempt early returns on failure cases. DFS works well here, BFS will work too.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53# DFS graph colouring implementation. class Solution: def isBipartite(self, graph: List[List[int]]) -> bool: # reminders: may be disconnected ==> just attempt to go through the unvisited ones n = len(graph) visited = [0] * n # value meaning: 0: unvisited, 1: colour A, 2: colour B # do an iterative DFS for each unvisited: for idx in range(n): if visited[idx] != 0: # visited already, carry on: continue stack = [idx] visited[idx] = 1 # default colour? while stack: node_idx = stack.pop() node_colour = visited[node_idx] for nei in graph[node_idx]: # case 1: same colour, violation: if node_colour == visited[nei]: return False # case 3: uncoloured, start colouring if visited[nei] == 0: visited[nei] = 1 if node_colour == -1 else -1 stack.append(nei) # case 2: different , carry on, do nothing return True # BFS implementation from collections import deque class Solution: def isBipartite(self, graph: List[List[int]]) -> bool: n = len(graph) color = [0] * n # 1: colour A, -1: colour B, 0: uncoloured for start in range(n): if color[start] != 0: continue queue = deque([start]) color[start] = 1 while queue: node = queue.popleft() for nei in graph[node]: if color[nei] == color[node]: return False if color[nei] == 0: color[nei] = -color[node] queue.append(nei) return TrueThis is a simpler version of the typical incompatibility graph colouring kind of question. We just colour and be happy about it, if there’s an attempt to colour something that has already been coloured, then we know that it’s not possible to have a bipartition. This is an undirected graph, so we pretend it’s two directed edges per undirected edge.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38# BFS check for bipartition (via graph coloring) from collections import defaultdict, deque class Solution: def possibleBipartition(self, n: int, dislikes: List[List[int]]) -> bool: # build adj list, disliker ==> disliked adj = defaultdict(list) for a, b in dislikes: adj[a].append(b) adj[b].append(a) # visited nodes, 0: unvisited, 1: colour A, -1: colour B visited = [0] * (n + 1) # just makes it easy to handle 1-indexing for start_idx in range(1, n + 1): if visited[start_idx]: continue queue = deque([start_idx]) # colour start: visited[start_idx] = 1 while queue: curr = queue.popleft() curr_color = visited[curr] # colour neighbours: for nei in adj[curr]: if visited[nei] == curr_color: return False if not visited[nei]: # uncoloured visited[nei] = 1 if curr_color == -1 else -1 queue.append(nei) return True
SOE #
Concentration errors:
- Forget to mark nodes as visited causing infinite loops
- forget to add elements into aux ds-es
TLE issues happen because of unacceptable search spaces:
- when we don’t mark a cell as visited BEFORE adding it into the aux ds (stack / queue) because we introduce redundant things in the search space.
Previous misconceptions:
- Not every DAG is a tree, a DAG is a superset of a tree. Therefore, to do things like tree-validity checks, we can’t do Kahn’s algo topo-sorting and all that.
Comprehension errors from not slowing down and analysing the question properly
Not handling return types for trivial vs error cases properly. They might be different (e.g. empty vs impossible grid)
It seems that in most questions, we have to slow down and ask the basic questions. E.g. if graph, what is a node, what is an edge. If directed edge, what does a \(\rightarrow\) b mean.
this is necessary for the correct parsing of our question information. Else we might make silly errors.
Take the Kahn’s Algo from the “Course Schedule I” as an example.
As usual, we need to care about the direction of the edge within the DAG. In this, the topo sort a \(\rightarrow\) b means that we do a then we do b. Therefore, if the prereq is given such that a \(\rightarrow\) b meaning that b needs to be done BEFORE a, then the adjacency list we build (which represents the graph) should be b \(\rightarrow\) a since b has to be taken before a is taken.
the learning here is that as usual, keep asking about the basics “what’s the source, what’s the destination, what is the adj list for (for topo) and what is the indegrees for (for guiding our order of traversal)”
Incorrect calculation of entry/exit indegrees in topo sort
Union-find path compression and rank updates errors
For Topo sorting, remember that the adjacency list is built such that a \(\rightarrow\) b means that we do a first then b. Don’t mistake the directionality of prerequisites form the directionality of our topo sorting / graph represented as an adj list.
Forgetting to do cycle detection on graphs (e.g. when we apply kahn’s for topo sorting or other things) don’t just assume that it’s acyclic.
when adding in 2 directed edges for an undirected graph’s adj list, there’s a need to check if we need to do an explicit parent tracking or not. This matters especially in the case when we do DFSes. Here’s an example of when we do this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27from collections import defaultdict class Solution: def validTree(self, n: int, edges: List[List[int]]) -> bool: # early return based on math: there should be n - 1 edges for a tree with n nodes if len(edges) != n - 1: return False # must be exact number of edges adj = defaultdict(list) for a, b in edges: adj[a].append(b) adj[b].append(a) visited = set() stack = [(0, -1)] # (current, parent) while stack: node, parent = stack.pop() if node in visited: return False # cycle detected visited.add(node) for nei in adj[node]: if nei == parent: continue # don't backtrack stack.append((nei, node)) return len(visited) == nUnion find boiler plate’s find implementation can be recursively or iteratively written, don’t mix the two.
Wrong:
1 2 3 4 5def find(x): """Wrong because this mixes both while loop and a recursive call to find().""" while parent[x] != x: parent[x] = find(parent[x]) return parent[x]instead, either of these approaches would work:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19def find(self, x: int) -> int: """ Find the root of x with path compression. This implementation of the find function is recursive in nature for the path compression. """ if self.parent[x] != x: # Path compression: set parent[x] directly to the root self.parent[x] = self.find(self.parent[x]) return self.parent[x] def find_(self, x: int) -> int: """ Iterative implementation for the find function with path compression: """ while parent[x] != x: parent[x] = parent[parent[x]] x = parent[x] return x
Decision Flow #
- Need connectivity components? → DFS/BFS or Union-Find
- Find orderings under dependencies? → Topological sort
- Find shortest path unweighted? → BFS
Style, Tricks and Boilerplate #
Boilerplate #
Union Find, Disjoint Sets #
For parent and rank tracking, we could use a dictionary as well if we might have different ways to track a single element (e.g. a tuple of
(row, col)for coordinates in a grid).essential optimisations are the path compression step when implementing the
findfunction and the rank-based “merging”.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47class UnionFind: def __init__(self, n: int): # Initially, each node is its own parent and its rank (size) is 1 self.parent = [i for i in range(n)] self.rank = [1] * n # can also track size instead of rank (then just put 0) def find(self, x: int) -> int: """ Find the root of x with path compression. This implementation of the find function is recursive in nature for the path compression. """ if self.parent[x] != x: # Path compression: set parent[x] directly to the root self.parent[x] = self.find(self.parent[x]) return self.parent[x] def find_(self, x: int) -> int: """ Iterative implementation for the find function with path compression: """ while parent[x] != x: parent[x] = parent[parent[x]] x = parent[x] return x def union(self, x: int, y: int) -> bool: """ Union the sets containing x and y. Returns True if merged, False if already in the same set. """ rootX = self.find(x) rootY = self.find(y) if rootX == rootY: return False # already connected # Union by rank (attach smaller tree to bigger tree) if self.rank[rootX] < self.rank[rootY]: self.parent[rootX] = rootY elif self.rank[rootX] > self.rank[rootY]: self.parent[rootY] = rootX else: self.parent[rootY] = rootX self.rank[rootX] += 1 return True
Bipartite checks #
in this example, we treat it as a graph colouring problem with 2 colours and that’s sufficient for us, here’s a DFS approach:
| |
Cycle Detection #
key problem to solve is: do I have a cycle in my graph?
Directed Graph Cycle detection
DFS with Recursion Stack Approach (graph coloring)
To check if there’s a cycle, we need to consider a general cycle (more than just immediate link-back).
This means that we’d need some way to encode 3 states for nodes.
We can encode (colour) them in any way we want, just need 3 states:
- Unvisited
- Visiting (currently, in this branch exploration)
- Visited (when leaving it)
During DFS, if you encounter a node marked as Visiting in the current path, a cycle exists.
Course Schedule (207) offers an opportunity to solve this via colouring:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33# M1: GRAPH COLORING DFS APPROACH (3-colour graph (0: unvisited: 1: visiting 2: visited)) class Solution: def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool: # list of lists, access using relative idx adj = [[] for _ in range(numCourses)] # careful: we are ultimately trying to clear the topo sort, so we are building edge from a -> b if a is a prereq of b for course, pre in prerequisites: adj[pre].append(course) # single visited with 3 states, space optimisation! visited = [0] * numCourses # 0 = unvisited, 1 = visiting, 2 = visited def dfs(node): if visited[node] == 1: # Cycle! return False if visited[node] == 2: # done state, can return early return True visited[node] = 1 for nei in adj[node]: if not dfs(nei): return False visited[node] = 2 return True for i in range(numCourses): if not dfs(i): return False return True
Coloring / Mark and Check Approach
This is the DFS with Recursion stack, just that the mental model we’re using here is that we’re colouring nodes to signal the 3 states.
Purely just a framing of the approach here.
Kahn’s Algorithm (BFS, topo sorting)
If not all nodes are processed (cannot remove all nodes with in-degree 0), the remaining nodes form a cycle.
The key idea here is that a Topo sort is only viable when we have a DAG, that’s why if not everything is processed, there must be a cycle.
We’ve seen this used in Course Schedule I (207)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30from collections import defaultdict, deque class Solution: def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool: # Build adjacency list and in-degree array adj = defaultdict(list) # in-degree: For each node, count how many edges point to it = how many prerequisites it has. in_degree = [0] * numCourses for course, pre in prerequisites: adj[pre].append(course) # prereq → course (edge creation) in_degree[course] += 1 # course needs one more prereq # 1. Find all courses with no prerequisites. Any element in the queue is "ready to be taken" queue = deque(i for i in range(numCourses) if in_degree[i] == 0) visited = 0 # How many courses we've "finished" # 2. Process all courses "ready" to be taken while queue: node = queue.popleft() # count as finished / visited visited += 1 for neighbor in adj[node]: # All courses that depend on this one in_degree[neighbor] -= 1 # One prereq is now done! if in_degree[neighbor] == 0: # this neighbor is ready to be taken, we add it into the queue queue.append(neighbor) # 3. If we managed to "take" all courses, there is no cycle! return visited == numCourses
Undirected Graph Cycle Detection
Union-Find Approach Disjoint Set Union
- Each node starts in its own set.
- For every edge (to be “added”), check if the two ends have the same parent:
- If yes, adding the edge would create a cycle.
- If no, unite their sets (union them).
- Note: Standard Union-Find does not directly apply to directed graph cycle detection.
Back-Edge Checking Using Parent Pointers (DFS, Undirected Graphs)
- Track the parent node on DFS traversal.
- If you REVISIT a neighbor that is not the parent of the current node, you have a cycle.
Style #
- I like the idea of defining
ROWS,COLS,DIRECTIONSat the top and consistently making references to that. - Naming preferences:
neiandneisfor “neighbour”, “neighbours”- careful about not mixing up american and british spellings
- Usually time simulations involve a tick-by-tick approach for which BFS is usually a good fit.
- when we have to deal with 1-index cases, we can just do dp-like inits with (n + 1) and ignore the 0th index to handle this. That is easier than trying to morph the indices and such.
Tricks #
Traversal direction should be something that we should take a moment to think about. In cases such as the “Pacific Atlantic Water Flow” we realise that the better approach is to go from outwards to in (instead of the actual flow of the water). This is similar to the point where we should consider simpler, complementary approaches to the problems.
This rings true for ANY graph question really because we should always think about how to represent the problem in the best way (what’s the best representation for nodes and edges…).
TRICK: Memory hack: reuse input grids / spaces instead of spawning new ones.
we can reuse the input grid for tracking (e.g. in the floodfill approach for “Number of Islands” problem)
TRICK: reduce search space ASAP
to reduce search spaces, we can ensure that any children added to queues / stacks get marked as visited as we insert into the ds, rather than when we retrieve from the DS.
TRICK: to avoid last-scans for completion, use an aux counter number that we can decrement. IF fully decremented, then the overall state is complete, else it’s an incomplete state.
e.g. Rotting Oranges problem.
TRICK: intermediate safe states while doing the main job then having a formatting area of the code (like “Surround Regions (130)”)
We can mark intermediate safe states to help us. Here’s it’s the ‘*’ that we temporarily tag.
It’s interesting how there’s an intermediate marking state which we then change it back once we have our answers.
we could do topo sorting on a combined supergraph, such as in “Sort Items by Groups Respecting Dependencies (1203)”.
What we did here is that we created a whole new graph called
Gwith it’sindegree. The nodes within these graph were from groups / items and it didn’t matter because we treated them as the same. This is what steps 3 and 4 are about in the solution below.Then we ran the same Kahn’s Algo (step 5) and checked for cycles. Finally we just filtered out the actual nodes (ignoring the groups).
The supergraphs solution is just tedius, but not impossible to come up with.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84from collections import defaultdict, deque from typing import List class Solution: def sortItems(self, n: int, m: int, group: List[int], beforeItems: List[List[int]]) -> List[int]: """ Single-graph approach for 1203. Instead of 2 separate topo sorts (items, then groups), we build one unified DAG containing: - virtual group nodes - item nodes and run ONE topological sort on all of them. """ # --- STEP 1: Assign unique group IDs to ungrouped items --- # Problem states group[i] == -1 if no group. We give these items # their own 'singleton' pseudo-group to enforce adjacency. new_group_id = m for i in range(n): if group[i] == -1: group[i] = new_group_id new_group_id += 1 total_groups = new_group_id # updated count after assignment # --- STEP 2: Node indexing scheme --- # 0..n-1 → item nodes # n..n+total_groups-1 → virtual group nodes def group_node(g_id: int) -> int: """ Map a group ID to its virtual node index in the graph """ return n + g_id # --- Data structures for graph + indegree --- G = defaultdict(list) # adjacency list indeg = defaultdict(int) # indegree dictionary # Initialize indegrees to zero for all nodes all_nodes = list(range(n)) + [group_node(g) for g in range(total_groups)] for node in all_nodes: indeg[node] = 0 # --- STEP 3: Connect each group node to the items it contains --- # This ensures items of the same group appear *after* the group's # node in topo order, making them contiguous in the final output. for item in range(n): gnode = group_node(group[item]) G[gnode].append(item) indeg[item] += 1 # --- STEP 4: Add edges for dependencies --- # For every beforeItems[i], item i must appear AFTER prev_item if they're from the same group # if they're from different groups then the ordering reflects the ordering of their groups. for item in range(n): for prev_item in beforeItems[item]: if group[item] == group[prev_item]: # Same group → direct dependency between items G[prev_item].append(item) indeg[item] += 1 else: # Different groups → dependency between group nodes G[group_node(group[prev_item])].append(group_node(group[item])) indeg[group_node(group[item])] += 1 # --- STEP 5: Kahn's Algorithm (BFS Topological Sort) --- q = deque([node for node in all_nodes if indeg[node] == 0]) topo_order = [] while q: u = q.popleft() topo_order.append(u) for v in G[u]: indeg[v] -= 1 if indeg[v] == 0: q.append(v) # --- STEP 6: Check for cycles --- # A cycle anywhere (items or groups) means no valid order exists. if len(topo_order) != len(all_nodes): return [] # --- STEP 7: Extract only items from topo_order --- # Group nodes were just helpers to enforce contiguous grouping. result = [node for node in topo_order if node < n] return result