Topic 1: Arrays & Hashing
Table of Contents
This includes prefix-sum type questions as well
Canonical Questions #
String simulations #
These problems involve doing some operations on strings. The key idea here is that we should be exploiting string-specific APIs that python provides. Other things could be that instead of actually doing the hard operations, it might make sense to do soft simulations of the applications to solve some of these problems.
Problems #
Process String with Special Operations I (3612)
I have an on optimal solution that works well enough for the constraints by actually carrying out the processes.
Optimisations would be amortising things (soft deletes, lazy reversal flags) OR some other way to simulate without the memory overheads.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22class Solution: def processStr(self, s: str) -> str: """ It's a string state management exercise, might need to find ways to do this efficiently. Okay this works, a better approach is likely to amortise the expensive operations, keep a "is_reversed" flag and then try not to copy over on the reversed operations and just pop and all that when we need, I guess a soft delete would work as well, and we can filter it out at the end when joining the list into a string. """ res = [] for char in s: if char == "#": res += res continue if char == "%": res = res[::-1] continue if char == "*": res = res[:-1] continue res.append(char) return "".join(res)Process String with Special Operations II (3614)
Here, we will observe that if we actually carry out the operations, they are too expensive. Therefore, our objective should be to do soft simulations on these operations. This is a reverse simulation that we’d need to do, so we can do a 2 pass approach on this. First one is to find out the final length of the output string, then we will use the inverse of each process and do some logic there.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40class Solution: def processStr(self, s: str, k: int) -> str: """ This is virtual simulation of the string operations, instead of directly simulating it. We won't construct the string and operate on it, we will work backwards, doing the inverse operations. We shall do 2 passes: 1. find the final length, early return if not satisfactory 2. in reverse order of the input, we carry out inverse operations. we can early return when the char count matches k. """ # first pass: final length calculation size = 0 for c in s: if c.islower(): size += 1 elif c == "*" and size > 0: # remover, pops 1 out size -= 1 elif c == "#": # duplicator size *= 2 # early returns if out of bounds index if not (0 <= k < size): return '.' # 2nd pass, inverse simulation: for c in reversed(s): if c.islower(): size -= 1 # found the char: if k == size: return c elif c == "*": # inverse of poping is pushing size += 1 elif c == "#": # inverse of doubling is halving, but need to be careful if we need to rotate k size //= 2 if k >= size: # is in theright half k -= size elif c == "%": # need to flip it: k = size - 1 - k return "."Sort Vowels in a String (2785)
This is just a simple string manipulation, we’re just focused on constructing something efficiently. My solution uses a Counter and a generator (iterator) to spit out the next values for vowels, when we come across them. Some useful tips for this would be to directly use sets instead of lists for such direct membership checks with characters.
1 2 3 4 5 6 7 8 9 10 11 12 13class Solution: def sortVowels(self, s: str) -> str: vowels = set("aeiouAEIOU") # set allows for fast membership checks sorted_vowels = sorted([ch for ch in s if ch in vowels]) res = [] idx = 0 for ch in s: if ch in vowels: res.append(sorted_vowels[idx]) idx += 1 else: res.append(ch) return "".join(res)
Prefix sums / Subarray Sum = K (prefix sum + hashmap) #
The key idea for such problems is to rely on preprocessed prefix/suffix arrays to help make local decisions better.
- These problems may also deal with contiguous segments (subarrays) and searching for some condition to be met.
- The idea here is that we can create prefix sums and that can be our target that we might want to search for, so it’s also like a 2sum searchability problem
- Also classic use when we care about range sums and the like.
this type of problem is actually also a DP problem. The main reason why is that the order of filling matters for us (so that we avoid double counting).
Problems #
Product of Array Except Self (238)
This is a classic problem. It’s just a prefix / suffix processing that we need to do. It’s just a 2 pass that we do here. The intuition is that we need to rely on partial running subs from left/right.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18class Solution: def productExceptSelf(self, nums: List[int]) -> List[int]: n = len(nums) output = [1] * n # handle the prefixes first prefix = 1 for i in range(n): output[i] = prefix prefix *= nums[i] # now handlethe suffixes suffix = 1 for i in range(n - 1, -1, -1): output[i] *= suffix suffix *= nums[i] return outputThis just does a single pass of 3 monotonic regions and ensures other things such as the length of the segments are appropriate. The solution presented below just does a direct sweep for things instead of actually using prefix/suffix sums.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41class Solution: def isTrionic(self, nums: List[int]) -> bool: """ 3 partitions, they are strictly increasing, strictly decreasing, strictly increasing. consider the boundaries a, +, a, -/0,b, -,b, -, b, +/0, c, +, c boundaries can be 0 or n We also realise that we need to form 3 segments, we aren't looking for turning points necessarily, we are looking for 3 segments. m1 failed: if I just use triples """ n = len(nums) if n < 4: return False # can't form 2 segments with less than 4 elements # 1: find end of strictly increasing segment: p = 0 while p < n - 1 and nums[p] < nums[p + 1]: p += 1 # check if failure: if p == 0: return False # 2: find end of strictly decreasing: q = p while q < n - 1 and nums[q] > nums[q + 1]: q += 1 # for us to end, the final segment must have at least one element and q must have moved from p if q == p or q == n - 1: return False # 3: check final segment is strictly decreasing: i = q while i < n - 1 and nums[i] < nums[i + 1]: i += 1 # i should be n - 1 (reached the end) return i == n - 1This is a nifty use of prefix sums. I think this is one of the rare instances where most of the work is to determine what to precompute.
We should think of it as a search for ALL the trionic subarrays, which we shall use precomputed prefix sums for. The rough idea is to look for 3 segments (formed by 2 turning points, one peak, one trough). We shall move a pointer representing the middle peak idx, then once we find it, we shall attempt to expand rightward to find the trough then we shall use the precomputed suffix array to get the best right hand side value.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59class Solution: def maxSumTrionic(self, nums: List[int]) -> int: """ Scan for 3 segments: A: strictly increasing prefix, B: strictly decreasing middle, C: strictly increasing suffix. Precompute for every index: - inc_sum_left: max sum of a strictly increasing subarray ending at each position (from the left) - inc_sum_right: max sum of a strictly increasing subarray starting at each position (from the right) The main pass looks for peaks, then greedily extends the decreasing segment as much as possible. Then checks that after the decreasing segment, another increase exists -- if so, computes the sum by joining the 3 regions. """ n = len(nums) inc_sum_left = [0] * n # max sum of strictly increasing subarray ending at each i (left-to-right) inc_sum_right = [0] * n # max sum of strictly increasing subarray starting at each i (right-to-left) inc_sum_left[0] = nums[0] inc_sum_right[-1] = nums[-1] # Precompute left-to-right strictly increasing sums (for segment A) for i in range(1, n): if nums[i] > nums[i-1]: inc_sum_left[i] = max(inc_sum_left[i-1] + nums[i], nums[i]) else: inc_sum_left[i] = nums[i] # Precompute right-to-left strictly increasing sums (for segment C) for i in range(n-2, -1, -1): if nums[i] < nums[i+1]: inc_sum_right[i] = max(inc_sum_right[i+1] + nums[i], nums[i]) else: inc_sum_right[i] = nums[i] i = 0 ans = float("-inf") while i < n: # Looking for a peak: left increasing, peak, then decreasing middle if i+2 < n and nums[i] < nums[i+1] and nums[i+1] > nums[i+2]: curr_sum = 0 # curr_sum shall hold the sum of the decreasing segment. j = i+1 # Expand the strictly decreasing region (middle segment B) while j+1 < n and nums[j] > nums[j+1]: curr_sum += nums[j] j += 1 # If another increasing segment exists after the trough if j+1 < n and nums[j] < nums[j+1]: curr_sum += nums[j] trionic_subarray_sum = inc_sum_left[i] + curr_sum + inc_sum_right[j+1] ans = max(ans, trionic_subarray_sum) i += 1 return ansMaximum Number of Subsequences After One Inserting (3628)
We are asked for max number of subsequences wherein the elements don’t have to be contiguous but need to preserve relative rank order. This is what asks us to try doing a prefix / suffix counting approach.
We keep prefix and suffix partial subsequences COUNTS for every index and then think how to use the insertion. For the insertion, we can insert L, C or T. For L and T there’s a single best case (add to leftmost and rightmost respectively) for C, there’s a need to comb through.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68class Solution: def numOfSubsequences(self, s: str) -> int: """ For this, the first thought that comes to mind is to consider prefix counts of partial subsequences. So for each i, we should try count all - # of "L"s until that point - # of "LC"s until that point - # of "LCT"s until that point These partial counts will be useful for us to get the final values without insertion. Now we consider the insertion. The question is probably, which partial sum is the highest jump so that we can add it there? Not too sure how to factor in the "at most one insertion" part. For the insertions, what we insert matters for the cases: 1. L: should only be inserted at the start since it's the first (for max benefit) 2. C: should be inserted at any points midway to do the L_T properly 3. T: should only be inserted at the end (mirrors argument for case 1) We take the max gain from these 3 choices """ n = len(s) # precompute prefix counts for L and LC subsequences preL = [0] * (n + 1) preLC = [0] * (n + 1) for i in range(n): preL[i + 1] = preL[i] + (1 if s[i] == 'L' else 0) preLC[i + 1] = preLC[i] + (preL[i] if s[i] == 'C' else 0) # precompute suffix counts for C, CT pairs, and T (the C and T help us form intermediates to get CT) sufC = [0] * (n + 1) sufCT = [0] * (n + 1) sufT = [0] * (n + 1) for i in reversed(range(n)): sufC[i] = sufC[i + 1] sufCT[i] = sufCT[i + 1] sufT[i] = sufT[i + 1] if s[i] == 'T': sufT[i] += 1 elif s[i] == 'C': sufC[i] += 1 sufCT[i] += sufT[i] # count initial number of "LCT" subsequences countL = countLC = countLCT = 0 for c in s: match c: case 'L': countL += 1 case 'C': countLC += countL case 'T': countLCT += countLC res = countLCT # insert L at start: adds all "CT" pairs res = max(res, countLCT + sufCT[0]) # insert T at end: adds all "LC" pairs res = max(res, countLCT + preLC[n]) # insert C at every position: max over L before * T after for i in range(n + 1): res = max(res, countLCT + preL[i] * sufT[i]) return res
Sequences #
Problems #
Longest Consecutive Sequence (128): it’s more about the elements forming an consecutive sequence
this is more about finding out whether we have a “noop ladder” like in ROP hacking. Solution: create a set of nums (\(O(n)\)), then for each num, check if it’s the starting num (
( n - 1 ) not in num_set) then just keep climbing up that ladder until no more, keep track of max length. Remember, only try to build sequences from the start of a sequence.This is such a simple way of looking at the problem, our first reach should NOT be some kind of interval merging or something. The input variables already hint that it’s a linear sweep of some sort.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17class Solution: def longestConsecutive(self, nums: List[int]) -> int: num_set = set(nums) longest = 0 for n in num_set: # if this n is the start of the "noop ladder": if n - 1 not in num_set: length = 1 # build sequences from a new start: while n + length in num_set: length += 1 longest = max(longest, length) return longest
NO Dynamic Range Querying using Fenwick Trees (Binary Index Trees) #
- CLOSING NOTE
I stopped prepping for it, I’ll just skip this task until I need to come back to interview prep again - this kind of problems deal with needing to do prefix sums and also updating them efficiently. They are also extremely niche and a bunch of hard questions in the more recent leetcode questions only accept such solutions. Typically, we can still achieve partial solutions using DP approaches, but will likely timeout unless fenwick trees are used.
Problems #
Number of Integers with Popcount-Depth Equal to K II (3624)
The key idea here is that we need to accumulate interval counts of the popcount depths for depths in range [0, 5], so we have 6 different fenwick trees to track this. Querying comes in 2 forms: range queries and update queries and both can be handled in \(O(log n)\) time thanks to the use of fenwick trees.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61from typing import List from functools import cache class FenwickTree: def __init__(self, n): self.n = n self.fw = [0]*(n+1) def update(self, i, delta): i += 1 while i <= self.n: self.fw[i] += delta i += i & (-i) def query(self, i): i += 1 s = 0 while i > 0: s += self.fw[i] i -= i & (-i) return s def range_query(self, l, r): return self.query(r) - (self.query(l-1) if l > 0 else 0) class Solution: def popcountDepth(self, nums: List[int], queries: List[List[int]]) -> List[int]: @cache def get_depth(x): depth = 0 while x != 1: x = x.bit_count() depth += 1 return depth n = len(nums) max_depth = 5 depths = [get_depth(x) for x in nums] # stores the current depths of each num, idxed by the nums idx # we build a fenwick tree, each tree will contain the prefix sum for the number of nums with depth = k (fixed) within fenwicks = [FenwickTree(n) for _ in range(max_depth+1)] # Initialize Fenwicks for i, d in enumerate(depths): fenwicks[d].update(i, 1) ans = [] for query in queries: match query: case 1, l, r, k: num_indices = fenwicks[k].range_query(l, r) ans.append(num_indices) case 2, idx, val: old_depth, new_depth = depths[idx], get_depth(val) if old_depth == new_depth: continue fenwicks[old_depth].update(idx, -1) fenwicks[new_depth].update(idx, 1) depths[idx] = new_depth return ansSum of Beautiful Subsequences (3671)
To be honest, I don’t understand this. The DP version will pass 80% of the test cases, I most likely will end up resorting to that. The fenwick tree solutions, I don’t even have an intuition for it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94# M1: Fenwick Tree solution (correct, passes all) class Fenwick: def __init__(self, n): self.arr = [0] * (n + 1) def sum(self, i): s = 0 while i > 0: s += self.arr[i] i -= i & (-i) return s def add(self, i, x): while i < len(self.arr): self.arr[i] += x i += i & (-i) class Solution: def totalBeauty(self, nums: List[int]) -> int: MOD = 10**9 + 7 max_val = max(nums) + 1 locs = defaultdict(list) for idx, val in enumerate(nums): locs[val].append(idx) F = [0] * max_val for d in range(1, max_val): # indices of all numbers divisible by d indices = sorted(pos for val in range(d, max_val, d) for pos in locs[val]) if len(indices) <= 1: F[d] = len(indices) continue rank = {pos: r for r, pos in enumerate(indices, 1)} # rank compress fenw = Fenwick(len(indices)) for val in range(d, max_val, d): for pos in reversed(locs[val]): r = rank[pos] new_subseq_count = 1 + fenw.sum(r - 1) F[d] += new_subseq_count fenw.add(r, new_subseq_count) for d in range(max_val - 1, 0, -1): for multiple in range(d * 2, max_val, d): F[d] -= F[multiple] F[d] %= MOD return sum((d * F[d]) % MOD for d in range(1, max_val)) % MOD # M2::PARTIAL: dp solution: from math import gcd from collections import defaultdict class Solution: def totalBeauty(self, nums: List[int]) -> int: """ I think this is a multi step question as well First we just find all the subsequences (and the GCDs within them) Then we combine via the GCD values and get the beauty scores == steps {problem decomposition} ==> naive 1. enum strictly increasing subsequences in the array 2. for each subsequence, find its gcd 3. for each GCD, g, count subsequences with gcd = g 4. calculate == key intuition for optimisation: 1. directly enumerating is costly 2^n, so we have to build up the results. 2. so we use dp 2D where dp[i][g] is the number of strictly increasing subsequences ending at idx i with gcd g """ MOD = (10 ** 9) + 7 n = len(nums) total_beauty = 0 dp = [defaultdict(int) for _ in range(n)] # so dp[i][g] gives number of strictly increasing subsequences ending at index i with gcd g. for i in range(n): dp[i][nums[i]] = 1 # init value, it's gcd 1 for itself for j in range(i): # left pointer if nums[j] < nums[i]: # then can accumulate more GCDs for g, count in dp[j].items(): new_g = gcd(g, nums[i]) new_count = dp[i][new_g] + count dp[i][new_g] = new_count % MOD # now we calculate beauty values: for g, count in dp[i].items(): total_beauty = (total_beauty + (g * count)) % MOD return total_beauty
N-Sum Family #
Problems #
- N-Sum Family
- 2-sum is about complement finding, pairwise just keep a seen index and use that when searching for complements further down
TODO Sudoku Family #
- Sudoku Family
- n-dimensional array access is the main learning point here. When we access blocks, then for each cell, we can get which block they’re from using
block_idx = row_idx // 3, col_idx // 3.
- n-dimensional array access is the main learning point here. When we access blocks, then for each cell, we can get which block they’re from using
TODO Stock Family #
SOE #
General #
- Forgetting negatives / zeros in prefix sums \(\implies\) this affects the order of filling / presences of duplicate prefix sums in my list of prefix sums
Jumping the Gun to stdlib thoughts #
- Off-by-one in subarray loops
- Jumping straight to “pythonic” stdlib uses is NOT good. Just think about the minimal implementation then on the second look at it, think of stdlib uses. Need to avoid cases where we interpret the question wrongly. E.g. for “Top K Frequent Elements (347)”, there should be no reason to use a heapq, we can just directly sort based on counted values.
Disambiguating Sliding window vs prefix/suffix accumulation #
- Sliding window vs prefix/suffix accumulation is a common pitfall
- If the problem wants you to process variable subarrays, or is about max-min over all partitions (not just windows that move by 1), sliding window is probably not a fit.
Decision Flow #
Is it lookup/search? → HashMap
Is it range sum? → Prefix sums
Do we care about some properties of subsequences (where the contiguity not important BUT the relative order matters) -> consider using prefix/suffix counting, which may be a state-tracking kind of counting as well.
Is the problem about a ‘window’ (i.e. strict subarray of size K / variable) or is it accumulating something from start / end
- if window: most likely Sliding window
- if accumulation / arbitrary splits then it’s most likely: Prefix/Suffix tracking
Char counting / Freq counting \(\rightarrow\) use fixed array counter if is fixed space else use
collections.Counter
Style, Recipes, Boilerplate #
- Better to early return instead of more code-golf like solutions
- love the use of
defaultdict, careful on the choice of keys (must be hashable) - use genexps for intermediate lists where possible