139. Word Break¶
Problem Description¶
LeetCode Problem 139: Given a
string s
and a dictionary of strings wordDict
, return true
if s
can be segmented
into a space-separated sequence of one or more dictionary words.
Note that the same word in the dictionary may be reused multiple times in the segmentation.
Clarification¶
-
Assumption¶
-
Solution¶
Approach 1: Breadth-First Search¶
The process of segmenting a string into a sequence of words can be modeled as a graph where
- each node represents a position (i.e., index) in the string. When reaching
node with index
i
, it meanss[0:i]
(i
is excluded) is successfully segmented into a sequence of valid words. - each edge represents a word,
s[i:j]
from the dictionary, wherei
is index in parent node andj
is index in a child node.
We can use the BFS approach to explore all possible paths from the start index (0
) to the
end index of the string (len(s)
). If we find a path that reaches the end of the string,
we return True
. If we exhaust all possible paths, we return False
.
graph TD
p("0")
c1("4")
c2("3")
c3("2")
c11("8")
c21("5")
c211("8")
p --leet--> c1 --code--> c11
p --lee--> c2 --tc--> c21 --"ode"--> c211
p --le--> c3
style c11 fill:green
style c211 fill:green
from collections import deque
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
queue = deque([0]) # start index
seen = set()
while queue:
start = queue.popleft()
if start == len(s):
return True
for word in wordDict:
end = start + len(word)
if end > len(s) or end in seen:
continue
if s[start:end] == word:
queue.append(end)
seen.add(end)
return False
Complexity Analysis of Approach 1¶
- Time complexity: \(O(n \cdot k \cdot m)\) where \(n\) is the length of the string, \(k\) is
the number of words in the dictionary, and \(m\) is the maximum length of the words.
- In the worst case, we may visit each index of string
s
(i.e., each node) at most once, which is \(O(n)\). - For each index (node),
- try all
k
words, - each word check costs \(O(m)\) due to string slicing
s[start:end]
.
- try all
- The total cost is \(O(n \cdot k \cdot m)\).
- In the worst case, we may visit each index of string
- Space complexity: \(O(n)\)
queue
holds at most \(O(n)\) indices.seen
holds at most \(O(n)\) indices.- The total cost is \(O(n) + O(n) = O(n)\).
Approach 2: Top-Down Dynamic Programming¶
The problem can also be solved using a top-down dynamic programming approach with memoization.
- State:
dp[i]
isTrue
if the substrings[0:i]
(i
is not included) can be segmented into a sequence of valid words. - Transition: For each index
i
, we need to check two conditions:- If
s[j:i]
is a valid word in the dictionary, wherej
is any index from0
toi
. - If
dp[j]
isTrue
, meaning the substrings[0:j]
can be segmented into valid words.
So the transition can be expressed as:dp[i] = any(s[i - len(word) : i] == word and dp[i - len(word)] for word in words)
- If
- Base Case:
dp[i] = True
fori <= 0
because we can build an empty string from no words.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
return self._wordBreak(s, tuple(wordDict), len(s))
@cache
def _wordBreak(self, s: str, words: Tuple[str], end: int) -> bool:
if end <= 0:
return True
for word in words:
if s[end - len(word) : end] == word and self._wordBreak(s, words, end - len(word)):
return True
return False
Complexity Analysis of Approach 2¶
- Time complexity: \(O(n \cdot k \cdot m)\)
- There are at most \(n\) states of
dp[i]
and we only calculate each state once due to memoization(\(O(n)\)). - The input
wordDict
is converted to a tuple, which is immutable and can be used as a cache key, which is \(O(k)\) - For each state,
- try all
k
words, - each word check costs \(O(m)\) due to string slicing
s[start:end]
.
- try all
- The total cost is \(O(k) + O(n \cdot k \cdot m) = O(n \cdot k \cdot m)\).
- There are at most \(n\) states of
- Space complexity: \(O(n)\)
- The memoization cache stores at most \(O(n)\) states.
- The input
wordDict
is converted to a tuple, which takes \(O(k)\). - The recursion stack can go up to \(O(n)\) deep.
- The total cost is \(O(n) + O(k) + O(n) = O(n)\).
Approach 3: Bottom-Up Dynamic Programming¶
The problem can also be solved using a bottom-up dynamic programming approach. We will use
- an array
dp
wheredp[i]
isTrue
if the substrings[0:i + 1]
(i
is included but noti + 1
) can be segmented into a sequence of valid words. - the same recurrence relation
dp[i] = any(s[i - len(word) + 1 : i] == word and dp[i - len(word)] for word in words)
Note that dp[i]
is slightly different from the previous approaches. Here, dp[i]
is
True
if the substring s[0:i + 1]
can be segmented into valid words, while in the
previous approaches, dp[i]
is True
if the substring s[0:i]
can be segmented into
valid words. Both approaches are equivalent. Just want to show different definitions of dp[i]
.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * len(s)
for i in range(len(s)):
for word in wordDict:
# Handle out of bounds case
if i < len(word) - 1:
continue
if i == len(word) - 1 or dp[i - len(word)]:
if s[i - len(word) + 1 : i + 1] == word:
dp[i] = True
break
return dp[-1]
Complexity Analysis of Approach 3¶
- Time complexity: \(O(n \cdot k \cdot m))\)
Similar to previous approaches, we calculate \(O(n)\) states in total and each state takes \(O(m \cdot k)\) to compute. - Space complexity: \(O(n)\)
We use an arraydp
of size \(O(n)\) to store the results of subproblems.
Comparison of Different Approaches¶
The table below summarize the time complexity and space complexity of different approaches:
Approach | Time Complexity | Space Complexity |
---|---|---|
Approach - BFS | \(O(n \cdot k \cdot m)\) | \(O(n)\) |
Approach - Top-Down Dynamic Programming | \(O(n \cdot k \cdot m)\) | \(O(n)\) |
Approach - Bottom-Up Dynamic Programming | \(O(n \cdot k \cdot m)\) | \(O(n)\) |
Test¶
- Test empty string and empty dictionary:
wordBreak("", [])
should returnTrue
. - Test single character string and single character dictionary:
wordBreak("a", ["a"])
should returnTrue
. - Test single character string and empty dictionary:
wordBreak("a", [])
should returnFalse
. - Test string that can be segmented:
wordBreak("leetcode", ["leet", "code"])
should returnTrue
. - Test string that cannot be segmented:
wordBreak("leetcode", ["leet", "c"])
should returnFalse
.