Dynamic Programming

Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems. (“Programming” in this context refers to a tabular method, not to writing computer code.)

As we saw, divide-and-conquer algorithms partition the problem into disjoint subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem. In contrast, dynamic programming applies when the subproblems overlap, that is, when subproblems share subproblems. In this context, a divide-and-conquer algorithm does more work than necessary, repeatedly solving the common subproblems. A dynamic-programming algorithm solves each subproblem just once and then saves its answer in a table, thereby avoiding the work of recomputing the answer every time it solves each subproblem.

Dynamic programming typically applies to optimization problems. Such problems can have many possible solutions. Each solution has a value, and you want to find a solution with the optimal (minimum or maximum) value. We call such a solution an optimal solution to the problem, as opposed to the optimal solution, since there may be several solutions that achieve the optimal value.

To develop a dynamic-programming algorithm, follow a sequence of four steps:

  1. Characterize the structure of an optimal solution.
  2. Recursively define the value of an optimal solution.
  3. Compute the value of an optimal solution, typically in a bottom-up fashion.
  4. Construct an optimal solution from computed information.

Steps 1-3 form the basis of a dynamic-programming solution to a problem. If you need only the value of an optimal solution, and not the solution itself, then you can omit step 4. When you do perform step 4, it often pays to maintain additional information during step 3 so that you can easily construct an optimal solution.

Rod Cutting

Our first example uses dynamic programming to solve a simple problem in deciding where to cut steel rods. Serling Enterprises buys long steel rods and cuts them into shorter rods, which it then sells. Each cut is free. The management of Serling Enterprises wants to know the best way to cut up the rods.

Serling Enterprises has a table giving, for i = 1, 2, ..., n, the price p_i in dollars that they charge for a rod of length i inches. The length of each rod in inches is always an integer.

The rod-cutting problem is the following. Given a rod of length n inches and a table of prices p_i for i = 1, 2, ..., n, determine the maximum revenue r_n obtainable by cutting up the rod and selling the pieces. If the price p_n for a rod of length n is large enough, an optimal solution might require no cutting at all.

Serling Enterprises can cut up a rod of length n in 2^{n-1} different ways, since they have an independent option of cutting, or not cutting, at distance i inches from the left end, for i = 1, 2, ..., n - 1. We denote a decomposition into pieces using ordinary additive notation, so that 7 = 2 + 2 + 3 indicates that a rod of length 7 is cut into three pieces: two of length 2 and one of length 3. If an optimal solution cuts the rod into k pieces, for some 1 ≤ k ≤ n, then an optimal decomposition n = i_1 + i_2 + ... + i_k of the rod into pieces of lengths i_1, i_2, ..., i_k provides maximum corresponding revenue r_n = p_{i_1} + p_{i_2} + ... + p_{i_k}. More generally, we can express the values r_n for n ≥ 1 in terms of optimal revenues from shorter rods:

  r_n = max {p_n, r_1 + r_{n-1}, r_2 + r_{n-2}, ..., r_{n-1} + r_1}.

The first argument, p_n, corresponds to making no cuts at all and selling the rod of length n as is. The other arguments to max correspond to the maximum revenue obtained by making an initial cut of the rod into two pieces of size i and n - i, for each i = 1, 2, ..., n - 1, and then optimally cutting up those pieces further, obtaining revenues r_i and r_{n-i} from those two pieces. Since you don’t know ahead of time which value of i optimizes revenue, you have to consider all possible values for i and pick the one that maximizes revenue. You also have the option of picking no i at all if the greatest revenue comes from selling the rod uncut.

To solve the original problem of size you solve smaller problems of the same type. Once you make the first cut, the two resulting pieces form independent instances of the rod-cutting problem. The overall optimal solution incorporates optimal solutions to the two resulting subproblems, maximizing revenue from each of those two pieces. We say that the rod-cutting problem exhibits optimal substructure: optimal solutions to a problem incorporate optimal solutions to related subproblems, which you may solve independently.

In a related, but slightly simpler, way to arrange a recursive structure for the rod-cutting problem, let’s view a decomposition as consisting of a first piece of length i cut off the left-hand end, and then a right-hand remainder of length n - i. Only the remainder, and not the first piece, may be further divided. Think of every decomposition of a length-n rod in this way: as a first piece followed by some decomposition of the remainder. Then we can express the solution with no cuts at all by saying that the first piece has size i = n and revenue p_n and that the remainder has size 0 with corresponding revenue r_0 = 0. We thus obtain the following simpler version of the equation above:

  r_n = max {p_i + r_{n-i} : 1 ≤ i ≤ n}.

In this formulation, an optimal solution embodies the solution to only one related subproblem (the remainder) rather than two.
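As a sanity check, both recurrences for r_n can be evaluated directly in Python. Everything below is illustrative and not part of the text: the function names are ours, and the price table is a hypothetical sample.

```python
from functools import lru_cache

# Hypothetical sample prices p_1..p_10 (index 0 is an unused placeholder).
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

@lru_cache(maxsize=None)
def r_full(n):
    """Two-piece recurrence: r_n = max(p_n, max over i of r_i + r_{n-i})."""
    if n == 0:
        return 0
    best = p[n]                          # sell the rod uncut
    for i in range(1, n):
        best = max(best, r_full(i) + r_full(n - i))
    return best

@lru_cache(maxsize=None)
def r_simple(n):
    """Simpler recurrence: r_n = max over 1 <= i <= n of p_i + r_{n-i}."""
    if n == 0:
        return 0
    return max(p[i] + r_simple(n - i) for i in range(1, n + 1))

# The two formulations agree on every length.
print([r_simple(n) for n in range(1, 8)])  # [1, 5, 8, 10, 13, 17, 18]
```

For this table, a rod of length 4 is best cut into two pieces of length 2, giving revenue p_2 + p_2 = 10.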

Recursive Top-down Implementation

The Cut-Rod procedure implements the computation implicit in this equation in a straightforward, top-down, recursive manner.

The procedure takes as input an array p[1:n] of prices and an integer n, and it returns the maximum revenue possible for a rod of length n. For length n = 0, no revenue is possible, and so Cut-Rod returns 0. The procedure initializes the maximum revenue q to -∞ so that the for loop correctly computes q = max {p[i] + Cut-Rod(p, n - i) : 1 ≤ i ≤ n}. A simple induction on n proves that this answer is equal to the desired answer r_n.

Cut-Rod(p, n)

  if n == 0
      return 0
  q = -∞
  for i = 1 to n                // i is the position of the first cut
      q = max {q, p[i] + Cut-Rod(p, n - i)}
  return q
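The naive recursive procedure translates almost line for line into Python. This sketch is illustrative: the function name and the sample price table are ours, not the text's.

```python
def cut_rod(p, n):
    """Naive top-down rod cutting. p[i] is the price of a rod of length i;
    p[0] is an unused placeholder. Runs in exponential time."""
    if n == 0:
        return 0
    q = float("-inf")
    for i in range(1, n + 1):            # i is the size of the first piece
        q = max(q, p[i] + cut_rod(p, n - i))
    return q

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # hypothetical sample table
print(cut_rod(prices, 4))   # 10
```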

Why is Cut-Rod so inefficient? The problem is that Cut-Rod calls itself recursively over and over again with the same parameter values, which means that it solves the same subproblems repeatedly.

We can design a recursion tree demonstrating what happens for n = 4: Cut-Rod(p, n) calls Cut-Rod(p, n - i) for i = 1, 2, ..., n. Equivalently, Cut-Rod(p, n) calls Cut-Rod(p, j) for each j = 0, 1, ..., n - 1. When this process unfolds recursively, the amount of work done, as a function of n, grows explosively.

To analyze the running time of Cut-Rod, let T(n) denote the total number of calls made to Cut-Rod(p, n) for a particular value of n. This expression equals the number of nodes in a subtree whose root is labeled n in the recursion tree. The count includes the initial call at its root. Thus, T(0) = 1 and

  T(n) = 1 + T(0) + T(1) + ... + T(n - 1).

The initial 1 is for the call at the root, and the term T(j) counts the number of calls (including recursive calls) due to the call Cut-Rod(p, n - i), where j = n - i. We can show that T(n) = 2^n, and so the running time of Cut-Rod is exponential in n.
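The claim that the call-count recurrence T(0) = 1, T(n) = 1 + T(0) + ... + T(n - 1) solves to 2^n can be checked empirically with a short Python function (illustrative only; the function name is ours).

```python
def count_calls(n):
    """Number of calls Cut-Rod makes for input n, per the recurrence:
    one for this call, plus the calls made for each smaller size j."""
    calls = 1                      # this call itself
    for j in range(n):             # one recursive call per j = 0..n-1
        calls += count_calls(j)
    return calls

for n in range(10):
    assert count_calls(n) == 2 ** n
print(count_calls(9))   # 512
```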

In retrospect, this exponential running time is not so surprising. Cut-Rod explicitly considers all possible ways of cutting up a rod of length .

A rod of length n has n - 1 potential locations to cut. Each possible way to cut up the rod makes a cut at some subset of these locations, including the empty set, which makes for no cuts. Viewing each cut location as a distinct member of a set of n - 1 elements, you can see that there are 2^{n-1} subsets. Each leaf in the recursion tree corresponds to one possible way to cut up the rod. Hence, the recursion tree has 2^{n-1} leaves. The labels on the simple path from the root to a leaf give the sizes of each remaining right-hand piece before making each cut. That is, the labels give the corresponding cut points, measured from the right-hand end of the rod.

Using Dynamic Programming for Optimal Rod Cutting

The dynamic-programming method works as follows. Instead of solving the same subproblems repeatedly, as in the naive recursive solution, arrange for each subproblem to be solved only once. There’s actually an obvious way to do so: the first time you solve a subproblem, save its solution. If you need to refer to this subproblem’s solution again later, just look it up, rather than recomputing it.

Saving subproblem solutions comes with a cost: the additional memory needed to store solutions. Dynamic programming thus serves as an example of a time-memory trade-off. The savings may be dramatic. For example, we’re about to use dynamic programming to go from the exponential-time algorithm for rod cutting down to a Θ(n^2)-time algorithm. A dynamic-programming approach runs in polynomial time when the number of distinct subproblems involved is polynomial in the input size and you can solve each such subproblem in polynomial time.

There are usually two equivalent ways to implement a dynamic-programming approach. Solutions to the rod-cutting problem illustrate both of them.

  • The first approach is top-down with memoization. In this approach, you write the procedure recursively in a natural manner, but modified to save the result of each subproblem (usually in an array or hash table). The procedure now first checks to see whether it has previously solved this subproblem. If so, it returns the saved value, saving further computation at this level. If not, the procedure computes the value in the usual manner but also saves it. We say that the recursive procedure has been memoized: it “remembers” what results it has computed previously.

  • The second approach is the bottom-up method. This approach typically depends on some natural notion of the “size” of a subproblem, such that solving any particular subproblem depends only on solving “smaller” subproblems. Solve the subproblems in size order, smallest first, storing the solution to each subproblem when it is first solved. In this way, when solving a particular subproblem, there are already saved solutions for all of the smaller subproblems its solution depends upon. You need to solve each subproblem only once, and when you first see it, you have already solved all of its prerequisite subproblems.

These two approaches yield algorithms with the same asymptotic running time, except in unusual circumstances where the top-down approach does not actually recurse to examine all possible subproblems. The bottom-up approach often has much better constant factors, since it has lower overhead for procedure calls.

The procedures Memoized-Cut-Rod and Memoized-Cut-Rod-Aux demonstrate how to memoize the top-down Cut-Rod procedure. The main procedure Memoized-Cut-Rod initializes a new auxiliary array r[0:n] with the value -∞ which, since known revenue values are always nonnegative, is a convenient choice for denoting “unknown.” Memoized-Cut-Rod then calls its helper procedure, Memoized-Cut-Rod-Aux, which is just the memoized version of the exponential-time procedure Cut-Rod. It first checks to see whether the desired value is already known and, if it is, returns it. Otherwise, it computes the desired value q in the usual manner, saves it in r[n], and returns it.

Memoized-Cut-Rod(p, n)

  let r[0:n] be a new array
  for i = 0 to n
      r[i] = -∞
  return Memoized-Cut-Rod-Aux(p, n, r)

Memoized-Cut-Rod-Aux(p, n, r)

  if r[n] ≥ 0                   // already have a solution for length n?
      return r[n]
  if n == 0
      q = 0
  else
      q = -∞
      for i = 1 to n            // i is the position of the first cut
          q = max {q, p[i] + Memoized-Cut-Rod-Aux(p, n - i, r)}
  r[n] = q                      // remember the solution value for length n
  return q
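In Python, the memoized top-down approach can be sketched as follows. A value of -∞ in r marks "unknown," matching the convention that revenues are nonnegative; names and the sample price table are ours.

```python
def memoized_cut_rod(p, n):
    """Top-down rod cutting with memoization. p[i] is the price of a
    rod of length i; p[0] is an unused placeholder."""
    r = [float("-inf")] * (n + 1)        # -inf means "not yet solved"
    return memoized_cut_rod_aux(p, n, r)

def memoized_cut_rod_aux(p, n, r):
    if r[n] >= 0:                        # already solved: look it up
        return r[n]
    if n == 0:
        q = 0
    else:
        q = float("-inf")
        for i in range(1, n + 1):        # i is the size of the first piece
            q = max(q, p[i] + memoized_cut_rod_aux(p, n - i, r))
    r[n] = q                             # save before returning
    return q

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # hypothetical sample table
print(memoized_cut_rod(prices, 10))   # 30
```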

The bottom-up version, Bottom-Up-Cut-Rod, is even simpler. Using the bottom-up dynamic-programming approach, Bottom-Up-Cut-Rod takes advantage of the natural ordering of the subproblems: a subproblem of size i is “smaller” than a subproblem of size j if i < j. Thus, the procedure solves subproblems of sizes j = 0, 1, ..., n, in that order.

The first line of Bottom-Up-Cut-Rod creates a new array r[0:n] in which to save the results of the subproblems, and the next line initializes r[0] to 0, since a rod of length 0 earns no revenue. The for loop then solves each subproblem of size j, for j = 1, 2, ..., n, in order of increasing size. The approach used to solve a problem of a particular size j is the same as that used by Cut-Rod, except that the procedure now directly references array entry r[j - i] instead of making a recursive call to solve the subproblem of size j - i. The procedure then saves in r[j] the solution to the subproblem of size j. Finally, it returns r[n], which equals the optimal value r_n.

Bottom-Up-Cut-Rod(p, n)

  let r[0:n] be a new array
  r[0] = 0
  for j = 1 to n                // for increasing rod length j
      q = -∞
      for i = 1 to j            // i is the position of the first cut
          q = max {q, p[i] + r[j - i]}
      r[j] = q                  // remember the solution value for length j
  return r[n]
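A Python sketch of the bottom-up procedure, solving subproblems in increasing order of size so that r[j - i] is always available when needed (names and the sample price table are ours):

```python
def bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting. p[i] is the price of a rod of length i;
    p[0] is an unused placeholder. Runs in Theta(n^2) time."""
    r = [0] * (n + 1)                    # r[0] = 0: length 0 earns nothing
    for j in range(1, n + 1):            # increasing subproblem size
        q = float("-inf")
        for i in range(1, j + 1):        # i is the size of the first piece
            q = max(q, p[i] + r[j - i])
        r[j] = q
    return r[n]

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # hypothetical sample table
print(bottom_up_cut_rod(prices, 7))   # 18
```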

The bottom-up and top-down versions have the same asymptotic running time. The running time of Bottom-Up-Cut-Rod is Θ(n^2), due to its doubly nested loop structure. The number of iterations of its inner for loop forms an arithmetic series. The running time of its top-down counterpart, Memoized-Cut-Rod, is also Θ(n^2), although this running time may be a little harder to see. Because a recursive call to solve a previously solved subproblem returns immediately, Memoized-Cut-Rod solves each subproblem just once. It solves subproblems for sizes 0, 1, ..., n. To solve a subproblem of size n, the for loop iterates n times. Thus, the total number of iterations of this for loop, over all recursive calls of Memoized-Cut-Rod, forms an arithmetic series, giving a total of Θ(n^2) iterations, just like the inner for loop of Bottom-Up-Cut-Rod.

Subproblem Graphs

When you think about a dynamic-programming problem, you need to understand the set of subproblems involved and how subproblems depend on one another. The subproblem graph for the problem embodies exactly this information. The subproblem graph has a directed edge from the vertex for subproblem x to the vertex for subproblem y if determining an optimal solution for subproblem x involves directly considering an optimal solution for subproblem y. For example, the subproblem graph contains an edge from x to y if a top-down recursive procedure for solving x directly calls itself to solve y. You can think of the subproblem graph as a “reduced” or “collapsed” version of the recursion tree for the top-down recursive method, with all nodes for the same subproblem coalesced into a single vertex and all edges directed from parent to child.

The bottom-up method for dynamic programming considers the vertices of the subproblem graph in such an order that you solve the subproblems y adjacent to a given subproblem x before you solve subproblem x. (The adjacency relation in a directed graph is not necessarily symmetric.) In a bottom-up dynamic-programming algorithm, you consider the vertices of the subproblem graph in an order that is a “reverse topological sort,” or a “topological sort of the transpose” of the subproblem graph. In other words, no subproblem is considered until all of the subproblems it depends upon have been solved. Similarly, you can view the top-down method (with memoization) for dynamic programming as a “depth-first search” of the subproblem graph.

The size of the subproblem graph can help you determine the running time of the dynamic-programming algorithm. Since you solve each subproblem just once, the running time is the sum of the times needed to solve each subproblem. Typically, the time to compute the solution to a subproblem is proportional to the out-degree (number of outgoing edges) of the corresponding vertex in the subproblem graph, and the number of subproblems is equal to the number of vertices in the subproblem graph. In this common case, the running time of dynamic programming is linear in the number of vertices and edges.

Reconstructing a Solution

The procedures Memoized-Cut-Rod and Bottom-Up-Cut-Rod return the value of an optimal solution to the rod-cutting problem, but they do not return the solution itself: a list of piece sizes.

Let’s see how to extend the dynamic-programming approach to record not only the optimal value computed for each subproblem, but also a choice that led to the optimal value. With this information, you can readily print an optimal solution. The procedure Extended-Bottom-Up-Cut-Rod computes, for each rod size j, not only the maximum revenue r_j, but also s_j, the optimal size of the first piece to cut off. It’s similar to Bottom-Up-Cut-Rod, except that it creates the array s[1:n], and it updates s[j] to hold the optimal size of the first piece to cut off when solving a subproblem of size j.

The procedure Print-Cut-Rod-Solution takes as input an array p of prices and a rod size n. It calls Extended-Bottom-Up-Cut-Rod to compute the array s[1:n] of optimal first-piece sizes. Then it prints out the complete list of piece sizes in an optimal decomposition of a rod of length n.

Extended-Bottom-Up-Cut-Rod(p, n)

  let r[0:n] and s[1:n] be new arrays
  r[0] = 0
  for j = 1 to n                // for increasing rod length j
      q = -∞
      for i = 1 to j            // i is the position of the first cut
          if q < p[i] + r[j - i]
              q = p[i] + r[j - i]
              s[j] = i          // best cut location so far for length j
      r[j] = q                  // remember the solution value for length j
  return r and s

Print-Cut-Rod-Solution(p, n)

  (r, s) = Extended-Bottom-Up-Cut-Rod(p, n)
  while n > 0
      print s[n]                // cut location for length n
      n = n - s[n]              // length of the remainder of the rod
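Both procedures can be sketched together in Python. Everything here is illustrative: the names and the sample price table are ours, and the printer collects the piece sizes into a list rather than printing them one per line.

```python
def extended_bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting that also records, in s[j], the optimal
    size of the first piece to cut off a rod of length j."""
    r = [0] * (n + 1)
    s = [0] * (n + 1)                    # s[0] unused
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i                 # best first cut found so far
        r[j] = q
    return r, s

def print_cut_rod_solution(p, n):
    r, s = extended_bottom_up_cut_rod(p, n)
    pieces = []
    while n > 0:
        pieces.append(s[n])              # cut off the optimal first piece
        n -= s[n]                        # recurse on the remainder
    print(pieces)

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # hypothetical sample table
print_cut_rod_solution(prices, 7)   # [1, 6]
```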

Matrix-chain Multiplication

Our next example of dynamic programming is an algorithm that solves the problem of matrix-chain multiplication. Given a sequence (chain) ⟨A_1, A_2, ..., A_n⟩ of n matrices to be multiplied, where the matrices aren’t necessarily square, the goal is to compute the product A_1 A_2 ... A_n using the standard algorithm for multiplying rectangular matrices, while minimizing the number of scalar multiplications.

You can evaluate the expression A_1 A_2 ... A_n using the algorithm for multiplying pairs of rectangular matrices as a subroutine once you have parenthesized it to resolve all ambiguities in how the matrices are multiplied together. Matrix multiplication is associative, and so all parenthesizations yield the same product. A product of matrices is fully parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses. For example, if the chain of matrices is ⟨A_1, A_2, A_3, A_4⟩, then you can fully parenthesize the product A_1 A_2 A_3 A_4 in five distinct ways:

  (A_1 (A_2 (A_3 A_4))),
  (A_1 ((A_2 A_3) A_4)),
  ((A_1 A_2) (A_3 A_4)),
  ((A_1 (A_2 A_3)) A_4),
  (((A_1 A_2) A_3) A_4).

How you parenthesize a chain of matrices can have a dramatic impact on the cost of evaluating the product. Consider first the cost of multiplying two rectangular matrices. The standard algorithm is given by the procedure Rectangular-Matrix-Multiply, which generalizes the square-matrix multiplication procedure Matrix-Multiply. The Rectangular-Matrix-Multiply procedure computes C = C + A · B for three matrices A, B, and C, where A is p × q, B is q × r, and C is p × r.

Rectangular-Matrix-Multiply(A, B, C, p, q, r)

  for i = 1 to p
      for j = 1 to r
          for k = 1 to q
              c_{ij} = c_{ij} + a_{ik} · b_{kj}
The running time of Rectangular-Matrix-Multiply is dominated by the number of scalar multiplications in the last line, which is pqr. Therefore, we’ll consider the cost of multiplying matrices to be the number of scalar multiplications. (The number of scalar multiplications dominates even if we consider initializing C to 0, which performs just pr assignments.)
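The procedure can be rendered in Python on plain lists of lists. This is an illustrative sketch (the worked example matrices are ours); the triple loop performs exactly p·q·r scalar multiplications.

```python
def rectangular_matrix_multiply(A, B, C, p, q, r):
    """Compute C = C + A * B, where A is p x q, B is q x r, and C is
    p x r, all represented as lists of lists (0-indexed)."""
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]   # pqr scalar multiplications total

# Small worked example: a 2x3 times a 3x2 matrix.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
C = [[0, 0],
     [0, 0]]                                   # accumulator, initialized to 0
rectangular_matrix_multiply(A, B, C, 2, 3, 2)
print(C)   # [[4, 5], [10, 11]]
```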

We state the matrix-chain multiplication problem as follows:

  • Given a chain ⟨A_1, A_2, ..., A_n⟩ of n matrices, where for i = 1, 2, ..., n, matrix A_i has dimension p_{i-1} × p_i, fully parenthesize the product A_1 A_2 ... A_n in a way that minimizes the number of scalar multiplications. The input is the sequence of dimensions ⟨p_0, p_1, ..., p_n⟩.

The matrix-chain multiplication problem does not entail actually multiplying matrices. The goal is only to determine an order for multiplying matrices that has the lowest cost. Typically, the time invested in determining this optimal order is more than paid for by the time saved later on when actually performing the matrix multiplications.

Counting the Number of Parenthesizations

Before solving the matrix-chain multiplication problem by dynamic programming, let us convince ourselves that exhaustively checking all possible parenthesizations is not an efficient algorithm. Denote the number of alternative parenthesizations of a sequence of n matrices by P(n). When n = 1, the sequence consists of just one matrix, and therefore there is only one way to fully parenthesize the matrix product. When n ≥ 2, a fully parenthesized matrix product is the product of two fully parenthesized matrix subproducts, and the split between the two subproducts may occur between the kth and (k+1)st matrices for any k = 1, 2, ..., n - 1. Thus, we obtain the recurrence

  P(n) = 1 if n = 1, and P(n) = Σ_{k=1}^{n-1} P(k) P(n - k) if n ≥ 2.

We claim that the solution to a similar recurrence is the sequence of Catalan numbers, which grows as Ω(4^n / n^{3/2}). We can also show that the solution to the P(n) recurrence is Ω(2^n). The number of solutions is thus exponential in n, and the brute-force method of exhaustive search makes for a poor strategy when determining how to optimally parenthesize a matrix chain.
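A short Python check (illustrative only; the function names are ours) confirms that the recurrence produces the Catalan numbers shifted by one, i.e. P(n) equals the (n-1)st Catalan number C_{n-1} = (2(n-1) choose n-1) / n.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def P(n):
    """Number of full parenthesizations of a chain of n matrices."""
    if n == 1:
        return 1
    return sum(P(k) * P(n - k) for k in range(1, n))

def catalan(n):
    """The nth Catalan number, C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

for n in range(1, 12):
    assert P(n) == catalan(n - 1)
print([P(n) for n in range(1, 8)])   # [1, 1, 2, 5, 14, 42, 132]
```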

Applying Dynamic Programming

Let’s use the dynamic-programming method to determine how to optimally parenthesize a matrix chain, by following the four-step sequence that we stated at the beginning of this chapter:

Step 1: The Structure of an Optimal Parenthesization

In the first step of the dynamic-programming method, you find the optimal substructure and then use it to construct an optimal solution to the problem from optimal solutions to subproblems. To perform this step for the matrix-chain multiplication problem, it’s convenient to first introduce some notation. Let A_{i:j}, where i ≤ j, denote the matrix that results from evaluating the product A_i A_{i+1} ... A_j. If the problem is nontrivial, that is, i < j, then to parenthesize the product A_i A_{i+1} ... A_j, the product must split between A_k and A_{k+1} for some integer k in the range i ≤ k < j. That is, for some value of k, first compute the matrices A_{i:k} and A_{k+1:j}, and then multiply them together to produce the final product A_{i:j}. The cost of parenthesizing this way is the cost of computing the matrix A_{i:k}, plus the cost of computing A_{k+1:j}, plus the cost of multiplying them together.

The optimal substructure of this problem is as follows. Suppose that to optimally parenthesize A_i A_{i+1} ... A_j, you split the product between A_k and A_{k+1}. Then the way you parenthesize the “prefix” subchain A_i A_{i+1} ... A_k within this optimal parenthesization of A_i A_{i+1} ... A_j must be an optimal parenthesization of A_i A_{i+1} ... A_k. If there were a less costly way to parenthesize A_i A_{i+1} ... A_k, then you could substitute that parenthesization in the optimal parenthesization of A_i A_{i+1} ... A_j to produce another way to parenthesize A_i A_{i+1} ... A_j whose cost is lower than the optimum: a contradiction. A similar observation holds for how to parenthesize the subchain A_{k+1} A_{k+2} ... A_j in the optimal parenthesization of A_i A_{i+1} ... A_j: it must be an optimal parenthesization of A_{k+1} A_{k+2} ... A_j.

Now let’s use the optimal substructure to show how to construct an optimal solution to the problem from optimal solutions to subproblems. Any solution to a nontrivial instance of the matrix-chain multiplication problem requires splitting the product, and any optimal solution contains within it optimal solutions to subproblem instances. Thus, to build an optimal solution to an instance of the matrix-chain multiplication problem, split the problem into two subproblems, find optimal solutions to the two subproblem instances, and then combine these optimal subproblem solutions. To ensure that you’ve examined the optimal split, you must consider all possible splits.

Step 2: A Recursive Solution

The next step is to define the cost of an optimal solution recursively in terms of the optimal solutions to subproblems. For the matrix-chain multiplication problem, a subproblem is to determine the minimum cost of parenthesizing A_i A_{i+1} ... A_j for 1 ≤ i ≤ j ≤ n. Given the input dimensions ⟨p_0, p_1, ..., p_n⟩, an index pair i, j specifies a subproblem. Let m[i, j] be the minimum number of scalar multiplications needed to compute the matrix A_{i:j}. For the full problem, the lowest-cost way to compute A_{1:n} is thus m[1, n].

We can define m[i, j] recursively as follows. If i = j, the problem is trivial: the chain consists of just one matrix A_{i:i} = A_i, so that no scalar multiplications are necessary to compute the product. Thus, m[i, i] = 0 for i = 1, 2, ..., n. To compute m[i, j] when i < j, we take advantage of the structure of an optimal solution from step 1. Suppose that an optimal parenthesization splits the product A_i A_{i+1} ... A_j between A_k and A_{k+1}, where i ≤ k < j. Then, m[i, j] equals the minimum cost m[i, k] for computing the subproduct A_{i:k}, plus the minimum cost m[k+1, j] for computing the subproduct A_{k+1:j}, plus the cost of multiplying these two matrices together. Because each matrix A_i is p_{i-1} × p_i, computing the matrix product A_{i:k} A_{k+1:j} takes p_{i-1} p_k p_j scalar multiplications. Thus, we obtain

  m[i, j] = m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j.

This recursive equation assumes that you know the value of k. But you don’t, at least not yet. You have to try all possible values of k. How many are there? Just j - i, namely k = i, i + 1, ..., j - 1. Since the optimal parenthesization must use one of these values for k, you need only check them all to find the best. Thus, the recursive definition for the minimum cost of parenthesizing the product A_i A_{i+1} ... A_j becomes

  m[i, j] = 0 if i = j, and
  m[i, j] = min {m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j : i ≤ k < j} if i < j.

The m[i, j] values give the costs of optimal solutions to subproblems, but they do not provide all the information you need to construct an optimal solution. To help you do so, let’s define s[i, j] to be a value of k at which you split the product A_i A_{i+1} ... A_j in an optimal parenthesization. That is, s[i, j] equals a value k such that m[i, j] = m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j.
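The recurrence for m[i, j] can be evaluated directly with a short recursive Python function; as the next section explains, this direct evaluation takes exponential time. The function name and the dimension sequence below are illustrative samples, where dims[i-1] and dims[i] are the dimensions of matrix A_i.

```python
def m_cost(dims, i, j):
    """Minimum scalar multiplications to compute A_i .. A_j, evaluated
    straight from the recurrence (exponential time, no memoization)."""
    if i == j:
        return 0                         # a single matrix needs no work
    return min(
        m_cost(dims, i, k) + m_cost(dims, k + 1, j)
        + dims[i - 1] * dims[k] * dims[j]
        for k in range(i, j)             # try every split point k
    )

dims = [30, 35, 15, 5, 10, 20, 25]       # sample dimensions: six matrices
print(m_cost(dims, 1, 6))   # 15125
```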

Step 3: Computing the Optimal Costs

At this point, you could write a recursive algorithm based on this recurrence to compute the minimum cost m[1, n] for multiplying A_1 A_2 ... A_n. But as we saw for the rod-cutting problem, this recursive algorithm takes exponential time.

Fortunately, there aren’t all that many distinct subproblems: just one subproblem for each choice of i and j satisfying 1 ≤ i ≤ j ≤ n, or Θ(n^2) in all. A recursive algorithm may encounter each subproblem many times in different branches of its recursion tree. This property of overlapping subproblems is the second hallmark of when dynamic programming applies (the first hallmark being optimal substructure).

Instead of computing the solution to the recurrence recursively, let’s compute the optimal cost by using a tabular, bottom-up approach, as in the procedure Matrix-Chain-Order. The input is a sequence p = ⟨p_0, p_1, ..., p_n⟩ of matrix dimensions, along with n, so that for i = 1, 2, ..., n, matrix A_i has dimensions p_{i-1} × p_i. The procedure uses an auxiliary table m[1:n, 1:n] to store the m[i, j] costs and another auxiliary table s[1:n - 1, 2:n] that records which index k achieved the optimal cost in computing m[i, j]. The table s will help in constructing an optimal solution.

Matrix-Chain-Order(p, n)

  let m[1:n, 1:n] and s[1:n - 1, 2:n] be new tables
  for i = 1 to n                // chains of length 1
      m[i, i] = 0
  for l = 2 to n                // l is the chain length
      for i = 1 to n - l + 1    // chain begins at A_i
          j = i + l - 1         // chain ends at A_j
          m[i, j] = ∞
          for k = i to j - 1    // try splitting between A_k and A_{k+1}
              q = m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j
              if q < m[i, j]
                  m[i, j] = q   // remember this cost
                  s[i, j] = k   // remember this index
  return m and s
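The procedure can be sketched in Python with 0-row/column padding so that the tables stay 1-indexed as in the pseudocode. The function name and the sample dimension sequence are illustrative, not from the text.

```python
def matrix_chain_order(dims):
    """Bottom-up matrix-chain order. dims has length n + 1; matrix A_i is
    dims[i-1] x dims[i]. Returns tables m (costs) and s (split indices),
    both 1-indexed with unused 0 padding."""
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):             # chain length
        for i in range(1, n - length + 2):     # chain begins at A_i
            j = i + length - 1                 # chain ends at A_j
            m[i][j] = float("inf")
            for k in range(i, j):              # try each split point
                q = m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

dims = [30, 35, 15, 5, 10, 20, 25]             # sample dimensions: six matrices
m, s = matrix_chain_order(dims)
print(m[1][6], s[1][6])   # 15125 3
```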

In what order should the algorithm fill in the table entries? To answer this question, let’s see which entries of the m table need to be accessed when computing the cost m[i, j]. The recurrence tells us that to compute the cost of the matrix product A_{i:k} A_{k+1:j}, first the costs of the products A_{i:k} and A_{k+1:j} need to have been computed for all i ≤ k < j. The chain A_i A_{i+1} ... A_j consists of j - i + 1 matrices, and the chains A_i A_{i+1} ... A_k and A_{k+1} A_{k+2} ... A_j consist of k - i + 1 and j - k matrices, respectively. Since k < j, a chain of k - i + 1 matrices consists of fewer than j - i + 1 matrices. Likewise, since k ≥ i, a chain of j - k matrices consists of fewer than j - i + 1 matrices. Thus, the algorithm should fill in the m table from shorter matrix chains to longer matrix chains. That is, for the subproblem of optimally parenthesizing the chain A_i A_{i+1} ... A_j, it makes sense to consider the subproblem size as the length j - i + 1 of the chain.

Now, let’s see how the Matrix-Chain-Order procedure fills in the m[i, j] entries in order of increasing chain length. The procedure first initializes m[i, i] = 0 for i = 1, 2, ..., n, since any matrix chain with just one matrix requires no scalar multiplications. In the outer for loop, the loop variable l denotes the length of matrix chains whose minimum costs are being computed. Each iteration of this loop uses the recurrence to compute m[i, i + l - 1] for i = 1, 2, ..., n - l + 1. In the first iteration, l = 2, and so the loop computes m[i, i + 1] for i = 1, 2, ..., n - 1: the minimum costs for chains of length 2. The second time through the loop, it computes m[i, i + 2] for i = 1, 2, ..., n - 2: the minimum costs for chains of length 3. And so on, ending with a single matrix chain of length n and computing m[1, n]. When the innermost loop computes an m[i, j] cost, this cost depends only on table entries m[i, k] and m[k + 1, j], which have already been computed.

A simple inspection of the nested loop structure of Matrix-Chain-Order yields a running time of O(n^3) for the algorithm. The loops are nested three deep, and each loop index (l, i, and k) takes on at most n - 1 values. We claim that the running time of this algorithm is in fact also Ω(n^3). The algorithm requires Θ(n^2) space to store the m and s tables. Thus, Matrix-Chain-Order is much more efficient than the exponential-time method of enumerating all possible parenthesizations and checking each one.

Step 4: Constructing an Optimal Solution

Although Matrix-Chain-Order determines the optimal number of scalar multiplications needed to compute a matrix-chain product, it does not directly show how to multiply the matrices. The s table provides the information needed to do so. Each entry s[i, j] records a value of k such that an optimal parenthesization of A_i A_{i+1} ... A_j splits the product between A_k and A_{k+1}. The final matrix multiplication in computing A_{1:n} optimally is A_{1:s[1,n]} A_{s[1,n]+1:n}. The s table contains the information needed to determine the earlier matrix multiplications as well, using recursion: s[1, s[1, n]] determines the last matrix multiplication when computing A_{1:s[1,n]}, and s[s[1, n] + 1, n] determines the last matrix multiplication when computing A_{s[1,n]+1:n}.

The recursive procedure Print-Optimal-Parenthesization prints an optimal parenthesization of the matrix chain product A_i A_{i+1} ... A_j, given the s table computed by Matrix-Chain-Order and the indices i and j. The initial call Print-Optimal-Parenthesization(s, 1, n) prints an optimal parenthesization of the full matrix chain product A_1 A_2 ... A_n.

Print-Optimal-Parenthesization(s, i, j)

  if i == j
      print "A"_i
  else
      print "("
      Print-Optimal-Parenthesization(s, i, s[i, j])
      Print-Optimal-Parenthesization(s, s[i, j] + 1, j)
      print ")"
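In Python it is convenient to build the parenthesization as a string rather than printing piecewise. This sketch bundles a minimal order computation with the printer; all names and the sample dimension sequence are illustrative.

```python
def matrix_chain_order(dims):
    """Bottom-up matrix-chain order; returns 1-indexed cost table m and
    split table s (row/column 0 unused)."""
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def parenthesization(s, i, j):
    """Return the optimal parenthesization of A_i .. A_j as a string."""
    if i == j:
        return f"A{i}"
    k = s[i][j]                          # optimal split point
    return "(" + parenthesization(s, i, k) + parenthesization(s, k + 1, j) + ")"

dims = [30, 35, 15, 5, 10, 20, 25]       # sample dimensions: six matrices
_, s = matrix_chain_order(dims)
print(parenthesization(s, 1, 6))   # ((A1(A2A3))((A4A5)A6))
```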

Elements of Dynamic Programming

Although you have just seen two complete examples of the dynamic-programming method, you might still be wondering just when the method applies. From an engineering perspective, when should you look for a dynamic-programming solution to a problem?

Optimal Substructure

The first step in solving an optimization problem by dynamic programming is to characterize the structure of an optimal solution. Recall that a problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. When a problem exhibits optimal substructure, that gives you a good clue that dynamic programming might apply. Dynamic programming builds an optimal solution to the problem from optimal solutions to subproblems. Consequently, you must take care to ensure that the range of subproblems you consider includes those used in an optimal solution.

You will find yourself following a common pattern in discovering optimal substructure:

  • You show that a solution to the problem consists of making a choice, such as choosing an initial cut in a rod or choosing an index at which to split the matrix chain. Making this choice leaves one or more subproblems to be solved.

  • You suppose that for a given problem, you are given the choice that leads to an optimal solution. You do not concern yourself yet with how to determine this choice. You just assume that it has been given to you.

  • Given this choice, you determine which subproblems ensue and how to best characterize the resulting space of subproblems.

  • You show that the solutions to the subproblems used within an optimal solution to the problem must themselves be optimal by using a “cut-and-paste” technique. You do so by supposing that each of the subproblem solutions is not optimal and then deriving a contradiction. In particular, by “cutting out” the non-optimal solution to each subproblem and “pasting in” the optimal one, you show that you can get a better solution to the original problem, thus contradicting your supposition that you already had an optimal solution. If an optimal solution gives rise to more than one subproblem, they are typically so similar that you can modify the cut-and-paste argument for one to apply to the others with little effort.

To characterize the space of subproblems, a good rule of thumb says to try to keep the space as simple as possible and then expand it as necessary. For example, the space of subproblems for the rod-cutting problem contained the problems of optimally cutting up a rod of length j for each size j. This subproblem space worked well, and it was not necessary to try a more general space of subproblems.

Optimal substructure varies across problem domains in two ways:

  • how many subproblems an optimal solution to the original problem uses, and
  • how many choices you have in determining which subproblem(s) to use in an optimal solution.

Informally, the running time of a dynamic-programming algorithm depends on the product of two factors: the number of subproblems overall and how many choices you look at for each subproblem. In rod cutting, we had Θ(n) subproblems overall, and at most n choices to examine for each, yielding an O(n^2) running time. Matrix-chain multiplication had Θ(n^2) subproblems overall, and each had at most n - 1 choices, giving an O(n^3) running time (actually, a Θ(n^3) running time). Usually, the subproblem graph gives an alternative way to perform the same analysis. Each vertex corresponds to a subproblem, and the choices for a subproblem are the edges incident from that subproblem.

Dynamic programming often uses optimal substructure in a bottom-up fashion. That is, you first find optimal solutions to subproblems and, having solved the subproblems, you find an optimal solution to the problem. Finding an optimal solution to the problem entails making a choice among subproblems as to which you will use in solving the problem. The cost of the problem solution is usually the subproblem costs plus a cost that is directly attributable to the choice itself.

The next chapter explores “greedy algorithms,” which have many similarities to dynamic programming. In particular, problems to which greedy algorithms apply have optimal substructure. One major difference between greedy algorithms and dynamic programming is that instead of first finding optimal solutions to subproblems and then making an informed choice, greedy algorithms first make a “greedy” choice (the choice that looks best at the time) and then solve a resulting subproblem, without bothering to solve all possible related smaller subproblems.

Overlapping Subproblems

The second ingredient that an optimization problem must have for dynamic programming to apply is that the space of subproblems must be “small” in the sense that a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems. Typically, the total number of distinct subproblems is a polynomial in the input size. When a recursive algorithm revisits the same problem repeatedly, we say that the optimization problem has overlapping subproblems.

In contrast, a problem for which a divide-and-conquer approach is suitable usually generates brand-new problems at each step of the recursion. Dynamic-programming algorithms typically take advantage of overlapping subproblems by solving each subproblem once and then storing the solution in a table where it can be looked up when needed, using constant time per lookup.

If we compare the top-down recursive algorithm (without memoization) with the bottom-up dynamic-programming algorithm, the latter is more efficient because it takes advantage of the overlapping-subproblems property. Matrix-chain multiplication has only Θ(n^2) distinct subproblems, and the dynamic-programming algorithm solves each exactly once. The recursive algorithm, on the other hand, must solve each subproblem every time it reappears in the recursion tree. Whenever a recursion tree for the natural recursive solution to a problem contains the same subproblem repeatedly, and the total number of distinct subproblems is small, dynamic programming can improve efficiency.

Memoization

As we saw for the rod-cutting problem, there is an alternative approach to dynamic programming that often offers the efficiency of the bottom-up dynamic-programming approach while maintaining a top-down strategy. The idea is to memoize the natural, but inefficient, recursive algorithm. As in the bottom-up approach, you maintain a table with subproblem solutions, but the control structure for filling in the table is more like the recursive algorithm.

A memoized recursive algorithm maintains an entry in a table for the solution to each subproblem. Each table entry initially contains a special value to indicate that the entry has yet to be filled in. When the subproblem is first encountered as the recursive algorithm unfolds, its solution is computed and then stored in the table. Each subsequent encounter of this subproblem simply looks up the value stored in the table and returns it.

In general practice, if all subproblems must be solved at least once, a bottom-up dynamic-programming algorithm usually outperforms the corresponding top-down memoized algorithm by a constant factor, because the bottom-up algorithm has no overhead for recursion and less overhead for maintaining the table. Moreover, for some problems you can exploit the regular pattern of table accesses in the dynamic-programming algorithm to reduce time or space requirements even further. On the other hand, in certain situations, some of the subproblems in the subproblem space might not need to be solved at all. In that case, the memoized solution has the advantage of solving only those subproblems that are definitely required.
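As a sketch of the memoized alternative, the same rod-cutting computation can be written top down. The names and price table are again illustrative; the table entry's initial value of negative infinity plays the role of the "special value" described above.

```python
def memoized_cut_rod(prices, n):
    """Top-down rod cutting with memoization.

    r[j] holds the best revenue for length j once it is known; -inf
    marks an entry that has yet to be filled in.
    """
    r = [float("-inf")] * (n + 1)

    def aux(j):
        if r[j] >= 0:                 # subsequent encounter: table lookup
            return r[j]
        if j == 0:
            q = 0
        else:                         # first encounter: compute, then store
            q = max(prices[i] + aux(j - i) for i in range(1, j + 1))
        r[j] = q
        return q

    return aux(n)

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(memoized_cut_rod(p, 4))   # -> 10
```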

Longest Common Subsequence

Biological applications often need to compare the DNA of two (or more) different organisms. A strand of DNA consists of a string of molecules called bases, where the possible bases are adenine, cytosine, guanine, and thymine.

We formalize the notion of similarity between DNA strands as the longest-common-subsequence problem. A subsequence of a given sequence is just the given sequence with zero or more elements left out. Formally, given a sequence X = ⟨x_1, x_2, …, x_m⟩, another sequence Z = ⟨z_1, z_2, …, z_k⟩ is a subsequence of X if there exists a strictly increasing sequence ⟨i_1, i_2, …, i_k⟩ of indices of X such that for all j = 1, 2, …, k, we have x_{i_j} = z_j.

Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. In the longest-common-subsequence (LCS) problem, the input is two sequences X = ⟨x_1, x_2, …, x_m⟩ and Y = ⟨y_1, y_2, …, y_n⟩, and the goal is to find a maximum-length common subsequence of X and Y.

Step 1: Characterizing a Longest Common Subsequence

You can solve the LCS problem with a brute-force approach: enumerate all subsequences of X and check each subsequence to see whether it is also a subsequence of Y, keeping track of the longest subsequence you find. Each subsequence of X corresponds to a subset of the indices {1, 2, …, m} of X. Because X has 2^m subsequences, this approach requires exponential time, making it impractical for long sequences.

The LCS problem has an optimal-substructure property, however, as the following theorem shows. As we’ll see, the natural classes of subproblems correspond to pairs of “prefixes” of the two input sequences. To be precise, given a sequence X = ⟨x_1, x_2, …, x_m⟩, we define the ith prefix of X, for i = 0, 1, …, m, as X_i = ⟨x_1, x_2, …, x_i⟩. (X_0 is the empty sequence.)

  • Theorem (I) (Optimal Substructure of an LCS) Let X = ⟨x_1, x_2, …, x_m⟩ and Y = ⟨y_1, y_2, …, y_n⟩ be sequences, and let Z = ⟨z_1, z_2, …, z_k⟩ be any LCS of X and Y.

    • If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
    • If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
    • If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}.

    Proof:

    • If z_k ≠ x_m, then we could append x_m = y_n to Z to obtain a common subsequence of X and Y of length k + 1, contradicting the supposition that Z is a longest common subsequence of X and Y. Thus, we must have z_k = x_m = y_n. Now, the prefix Z_{k-1} is a length-(k - 1) common subsequence of X_{m-1} and Y_{n-1}. We wish to show that it is an LCS. Suppose for the purpose of contradiction that there exists a common subsequence W of X_{m-1} and Y_{n-1} with length greater than k - 1. Then, appending x_m = y_n to W produces a common subsequence of X and Y whose length is greater than k, which is a contradiction.
    • If z_k ≠ x_m, then Z is a common subsequence of X_{m-1} and Y. If there were a common subsequence W of X_{m-1} and Y with length greater than k, then W would also be a common subsequence of X and Y, contradicting the assumption that Z is an LCS of X and Y.
    • If z_k ≠ y_n, then Z is a common subsequence of X and Y_{n-1}. If there were a common subsequence W of X and Y_{n-1} with length greater than k, then W would also be a common subsequence of X and Y, contradicting the assumption that Z is an LCS of X and Y.

The way that Theorem (I) characterizes longest common subsequences says that an LCS of two sequences contains within it an LCS of prefixes of the two sequences. Thus, the LCS problem has an optimal-substructure property. A recursive solution also has the overlapping-subproblems property, as we shall see.

Step 2: A Recursive Solution

Theorem (I) implies that you should examine either one or two subproblems when finding an LCS of X = ⟨x_1, x_2, …, x_m⟩ and Y = ⟨y_1, y_2, …, y_n⟩:

  • If x_m = y_n, you need to find an LCS of X_{m-1} and Y_{n-1}. Appending x_m = y_n to this LCS yields an LCS of X and Y.
  • If x_m ≠ y_n, then you have to solve two subproblems: finding an LCS of X_{m-1} and Y, and finding an LCS of X and Y_{n-1}.

Whichever of these two LCSs is longer is an LCS of X and Y. Because these cases exhaust all possibilities, one of the optimal subproblem solutions must appear within an LCS of X and Y.

The LCS problem has the overlapping-subproblems property. To find an LCS of X and Y, you might need to find the LCSs of X_{m-1} and Y and of X and Y_{n-1}. But each of these subproblems has the subsubproblem of finding an LCS of X_{m-1} and Y_{n-1}. Many other subproblems share subsubproblems.

As in the matrix-chain multiplication problem, solving the LCS problem recursively involves establishing a recurrence for the value of an optimal solution. Let’s define c[i, j] to be the length of an LCS of the sequences X_i and Y_j. If either i = 0 or j = 0, one of the sequences has length 0, and so the LCS has length 0. The optimal substructure of the LCS problem gives the recursive formula

c[i, j] = 0                                if i = 0 or j = 0,
c[i, j] = c[i - 1, j - 1] + 1              if i, j > 0 and x_i = y_j,
c[i, j] = max(c[i - 1, j], c[i, j - 1])    if i, j > 0 and x_i ≠ y_j.

In this recursive formulation, a condition in the problem restricts which subproblems to consider. When x_i = y_j, you can and should consider the subproblem of finding an LCS of X_{i-1} and Y_{j-1}. Otherwise, you instead consider the two subproblems of finding an LCS of X_{i-1} and Y_j and of X_i and Y_{j-1}. In the previous dynamic-programming algorithms we have examined, we didn’t rule out any subproblems due to conditions in the problem. Finding an LCS is not the only dynamic-programming algorithm that rules out subproblems based on conditions in the problem.
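The recurrence translates almost verbatim into a top-down routine. The sketch below (function names are illustrative) uses Python's functools.lru_cache for memoization, so each of the Θ(mn) subproblems is solved only once.

```python
from functools import lru_cache

def lcs_length_recursive(X, Y):
    """Length of an LCS of X and Y, computed straight from the recurrence.

    c(i, j) is the length of an LCS of the prefixes X[:i] and Y[:j].
    Without the cache this would take exponential time.
    """
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:              # an empty prefix: LCS has length 0
            return 0
        if X[i - 1] == Y[j - 1]:          # case x_i = y_j
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(X), len(Y))

print(lcs_length_recursive("ABCBDAB", "BDCABA"))   # -> 4 (e.g. "BCBA")
```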

Step 3: Computing the Length of an LCS

Based on this recurrence, you could write an exponential-time recursive algorithm to compute the length of an LCS of two sequences. Since the LCS problem has only Θ(mn) distinct subproblems (computing c[i, j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n), dynamic programming can compute the solutions bottom up.

The procedure LCS-Length takes two sequences X = ⟨x_1, x_2, …, x_m⟩ and Y = ⟨y_1, y_2, …, y_n⟩ as inputs, along with their lengths. It stores the c[i, j] values in a table c[0 : m, 0 : n], and it computes the entries in row-major order. That is, the procedure fills in the first row of c from left to right, then the second row, and so on. The procedure also maintains the table b[1 : m, 1 : n] to help in constructing an optimal solution. Intuitively, b[i, j] points to the table entry corresponding to the optimal subproblem solution chosen when computing c[i, j]. The procedure returns the b and c tables, where c[m, n] contains the length of an LCS of X and Y. The running time of the procedure is Θ(mn), since each table entry takes Θ(1) time to compute.

LCS-Length(X, Y, m, n)

let b[1 : m, 1 : n] and c[0 : m, 0 : n] be new tables
for i = 1 to m
    c[i, 0] = 0
for j = 0 to n
    c[0, j] = 0
for i = 1 to m
    for j = 1 to n
        if x_i == y_j
            c[i, j] = c[i - 1, j - 1] + 1
            b[i, j] = "↖"
        elseif c[i - 1, j] ≥ c[i, j - 1]
            c[i, j] = c[i - 1, j]
            b[i, j] = "↑"
        else
            c[i, j] = c[i, j - 1]
            b[i, j] = "←"
return c and b

Step 4: Constructing an LCS

With the b table returned by LCS-Length, you can construct an LCS of X = ⟨x_1, x_2, …, x_m⟩ and Y = ⟨y_1, y_2, …, y_n⟩. Begin at b[m, n] and trace through the table by following the arrows. Each “↖” encountered in an entry b[i, j] implies that x_i = y_j is an element of the LCS that LCS-Length found. This method gives you the elements of this LCS in reverse order. The recursive procedure Print-LCS prints out an LCS of X and Y in the proper, forward order.

Print-LCS(b, X, i, j)

if i == 0 or j == 0
    return
if b[i, j] == "↖"
    Print-LCS(b, X, i - 1, j - 1)
    print x_i
elseif b[i, j] == "↑"
    Print-LCS(b, X, i - 1, j)
else
    Print-LCS(b, X, i, j - 1)

The initial call is Print-LCS(b, X, m, n). The procedure takes O(m + n) time, since it decrements at least one of i and j in each recursive call.
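The two procedures above translate directly into Python. In this sketch the arrow markers are stored as strings, and list indices are shifted so that c[i][j] corresponds to the prefixes X[:i] and Y[:j] of the 1-based pseudocode.

```python
def lcs_length(X, Y):
    """Bottom-up LCS. Returns tables c and b.

    c[i][j] is the length of an LCS of X[:i] and Y[:j]; b[i][j]
    records which subproblem produced it ("UL", "UP", or "LEFT").
    """
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "UL"                 # the diagonal arrow
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "UP"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "LEFT"
    return c, b

def print_lcs(b, X, i, j, out):
    """Walk the b table back from (i, j), collecting LCS elements in order."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "UL":
        print_lcs(b, X, i - 1, j - 1, out)
        out.append(X[i - 1])
    elif b[i][j] == "UP":
        print_lcs(b, X, i - 1, j, out)
    else:
        print_lcs(b, X, i, j - 1, out)

c, b = lcs_length("ABCBDAB", "BDCABA")
seq = []
print_lcs(b, "ABCBDAB", 7, 6, seq)
print(c[7][6], "".join(seq))   # -> 4 BCBA
```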

Improving the Code

Once you have developed an algorithm, you will often find that you can improve on the time or space it uses. Some changes can simplify the code and improve constant factors but otherwise yield no asymptotic improvement in performance. Others can yield substantial asymptotic savings in time and space.

In the LCS algorithm, for example, you can eliminate the b table altogether. Each c[i, j] entry depends on only three other table entries: c[i - 1, j - 1], c[i - 1, j], and c[i, j - 1]. Given the value of c[i, j], you can determine in O(1) time which of these three values was used to compute c[i, j], without inspecting table b. Thus, you can reconstruct an LCS in O(m + n) time using a procedure similar to Print-LCS. Although this method saves Θ(mn) space, the auxiliary space requirement for computing an LCS does not asymptotically decrease, since the c table takes Θ(mn) space anyway.

You can, however, reduce the asymptotic space requirements for LCS-Length, since it needs only two rows of table c at a time: the row being computed and the previous row. (In fact, you can use only slightly more than the space for one row of c to compute the length of an LCS.) This improvement works if you need only the length of an LCS. If you need to reconstruct the elements of an LCS, the smaller table does not keep enough information to retrace the algorithm’s steps in O(m + n) time.
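A sketch of the two-row space saving, assuming only the LCS length is needed (the function name is illustrative):

```python
def lcs_length_two_rows(X, Y):
    """Length of an LCS using only two rows of the c table: O(n) space."""
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)              # row i - 1 of the c table
    for i in range(1, m + 1):
        curr = [0] * (n + 1)          # row i, currently being computed
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr                   # the older row is no longer needed
    return prev[n]

print(lcs_length_two_rows("ABCBDAB", "BDCABA"))   # -> 4
```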

Optimal Binary Search Trees

Suppose that you are designing a program to translate text from English to Latvian. For each occurrence of each English word in the text, you need to look up its Latvian equivalent. You can perform these lookup operations by building a binary search tree with n English words as keys and their Latvian equivalents as satellite data. Because you will search the tree for each individual word in the text, you want the total time spent searching to be as low as possible. You can ensure an O(lg n) search time per occurrence by using a red-black tree or any other balanced binary search tree. Words appear with different frequencies, however, and you want words that occur frequently in the text to be placed nearer the root.

How can you organize a binary search tree so as to minimize the number of nodes visited in all searches, given that you know how often each word occurs?

What you need is an optimal binary search tree. Formally, given a sequence K = ⟨k_1, k_2, …, k_n⟩ of n distinct keys in sorted order (so that k_1 < k_2 < ⋯ < k_n), build a binary search tree containing them. For each key k_i, you are given the probability p_i that any given search is for key k_i. Since some searches may be for values not in K, you also have n + 1 “dummy” keys d_0, d_1, …, d_n representing those values. In particular, d_0 represents all values less than k_1, d_n represents all values greater than k_n, and for i = 1, 2, …, n - 1, the dummy key d_i represents all values between k_i and k_{i+1}. For each dummy key d_i, you have the probability q_i that a search corresponds to d_i. (Each key k_i is an internal node, and each dummy key d_i is a leaf.) Since every search is either successful (finding some key k_i) or unsuccessful (finding some dummy key d_i), we have

sum_{i=1..n} p_i + sum_{i=0..n} q_i = 1.

Knowing the probabilities of searches for each key and each dummy key allows us to determine the expected cost of a search in a given binary search tree T. Let us assume that the actual cost of a search equals the number of nodes examined, which is the depth of the node found by the search in T, plus 1. Then the expected cost of a search in T is

E[search cost in T] = sum_{i=1..n} (depth_T(k_i) + 1) · p_i + sum_{i=0..n} (depth_T(d_i) + 1) · q_i
                    = 1 + sum_{i=1..n} depth_T(k_i) · p_i + sum_{i=0..n} depth_T(d_i) · q_i,

where depth_T denotes a node’s depth in the tree T. The last equation follows from the fact that the probabilities sum to 1.

For a given set of probabilities, your goal is to construct a binary search tree whose expected search cost is smallest. We call such a tree an optimal binary search tree. An optimal binary search tree is not necessarily a tree whose overall height is smallest. Nor does an optimal binary search tree always have the key with the greatest probability at the root.

As with matrix-chain multiplication, exhaustive checking of all possibilities fails to yield an efficient algorithm. You can label the nodes of any n-node binary tree with the keys k_1, k_2, …, k_n to construct a binary search tree, and then add in the dummy keys as leaves. Earlier, we saw that the number of binary trees with n nodes is Ω(4^n / n^{3/2}). Thus you would need to examine an exponential number of binary search trees to perform an exhaustive search. We’ll see how to solve this problem more efficiently with dynamic programming.

Step 1: The Structure of an Optimal Binary Search Tree

To characterize the optimal substructure of optimal binary search trees, we start with an observation about subtrees. Consider any subtree of a binary search tree. It must contain keys in a contiguous range k_i, …, k_j, for some 1 ≤ i ≤ j ≤ n. In addition, a subtree that contains keys k_i, …, k_j must also have as its leaves the dummy keys d_{i-1}, …, d_j.

Now we can state the optimal substructure:

  • If an optimal binary search tree T has a subtree T′ containing keys k_i, …, k_j, then this subtree T′ must be optimal as well for the subproblem with keys k_i, …, k_j and dummy keys d_{i-1}, …, d_j. The usual cut-and-paste argument applies. If there were a subtree T″ whose expected cost is lower than that of T′, then cutting T′ out of T and pasting in T″ would result in a binary search tree of lower expected cost than T, thus contradicting the optimality of T.

With the optimal substructure in hand, here is how to construct an optimal solution to the problem from optimal solutions to subproblems. Given keys k_i, …, k_j, one of these keys, say k_r (i ≤ r ≤ j), is the root of an optimal subtree containing these keys. The left subtree of the root k_r contains the keys k_i, …, k_{r-1} (and dummy keys d_{i-1}, …, d_{r-1}), and the right subtree contains the keys k_{r+1}, …, k_j (and dummy keys d_r, …, d_j). As long as you examine all candidate roots k_r, where i ≤ r ≤ j, and you determine all optimal binary search trees containing k_i, …, k_{r-1} and those containing k_{r+1}, …, k_j, you are guaranteed to find an optimal binary search tree.

There is one technical detail worth understanding about “empty” subtrees. Suppose that in a subtree with keys k_i, …, k_j, you select k_i as the root. By the above argument, k_i’s left subtree contains the keys k_i, …, k_{i-1}: no keys at all. Bear in mind, however, that subtrees also contain dummy keys. We adopt the convention that a subtree containing keys k_i, …, k_{i-1} has no actual keys but does contain the single dummy key d_{i-1}. Symmetrically, if you select k_j as the root, then k_j’s right subtree contains the keys k_{j+1}, …, k_j. This right subtree contains no actual keys, but it does contain the dummy key d_j.

Step 2: A Recursive Solution

To define the value of an optimal solution recursively, the subproblem domain is finding an optimal binary search tree containing the keys k_i, …, k_j, where i ≥ 1, j ≤ n, and j ≥ i - 1. (When j = i - 1, there is just the dummy key d_{i-1}, but no actual keys.) Let e[i, j] denote the expected cost of searching an optimal binary search tree containing the keys k_i, …, k_j. Your goal is to compute e[1, n], the expected cost of searching an optimal binary search tree for all the actual and dummy keys.

The easy case occurs when j = i - 1. Then the subproblem consists of just the dummy key d_{i-1}. The expected search cost is e[i, i - 1] = q_{i-1}.

When j ≥ i, you need to select a root k_r from among k_i, …, k_j, and then make an optimal binary search tree with keys k_i, …, k_{r-1} as its left subtree and an optimal binary search tree with keys k_{r+1}, …, k_j as its right subtree.

What happens to the expected search cost of a subtree when it becomes a subtree of a node? The depth of each node in the subtree increases by 1, and so the expected search cost of this subtree increases by the sum of all the probabilities in the subtree. For a subtree with keys k_i, …, k_j, denote this sum of probabilities as

w(i, j) = sum_{l=i..j} p_l + sum_{l=i-1..j} q_l.

Thus, if k_r is the root of an optimal subtree containing keys k_i, …, k_j, we have

e[i, j] = p_r + (e[i, r - 1] + w(i, r - 1)) + (e[r + 1, j] + w(r + 1, j)).

Noting that w(i, j) = w(i, r - 1) + p_r + w(r + 1, j), we rewrite e[i, j] as

e[i, j] = e[i, r - 1] + e[r + 1, j] + w(i, j).

This recursive equation assumes that you know which node k_r to use as the root. Of course, you choose the root that gives the lowest expected search cost, giving the final recursive formulation:

e[i, j] = q_{i-1}                                                  if j = i - 1,
e[i, j] = min { e[i, r - 1] + e[r + 1, j] + w(i, j) : i ≤ r ≤ j }  if i ≤ j.

The e[i, j] values give the expected search costs in optimal binary search trees.

To help keep track of the structure of optimal binary search trees, define root[i, j], for 1 ≤ i ≤ j ≤ n, to be the index r for which k_r is the root of an optimal binary search tree containing keys k_i, …, k_j.

Step 3: Computing the Expected Search Cost of an Optimal BST

At this point, you may have noticed some similarities between our characterizations of optimal binary search trees and matrix-chain multiplication. For both problem domains, the subproblems consist of contiguous index subranges. A direct, recursive implementation of the recurrence above would be just as inefficient as a direct, recursive matrix-chain multiplication algorithm. Instead, you can store the e[i, j] values in a table e[1 : n + 1, 0 : n]. The first index needs to run to n + 1 rather than n because in order to have a subtree containing only the dummy key d_n, you need to compute and store e[n + 1, n]. The second index needs to start from 0 because in order to have a subtree containing only the dummy key d_0, you need to compute and store e[1, 0]. Only the entries e[i, j] for which j ≥ i - 1 are filled in. The table root[i, j] records the root of the subtree containing keys k_i, …, k_j, and uses only the entries for which 1 ≤ i ≤ j ≤ n.

One other table makes the dynamic-programming algorithm a little faster. Instead of computing the value of w(i, j) from scratch every time you compute e[i, j], which would take Θ(j - i) additions, store these values in a table w[1 : n + 1, 0 : n].

For the base case, compute w[i, i - 1] = q_{i-1} for 1 ≤ i ≤ n + 1. For j ≥ i, compute

w[i, j] = w[i, j - 1] + p_j + q_j.

Thus, you can compute the Θ(n^2) values of w[i, j] in Θ(1) time each.

The Optimal-BST procedure takes as inputs the probabilities p_1, …, p_n and q_0, …, q_n and the size n, and it returns the tables e and root.

Optimal-BST(p, q, n)

let e[1 : n + 1, 0 : n], w[1 : n + 1, 0 : n], and root[1 : n, 1 : n] be new tables
for i = 1 to n + 1
    e[i, i - 1] = q_{i-1}
    w[i, i - 1] = q_{i-1}
for l = 1 to n
    for i = 1 to n - l + 1
        j = i + l - 1
        e[i, j] = ∞
        w[i, j] = w[i, j - 1] + p_j + q_j
        for r = i to j
            t = e[i, r - 1] + e[r + 1, j] + w[i, j]
            if t < e[i, j]
                e[i, j] = t
                root[i, j] = r
return e and root

The very first for loop initializes the values of e[i, i - 1] and w[i, i - 1]. Then the main for loop uses the recurrences above to compute e[i, j] and w[i, j] for all 1 ≤ i ≤ j ≤ n. In the first iteration, when l = 1, the loop computes e[i, i] and w[i, i] for i = 1, 2, …, n. The second iteration, with l = 2, computes e[i, i + 1] and w[i, i + 1] for i = 1, 2, …, n - 1, and so on. The innermost for loop tries each candidate index r to determine which key k_r to use as the root of an optimal binary search tree containing keys k_i, …, k_j. This for loop saves the current value of the index r in root[i, j] whenever it finds a better key to use as the root.

The Optimal-BST procedure takes Θ(n^3) time, just like Matrix-Chain-Order. Its running time is O(n^3), since its for loops are nested three deep and each loop index takes on at most n values. The loop indices in Optimal-BST do not have exactly the same bounds as those in Matrix-Chain-Order, but they are within at most 1 in all directions. Thus, like Matrix-Chain-Order, the Optimal-BST procedure also takes Ω(n^3) time, and hence Θ(n^3) time overall.
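A Python rendering of Optimal-BST follows. Index 0 of p is padding so that the indices match the 1-based pseudocode; the probability values in the demonstration are illustrative sample inputs, not data from the text above.

```python
import math

def optimal_bst(p, q, n):
    """Optimal binary search tree tables, following the pseudocode.

    p[1..n] are the key probabilities and q[0..n] the dummy-key
    probabilities (p[0] is an unused placeholder). Returns e and root,
    where e[i][j] is the expected search cost of an optimal BST on
    keys k_i..k_j and root[i][j] is the index of its root.
    """
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):          # base cases: dummy-key-only subtrees
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for l in range(1, n + 1):          # l = number of keys in the subtree
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):  # try each candidate root k_r
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root

# Sample probabilities for n = 5 keys (they sum to 1).
p = [0, 0.15, 0.10, 0.05, 0.10, 0.20]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
e, root = optimal_bst(p, q, 5)
print(round(e[1][5], 2), root[1][5])   # -> 2.75 2
```

Given the root table, the tree itself can be recovered top down: root[1][n] is the overall root, its left subtree is the optimal tree for keys k_1..k_{root-1}, and so on recursively.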