15-451 Algorithms                                            11/07/06
* NP-completeness and expressiveness
* informal definitions
* formal definitions
* Circuit-SAT and 3-SAT
============================================================================
In the last few classes, we've looked at increasingly expressive
problems: network flow, min-cost max-flow, linear programming. These
problems have the property that you can code up a lot of different
problems in their "language". So, by solving these well, we end up
having some important hammers we can use to solve other problems.

To talk about this a little more precisely, let's make the following
definitions:

* We'll say that an algorithm runs in Polynomial Time if for some
  constant c, its running time is O(n^c), where n is the size of the
  input. "Size of input" means "number of bits it takes to write the
  input down".

* Problem A is poly-time REDUCIBLE to problem B (A <=_p B) if given a
  poly-time alg for B, we can use it to produce a poly-time alg for A.
  Problem A is poly-time EQUIVALENT to problem B (A =_p B) if
  A <=_p B and B <=_p A.

For instance, we showed
   BIPARTITE MATCHING <=_p MAX FLOW
   MIN-COST MAX-FLOW  <=_p LINEAR PROGRAMMING
and more.
===============================================================
Let's start today by thinking about what would be a *really*
expressive problem, such that if we could solve it we could do all
sorts of things. Here is a natural candidate:

The SHORT SOLUTION EXISTENCE PROBLEM (SSEP): Given an algorithm V(I,X)
[written in some standard programming language, and think of it as
outputting either YES or NO], an instance I, and a bound B written in
unary (B bits).
Question: does there exist an input X such that V(I,X) halts in at
most B steps and outputs YES?

Why am I calling this the "short solution existence problem"?
Consider some problem we might want to solve, like the
TRAVELING-SALESMAN-PROBLEM: given a graph G and an integer k, is there
a tour that visits all the nodes of G and has total length <= k?
We don't know any fast ways of solving that problem, but we can easily
write a program V that, given I = (G,k) and a proposed solution X,
verifies whether X indeed visits all the nodes of G and has total
length <= k. Furthermore, this solution-verifier runs in linear time.
So, if we could solve the "short solution existence problem", we could
tell if there is an X that makes V output YES, and thereby solve the
TSP.

What if we actually wanted to find the tour? One way we could do that
is to start deleting edges from G, re-running the above procedure
after each deletion. If the answer becomes "NO", we put the edge back
in. After we have tried all edges, what's left of G will be just the
tour itself.

Let's try another problem. Say we wanted to factor a big integer N.
We don't know any polynomial-time algorithms for solving that problem.
But we can easily write a verifier that, given N and a proposed factor
X, tells us if X is a solution to the problem. In fact, let's modify
this slightly (you'll see why in a second) so the verifier takes in an
additional integer k (so I = (N,k)) and outputs YES if X divides N
*and* 1 < X < k. So, if we can solve the short-solution-existence
problem, we can tell if N has a factor between 2 and k-1 by feeding V
and I = (N,k) into our solver. Then, if we want to actually *find* a
factor, we can do binary search on k. (That's why we needed the k.)

In fact, we could use an algorithm for the "short solution existence
problem" to solve *any* problem for which we have a polynomial-time
algorithm for simply *checking* whether a proposed solution is
correct: we just write down V and then make the length bound B big
enough to cover the running time of V. Interestingly, the
short-solution-existence problem itself belongs to this same category:
if someone hands us a proposed solution X, we can check it by just
running V.

WHAT WE NOW HAVE
================
This class of problems - problems for which we can efficiently verify
a proposed solution - is called NP.
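The factoring example above can be made concrete. Here is a minimal
Python sketch: `verifier` is the V((N,k), X) described in the notes,
and `has_short_solution` is a hypothetical stand-in for a
short-solution-existence solver (here it just brute-forces over
witnesses X, which takes exponential time in general); `find_factor`
then does the binary search on k. The function names are illustrative
choices, not from the notes.

```python
def verifier(I, X):
    """V(I, X): I = (N, k); accept iff X is a nontrivial factor of N below k."""
    N, k = I
    return 1 < X < k and N % X == 0

def has_short_solution(I):
    # Hypothetical stand-in for a short-solution-existence solver:
    # brute-force over candidate witnesses X (exponential in general).
    N, k = I
    return any(verifier(I, X) for X in range(2, N))

def find_factor(N):
    """Binary search on k to pin down N's smallest nontrivial factor."""
    if not has_short_solution((N, N + 1)):
        return None              # no nontrivial factor: N is prime (or N <= 3)
    lo, hi = 2, N + 1            # invariant: some factor lies below hi
    while lo < hi:
        mid = (lo + hi) // 2
        if has_short_solution((N, mid)):
            hi = mid             # a factor exists below mid
        else:
            lo = mid + 1         # no factor below mid
    return lo - 1                # smallest k admitting a factor is (factor + 1)
```

For example, `find_factor(15)` returns 3 and `find_factor(13)` returns
None, each using only YES/NO answers from the existence solver, which
is exactly the decision-to-search trick described above.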
A problem Q is said to be NP-complete if (1) Q is in NP, and (2) a
polynomial-time algorithm for Q could be used to solve *any* problem
in NP in polynomial time. That is, for any Q' in NP, Q' <=_p Q. And
we have just proven our first NP-complete problem, namely the SSEP.

Now, this problem seems pretty stylized. But we can now show that
other, simpler-looking problems have the property that if you could
solve them in polynomial time, then you could solve this problem in
polynomial time, so they too are NP-complete.

So, the way to think of it is that an NP-complete problem is
super-expressive. It is so expressive that we believe there is no
polynomial-time algorithm for solving it. In particular, if we
*could* solve an NP-complete problem in polynomial time, then it would
mean that for any problem where we could *check* a proposed solution
efficiently, we could also *find* such a solution efficiently.
==========================================================
Now, on to formal definitions. We will formally be considering
decision problems: problems whose answer is YES or NO.

COMPLEXITY CLASSES: P and NP.

P:  problems solvable in polynomial time. E.g., given a network G and
    a flow value f, does there exist a flow >= f?

NP: problems that have polynomial-time verifiers. Specifically,
    problem Q is in NP if there is a polynomial-time algorithm V(I,X)
    such that:
      If "I" is a YES-instance, then there exists X such that V(I,X) = 1.
      If "I" is a NO-instance, then for all X, V(I,X) = 0.
    Furthermore, X should have length polynomial in the size of I
    (since we are really only giving V time polynomial in the size of
    the instance). "X" is often called a "witness".

E.g., consider 3-coloring. The witness that an answer is YES is the
coloring. The verifier just checks that all edges are correct and
that at most 3 colors are used. So, 3-coloring is in NP.

NP is like: "I might not know how to find it, but I'll know it when I
see it." (BTW, Co-NP is the class of problems that work the other way
around: the NO-instances are the ones with witnesses.)
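The 3-coloring verifier just described fits in a few lines. A sketch,
with the instance given as an edge list and the witness as a color per
vertex (this representation is our illustrative choice):

```python
def verify_3coloring(edges, coloring):
    """V(I, X): accept (return 1) iff `coloring` is a proper 3-coloring.

    edges: list of (u, v) pairs; coloring: dict mapping vertex -> color.
    Runs in time linear in the instance plus witness size.
    """
    if len(set(coloring.values())) > 3:    # at most 3 colors used
        return 0
    for u, v in edges:
        if coloring[u] == coloring[v]:     # every edge must get two colors
            return 0
    return 1
```

For a triangle, the coloring {0:'r', 1:'g', 2:'b'} is accepted, while
any coloring repeating a color on an edge is rejected; note the
verifier never needs to know how the witness was found.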
It's pretty clear that P is contained in NP. A huge open question in
complexity theory is whether P = NP. It would be really weird if they
were equal, so most people believe P != NP. But it's very hard to
prove that a fast algorithm for something does NOT exist. So, it's
still an open problem.

NP-Completeness
===============
Problem Q is NP-complete if:
(1) Q is in NP, and
(2) any other problem Q' in NP is polynomial-time reducible to Q.

So, if you could solve Q in polynomial time, you could solve *any*
problem in NP in polynomial time. If Q just satisfies (2), then it is
called NP-hard.

Here's another NP-complete problem:

CIRCUIT-SAT: Given a circuit of NAND gates with a single output and
no loops (some of the inputs may be hardwired).
Question: is there a setting of the inputs that causes the circuit to
output 1?

Proof sketch: First of all, CIRCUIT-SAT is clearly in NP, since you
can just guess the input and try it. To show it is NP-complete, we
just need to show that if we could solve this, then we could solve the
short-solution-existence problem. Here's how. Say we are given V, I,
and B, and want to tell if there exists X such that V(I,X) outputs YES
in at most B steps. What we can do (and this part is a little
handwavy, but in principle you did this in 251 when you built a
computer out of NAND gates) is effectively compile V into a circuit,
of depth polynomial in B and the number of bits in I, that mimics what
V does in its first B time steps. We then hardwire the inputs
corresponding to I and feed this circuit into our CIRCUIT-SAT solver.

OK, fine, so we now have one more NP-complete problem. But
CIRCUIT-SAT looks really complicated; we weren't expecting to be able
to solve it. But *now* we can show that a much simpler-looking
problem has the property that if you could solve it efficiently, then
you could solve CIRCUIT-SAT efficiently ("efficiently" = "polynomial
time"). This problem is 3-SAT.
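To make the CIRCUIT-SAT setup concrete, here is a small Python sketch.
A loop-free NAND circuit is encoded as a list of gates in topological
order (an illustrative encoding, not a standard one), and the
"guess the input and try it" idea becomes a brute-force search over
all 2^n input settings, which of course takes exponential time:

```python
from itertools import product

def eval_circuit(n_inputs, gates, inputs):
    """Evaluate a loop-free NAND circuit.

    Wires 0..n_inputs-1 carry the inputs; gate i is a pair of wire
    indices and writes NAND of those wires onto wire n_inputs+i.
    The last gate's wire is the circuit's output.
    """
    wires = list(inputs)
    for a, b in gates:
        wires.append(1 - (wires[a] & wires[b]))   # NAND of two earlier wires
    return wires[-1]

def circuit_sat(n_inputs, gates):
    """Brute-force CIRCUIT-SAT: return a satisfying input, or None."""
    for inputs in product([0, 1], repeat=n_inputs):
        if eval_circuit(n_inputs, gates, inputs) == 1:
            return inputs
    return None
```

For instance, AND(x0, x1) is the two-gate circuit [(0,1), (2,2)]
(NAND followed by NOT), and the only satisfying input is (1, 1);
the three-gate circuit for "x0 AND NOT x0" is unsatisfiable. Note
that checking one candidate input is fast, even though finding one
this way is not; that is exactly the NP situation.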
3-SAT: Given a CNF formula (an AND of ORs) over n variables
x1,...,xn, where each clause has at most 3 literals (variables or
their negations) in it. E.g.,

  (x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ...

Goal: find an assignment to the variables that satisfies the formula,
if one exists.

Claim: we can reduce solving CIRCUIT-SAT to solving 3-SAT. I.e., if
we can solve 3-SAT, then we can solve CIRCUIT-SAT, and so we can
solve all of NP.

We'll do the reduction next time, but before we end, here is formally
how we are going to do reductions. Say we have some problem A that we
know is NP-complete, and we want to show that problem B is
NP-complete too. First we show that B is actually in NP, but that is
usually the easy part. The main thing we need to do is show that any
polynomial-time algorithm for B would give a polynomial-time
algorithm for A. We do this by "reducing A to B", and in particular
what we want is this:

Reducing problem A to problem B
===============================
To reduce problem A to problem B, we want a function f that takes
instances of A to instances of B such that:
(1) if x is a YES-instance of A, then f(x) is a YES-instance of B;
(2) if x is a NO-instance of A, then f(x) is a NO-instance of B;
(3) f can be computed in polynomial time.

So, if we had an algorithm for B, we could use it to solve A by
running it on f(x).
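Before the reduction, note that 3-SAT is itself in NP: checking a
proposed assignment against a CNF formula is easy. A minimal Python
sketch, encoding literal +i as xi and -i as not(xi) (a common
convention, but our choice here, not from the notes):

```python
def satisfies(formula, assignment):
    """Check an assignment against a CNF formula.

    formula: list of clauses, each a tuple of nonzero integer literals
    (+i means xi, -i means not(xi)).
    assignment: dict mapping variable index i -> True/False.
    """
    for clause in formula:
        # A clause is satisfied iff at least one of its literals is true.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False
    return True
```

With the example formula from above encoded as
`[(1, 2, -3), (-2, 3), (1, 3)]`, the assignment
x1=True, x2=False, x3=False satisfies it, while
x1=False, x2=True, x3=False falsifies the second clause.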