Just look on the course web page for all the relevant info on me. Also, take a look at the policies for late assignments and cheating. If you have any questions or comments, feel free to e-mail me.
You may be wondering a bit about how all this theoretical mumbo-jumbo fits
into the grand scheme of things. After all, this is a course in Java
programming, right? Not quite. It's a course on fundamentals of computer
science, and there's nothing more fundamental to CS than the stuff we're
doing right now. Rest assured that you'll get plenty of exercise with
Java later on, but if you really want to gain a deeper understanding of
what CS is all about and how computers really work, you have to
understand the theory behind the machines.

Computation theory began in the early twentieth century, and it tries to define precisely what it means to perform a computation. It does this by creating a mathematical model of computation, thus capturing the essence of the act of computation in the precise and rigorous language of mathematics, which in turn can be studied in its own right. The results of this approach are quite extraordinary, in both theoretical and practical contexts. It enables us to study computations without having to physically construct machines that can perform them; it puts definite bounds on what such machines can and cannot compute, regardless of the resources that they have available to them; and it raises some interesting philosophical issues about our own abilities to think, perceive, and interact.

To define "computation," let's consider this simple working definition: a computation is a process that takes in some input and produces some output according to a fixed set of rules.
In this course, we'll consider several mathematical models of computation that try to codify this definition in a more precise way. The first model we study is the finite automaton; next, we'll look at the push-down automaton; and finally, we'll come to the Turing machine. Each successive model will prove to be strictly more "powerful" than the previous one. And when we attempt to find a model that's more powerful than the Turing machine, we'll fail, thus leading us to postulate that the Turing machine is the most powerful model of computation that exists. (This postulate is known as the Church-Turing thesis; being a thesis, it cannot be proven mathematically.) Then, by studying the properties of the Turing machine, we'll demonstrate the existence of a class of functions that cannot be computed. And finally, we'll form a classification scheme for those functions that can be computed, based on their "complexity."
We begin with the idea of a language, from a mathematical standpoint, appealing first to our intuitive notion of what a language is:
Let alphabet SIGMA be a finite set of symbols; e.g., SIGMA = {a,b,c}. Let SIGMA* be the set of all finite-length strings that can be formed from the symbols in SIGMA; e.g., SIGMA* = {a, abba, cab, abc, bb, ...}. Note that SIGMA* can be (and often is) an infinite set. Also note that SIGMA* always contains the empty string, which we designate with epsilon. Let language L be any subset of SIGMA*; e.g., L = {a, abc, cca}. L can be empty, the entire SIGMA*, or anything in between. You need not explicitly enumerate the elements of L; you can define it by a rule: e.g., L = { all strings in SIGMA* that begin with the letter a } = {a, aa, aaa, abba, acaca, ...}.

So what good is all this? Well, languages give us a way to represent the inputs and the outputs of these models of computation that we want to study. Remember our working definition of computation? If we are to have machines that take in inputs and spit out outputs, we obviously need to nail down what those inputs and outputs look like.

(Something to ponder: obviously, we'll be representing the inputs and outputs of these computation models using these abstract languages. But what about the models themselves? What do we use to represent them? How do we write down the specification of some FA, PDA, or TM? Could we use these abstract languages to do that, too? Hm.... As it turns out, this issue lies at the heart of a topic we'll be covering soon, called undecidability. Very cool stuff.)

More specifically, we're interested in building machines that act like a language checker. In other words, for any given model of computation, we want it to represent some language, so that when we then give it some string as an input, it'll output "YES" if that string is in the language, and "NO" if it's not. Now, this seems like it's not very useful -- after all, we're interested in representing any possible computation you can perform, and this model seems only to be good for a spell checker or something. But think again; this is actually a very powerful and general model of computation. Using our notion of language, we can easily come up with languages that represent arbitrarily complex computations. For example, we can say L = { all numbers divisible by 17 } and L = { all primes } using an alphabet of the decimal digits; L = { all polynomials whose roots are complex } using an alphabet of numbers, arithmetic operators, and variables; L = { all physical configurations of the stars in the Milky Way between 1 billion years ago and 2 billion years ago } using an alphabet of numbers to form vectors that represent each star's position in some coordinate system; and L = { all valid Java programs } using an alphabet of ASCII characters to represent the source code.
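To make the "language checker" idea concrete, here's a minimal Java sketch. (This is my own illustration, not something from the book; the class and method names are invented.) It defines a language by a rule rather than by enumerating its elements -- here, all strings over {a,b,c} that begin with the letter a -- and answers YES or NO for any input string:

```java
// A language checker: answers YES if a string is in the language, NO otherwise.
// The language is defined by a rule, not by listing its (possibly infinite) elements.
public class LanguageChecker {

    // L = { all strings over SIGMA = {a,b,c} that begin with the letter 'a' }
    static boolean inLanguage(String w) {
        // Reject strings containing symbols outside the alphabet SIGMA.
        for (char c : w.toCharArray()) {
            if (c != 'a' && c != 'b' && c != 'c') return false;
        }
        return !w.isEmpty() && w.charAt(0) == 'a';
    }

    public static void main(String[] args) {
        String[] tests = { "abba", "cab", "a", "" };
        for (String w : tests) {
            System.out.println("\"" + w + "\" -> " + (inLanguage(w) ? "YES" : "NO"));
        }
    }
}
```

Of course, writing the rule directly in Java begs the question of what kinds of rules a given model of computation can actually check -- which is exactly what the FA, PDA, and TM hierarchy will pin down.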
So now we're ready to make our first attempt at modeling computation. We call these machines Finite Automata. And before we go any further, we should straighten out all the different acronyms people use for them. Basically, remember this: FA stands for finite automaton; a DFA is a deterministic finite automaton; and an NDFA (which some books call an NFA) is a nondeterministic finite automaton. The deterministic/nondeterministic distinction is explained below.
Let's consider what an FA must do: it must read a string, it must process it in some way, and it must output "YES" if the string is in the language that it represents, or "NO" if the string is not in the language that it represents. For an FA, we define the act of reading the string as taking it in one symbol at a time, never trying to re-read a character that it read before; i.e., FAs can never backtrack.

So here's the $64,000 question: what goes in the box? We're going to use three things to build the machinery inside the box: states, transitions, and a sheep. On paper, we represent states with empty circles and transitions with arrows. Transitions must always originate from and point to states, and they must always have a label that contains a symbol from the alphabet. And of course, there can only be a finite number of these states and transitions in the box. The sheep? It hangs out inside states, and it can move among states via transitions. It also has a pretty nifty ability, which we'll discuss later.

The rest is pretty simple: we put together a computing machine by taking a bunch of states and connecting them with transitions. We can specify a transition by naming the two states that it connects and the label that it carries. (These two states can be the same, incidentally.) So let (A,s,B) be a transition arrow from state A to state B, with the symbol s as its label. Then here's the meaning of this transition: "If you are in state A, and you read s as the next symbol in the input, then go to state B." There is no limit on the number of transitions that can originate from or end up at any of the states.

Here's something to consider, though: what if, in the course of putting together these transitions, you find that some state ends up with multiple transitions to different states but with the same label? For example, what if you have (A,c,B) and (A,c,H) in the same machine? Then, if you're in state A and the next symbol in the input is c, where do you go -- B or H? There are two ways to handle this. We could say that this is simply not allowed: for any given state, there can never be more than one transition from that state with a given label. We call the resulting machines DFAs. Or we can allow such transitions and do something extra to deal with them, as will be explained below. We call those machines NDFAs.

We're almost ready to construct an FA now. The finishing touches involve two more things: the end states and the start state. After we have wired up our FA, we can go back and designate some of those states as "end" states, so that if we finish reading the entire input and end up in one of these states, we will output "YES"; otherwise, we output "NO." And since we don't really want to run around in this machine ourselves -- that would be very tiring -- we will use the sheep to run around in it for us. The state into which we initially place the sheep we designate as the start state.

So to summarize, an FA has five things -- a set of states, an alphabet, a set of transitions, a start state, and a set of end states. If you look in the book, this is the five-tuple that they talk about: (K, SIGMA, DELTA, s, F). Then to run the computation, we just place the sheep in the start state; it looks at the incoming input one symbol at a time and follows the appropriate transitions accordingly. (It's a very smart sheep.) After the entire input has been read, just look to see if the sheep has ended up in an end state; if so, return "YES," otherwise return "NO."
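Here's a minimal Java sketch of a DFA as just described -- my own illustration, not the book's; the names are invented. It encodes the five-tuple (K, SIGMA, DELTA, s, F) directly, and the sheep is simply the current-state variable in run(). The example machine accepts strings over {a,b} containing an even number of a's:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A DFA as a five-tuple (K, SIGMA, DELTA, s, F).
// K and SIGMA are implicit in the keys of the transition table.
public class DFA {
    private final Map<String, Map<Character, String>> delta; // DELTA: (state, symbol) -> state
    private final String start;                              // s: start state
    private final Set<String> endStates;                     // F: end states

    DFA(Map<String, Map<Character, String>> delta, String start, Set<String> endStates) {
        this.delta = delta;
        this.start = start;
        this.endStates = endStates;
    }

    // Read the input one symbol at a time, never backtracking.
    String run(String input) {
        String state = start; // place the sheep in the start state
        for (char symbol : input.toCharArray()) {
            state = delta.get(state).get(symbol); // exactly one matching transition
        }
        return endStates.contains(state) ? "YES" : "NO";
    }

    public static void main(String[] args) {
        // Example: L = { strings over {a,b} with an even number of a's }
        Map<String, Map<Character, String>> delta = new HashMap<>();
        delta.put("even", Map.of('a', "odd",  'b', "even"));
        delta.put("odd",  Map.of('a', "even", 'b', "odd"));
        DFA m = new DFA(delta, "even", Set.of("even"));
        System.out.println(m.run("abba")); // YES (two a's)
        System.out.println(m.run("ab"));   // NO  (one a)
    }
}
```

Notice that determinism is what makes run() so simple: at each step there is exactly one transition to follow, so one sheep suffices.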
Now we get to the NDFA wrinkle: what about those ambiguous transitions, where we have multiple choices in terms of which transition to take? In such cases, a remarkable thing happens: the sheep clones itself, as many times as there are choices. Then each sheep takes a different valid transition from that state to the following states. Basically, whenever we have multiple choices, we take all of them. And at the end, if any of the sheep end up in an end state, we return "YES." (A small Java simulation of this appears after the list below.)

It is interesting to note that although NDFAs look like they are vastly superior to DFAs, in fact anything you can do with an NDFA can be done with a DFA. There is even a well-defined method for taking an NDFA and constructing a DFA that does the same computation. When two FAs perform the same computation (albeit possibly in different ways), we say that they are equivalent. Another way of thinking about equivalence is that two machines are equivalent if for every possible input, the two always return identical outputs. So in a general sense, we can say that DFAs and NDFAs are equivalent. The proof, which isn't shown here, relies on the so-called subset construction: build a DFA whose states correspond to sets of NDFA states (i.e., to the possible herds of sheep).

So where does this take us? Well, it turns out that there is a class of languages, which we call regular languages: those languages that can be represented with FAs. The set of all regular languages has an interesting property: it is closed under any combination of several special operations. (A sidenote: what does it mean for a set S to be closed under an operation $? It means that if you take any elements of S and apply $ to them, the result is also an element in S.) Let L and M be any regular languages. Then here are the five operations that always result in another regular language:

  1. Union: L union M = { all strings in L or in M (or both) }
  2. Concatenation: LM = { xy : x is in L and y is in M }
  3. Kleene star: L* = { zero or more strings from L concatenated together }
  4. Complementation: SIGMA* - L = { all strings in SIGMA* that are not in L }
  5. Intersection: L intersect M = { all strings in both L and M }
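To see the sheep-cloning idea in code, here's a hedged Java sketch (again my own illustration, with invented names; it assumes no epsilon-transitions). Instead of one current state, it keeps a set of current states -- one per sheep. The example NDFA accepts strings over {a,b} whose second-to-last symbol is a, a language that's awkward to build a DFA for by hand:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// An NDFA simulator: the set "herd" holds one state per sheep.
// Whenever a state has several transitions on the same symbol, we take all of them.
public class NDFA {
    // DELTA: (state, symbol) -> set of possible next states
    private final Map<String, Map<Character, Set<String>>> delta;
    private final String start;
    private final Set<String> endStates;

    NDFA(Map<String, Map<Character, Set<String>>> delta, String start, Set<String> endStates) {
        this.delta = delta;
        this.start = start;
        this.endStates = endStates;
    }

    String run(String input) {
        Set<String> herd = new HashSet<>(List.of(start)); // one sheep in the start state
        for (char symbol : input.toCharArray()) {
            Set<String> next = new HashSet<>();
            for (String state : herd) { // each sheep clones itself along every matching arrow
                next.addAll(delta.getOrDefault(state, Map.of())
                                 .getOrDefault(symbol, Set.of()));
            }
            herd = next; // a sheep with no valid transition simply vanishes
        }
        // YES if any sheep ended up in an end state
        for (String state : herd) {
            if (endStates.contains(state)) return "YES";
        }
        return "NO";
    }

    public static void main(String[] args) {
        // Example: L = { strings over {a,b} whose second-to-last symbol is 'a' }
        Map<String, Map<Character, Set<String>>> delta = new HashMap<>();
        delta.put("q0", Map.of('a', Set.of("q0", "q1"), 'b', Set.of("q0")));
        delta.put("q1", Map.of('a', Set.of("q2"),       'b', Set.of("q2")));
        NDFA m = new NDFA(delta, "q0", Set.of("q2"));
        System.out.println(m.run("aab")); // YES
        System.out.println(m.run("aba")); // NO
    }
}
```

Note how the herd of sheep is literally a set of states; the subset construction mentioned above just turns each possible herd into a single DFA state.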
This raises the obvious question: are there languages that are not regular? Indeed, there are. The classic example is L = { n a's followed by n b's, for any n }, which no FA can represent: checking it would require "counting" arbitrarily high, and an FA has only a finite number of states to count with. And since we stated above that every FA represents some regular language, this means that FAs are incapable of representing these non-regular languages. Next week, we shall study the push-down automaton, a more powerful model of computation that can represent all regular languages as well as certain languages that are not regular.