Alpha-beta search

There's a method can use to speed up game tree search, sort of like the ``stop when you get to a win'' technique you may have used in Assignment 5, but a little trickier. It's called alpha-beta search. (Yeah, it's a stupid name. It comes from arbitrary variable names in the code; see the pseudocode below.)

Idea: say you have a tree and have figured out the following so far.

                 o        MAX    (currently we know this is worth at least 40)
               /   \
             40     o     MIN    (we know this node is worth at most 30)
                   / \
                 30   ??  MAX

Do we need to search the ``??''? No. Because no matter what we find it won't affect the outcome.

Here's another way of looking at it: maybe we originally defined ``100'' to be a win for MAX and ``0'' to be a win for MIN. But, right now we can think of ``<= 40'' as being a win for MIN. Why? Because if MIN ever finds a move that gets it <= 40, MAX will use its other option.

Same thing can happen in other direction:

                  o     MIN
                /   \
              40     o  MAX
                    / \
                   50  ??

Idea of alpha-beta pruning: We will think of the range of the values as not being from 0 to 100, but just from alpha to beta. E.g., MIN passes its ``best for MIN so far'' (beta) to MAX, and if MAX ever finds that its current best is >= beta, it knows it can return immediately.

Can be useful even in win/lose/tie setting. E.g.,

                   o    X player
                 /   \
               TIE     o O player
                      / \
                    TIE  ??  <--don't need to evaluate.

I.e., X tells O that ``as far as I'm concerned, you can treat a TIE as if it were a WIN for you.''

In pseudocode, this looks like the following. Assume that the heuristic function is always between -99999 and 99999 and represents the goodness of a board for a given player. (The following is from Seth's code; it may be buggy.)

SEARCH_FOR_PLAY(board, player, alpha, beta, depth)
  // base case
  if board is a win for player       then return  100000
  else if board is a tie             then return       0
  else if board is a loss for player then return -100000
  else if depth = 0                  then return HEURISTIC(board, player)
  fi

  for each legal move on board do
    try move
	// now recurse; switch the notion of alpha and beta
    value <- OPTIMAL_PLAY(new board, other player, beta, alpha, depth - 1)
    undo move

    // swap value because it's computed for other player
    value <- -value

	// improve alpha if possible
    if value > alpha then
      alpha <- value
      if alpha >= -beta then
        // we don't need to search this node more
        return -beta
      fi
    fi
  od
  return alpha

(A note about my pseudocode: like many other computer scientists, I use fi to end if statements and od to end do loops. Don't let those lines confuse you.)