The open-loop and closed-loop strategies of the previous section differ in their handling of price fluctuation. A fundamental way of taking price fluctuation into account is to place ``safe bids.'' A very high bid exposes an agent to the danger of buying something at a ridiculously high price. If prices are in fact stable, then high bids are safe. But if prices fluctuate, then high bids, such as the bids of the stable-price strategy, are risky.
In TAC, hotel rooms are sold in a Vickrey-style $n$th price auction. There is a separate auction for each day of each hotel, and these auctions close sequentially. Although the order in which the auctions close is randomized, and not known to the agent, when placing bids in one of these auctions the agent assumes that this auction will close next. We assumed in the design of our agent that our bids in one auction do not affect prices in other auctions. This assumption is not strictly true, but in a large economy one expects that the bids of a single individual have a limited effect on prices. Furthermore, the price most affected by a bid is the price of the item being bid on; the effect on other auctions seems less direct and perhaps more limited.
Assuming bids in one auction do not affect prices in another, the optimal bidding strategy is the standard strategy for a Vickrey auction: the bid for an item should be equal to its utility to the bidder. So, to place a Vickrey-optimal bid, one must be able to estimate the utility of an item. The utility of owning an item is simply the expected final score assuming one owns the item minus the expected final score assuming one does not own the item. So, the problem of computing a Vickrey-optimal bid can be reduced to the problem of predicting final scores for two alternative game situations. We use two score prediction procedures, which we call the stable-price score predictor (corresponding to Equation 5) and the unstable-price score predictor (Equation 4).
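The reduction of a Vickrey-optimal bid to two score predictions can be sketched as follows. This is a minimal illustration, not the agent's actual code: the names `vickrey_bid` and `toy_predictor`, the item labels, and the value 300 are all hypothetical, and the score predictor is passed in as an arbitrary function.

```python
from typing import Callable, FrozenSet

Holdings = FrozenSet[str]


def vickrey_bid(item: str,
                holdings: Holdings,
                predict_score: Callable[[Holdings], float]) -> float:
    """Utility of `item`: predicted score owning it minus predicted score without it."""
    score_with = predict_score(holdings | {item})
    score_without = predict_score(holdings - {item})
    return score_with - score_without


def toy_predictor(holdings: Holdings) -> float:
    # Hypothetical stand-in for a real score predictor: a two-night stay is
    # worth 300 only if rooms for both nights are held.
    return 300.0 if {"night1", "night2"} <= holdings else 0.0


# Holding night1 already, the second night completes the stay.
bid_completing = vickrey_bid("night2", frozenset({"night1"}), toy_predictor)
# Holding nothing, a single night adds no value on its own.
bid_alone = vickrey_bid("night2", frozenset(), toy_predictor)
```

In this toy setting the bid for the second night is the full value of the stay when the first night is already held, and zero otherwise, which shows how the utility of an item depends on the rest of the agent's holdings.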
The Stable-Price Score Predictor. The stable-price score predictor first estimates the expected prices in the rest of the game using whatever information is available in the given game situation. It then computes the value achieved by optimal purchases under the estimated prices. In an economy with stable prices, this estimate will be quite accurate--if we make the optimal purchases for the expected price then, if the prices are near our estimates, our performance will also be near the estimated value.
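As a rough sketch of stable-price score prediction, consider a single client choosing among a few candidate packages. Everything here is an assumption made for illustration: the function name `stable_price_score`, the package values, the prices, and the simplification to one client with brute-force enumeration (the full game involves several clients and a harder optimization). Goods already owned are treated as costing nothing further.

```python
from typing import Dict, FrozenSet, List, Tuple

Package = Tuple[float, FrozenSet[str]]  # (value to the client, goods required)


def stable_price_score(packages: List[Package],
                       prices: Dict[str, float],
                       owned: FrozenSet[str] = frozenset()) -> float:
    """Value of the best package when every unowned good is bought at its estimated price."""
    best = 0.0  # buying nothing is always an option
    for value, goods in packages:
        # Only goods not yet owned still have to be paid for.
        cost = sum(prices[g] for g in goods - owned)
        best = max(best, value - cost)
    return best


# Hypothetical numbers for illustration only.
packages = [
    (600.0, frozenset({"flight", "hotel_a"})),
    (500.0, frozenset({"flight", "hotel_b"})),
]
estimated_prices = {"flight": 300.0, "hotel_a": 250.0, "hotel_b": 100.0}

score_owning_nothing = stable_price_score(packages, estimated_prices)
score_owning_hotel_a = stable_price_score(packages, estimated_prices,
                                          owned=frozenset({"hotel_a"}))
```

The difference between the two predicted scores is the utility of owning `hotel_a` under the estimated prices, which is the quantity a Vickrey-optimal bid should equal.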
The Unstable-Price Score Predictor. Stable-price score prediction does not take into account the ability of the agent to react to changes in price as the game progresses. Suppose a given room is often cheap but is sometimes expensive. If the agent can first determine the price of the room, and then plan for that price, the agent will do better than guessing the price ahead of time and sticking to the purchases dictated by that price. The unstable-price score predictor uses a model of the distribution of possible prices. It repeatedly samples prices from this distribution, computes the stable-price score prediction under the sampled price, and then takes the average of these stable-price scores over the various price samples. This score prediction algorithm is similar to the algorithm used in Ginsberg's [Ginsberg, 2001] quite successful computer bridge program, where the score is predicted by sampling the possible hands of the opponent and, for each sample, computing the score of optimal play in the case where all players have complete information (double dummy play). While this approach has a simple intuitive motivation, it is clearly imperfect. The unstable-price score predictor assumes both that future decisions are made in the presence of complete price information, and that the agent is free to change existing bids in auctions that have not yet closed. Both of these assumptions are only approximately true at best. Ways of compensating for the imperfections in score prediction were described in Section 5.
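The sampling procedure can be sketched in a few lines. The price model below (a room that costs 50 with probability 0.8 and 400 otherwise), the package value, and the function names are hypothetical choices for illustration; the source does not specify the price distribution or the number of samples.

```python
import random
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Tuple

Package = Tuple[float, FrozenSet[str]]  # (value to the client, goods required)
Prices = Dict[str, float]


def stable_price_score(packages: List[Package], prices: Prices) -> float:
    """Best achievable value under one fixed assignment of prices."""
    options = [value - sum(prices[g] for g in goods) for value, goods in packages]
    return max([0.0] + options)  # 0.0 corresponds to buying nothing


def unstable_price_score(packages: List[Package],
                         sample_prices: Callable[[random.Random], Prices],
                         n_samples: int = 2000,
                         seed: int = 0) -> float:
    """Average of stable-price scores over prices drawn from the price model."""
    rng = random.Random(seed)
    return mean(stable_price_score(packages, sample_prices(rng))
                for _ in range(n_samples))


def sample_room_price(rng: random.Random) -> Prices:
    # Hypothetical price model: the room is usually cheap, sometimes expensive.
    return {"room": 50.0 if rng.random() < 0.8 else 400.0}


packages = [(300.0, frozenset({"room"}))]
# Planning against the expected price (0.8 * 50 + 0.2 * 400 = 120).
score_at_mean_price = stable_price_score(packages, {"room": 120.0})
# Planning separately for each sampled price, then averaging.
score_sampled = unstable_price_score(packages, sample_room_price)
```

Under this toy model the score at the expected price is 180, while the exact expectation of the per-sample scores is 0.8 × 250 = 200, so the sampled estimate should come out near 200. The gap reflects the value of being able to skip the room in the samples where it is expensive.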
Buy Now or Decide Later. The trading agent must decide what airline tickets to buy and when to buy them. In deciding whether to buy an airline ticket, the agent can compare the predicted score in the situation where it owns the airline ticket with the predicted score in the situation where it does not own the airline ticket but may buy it later. Airline tickets tend to increase in price, so if the agent knows that a certain ticket is needed it should buy it as soon as possible. But whether or not a given ticket is desirable may depend on the price of hotel rooms, which may become clearer as the game progresses. If airline tickets did not increase in price, as was the case in TAC-00, then they should be bought at the last possible moment [Stone et al., 2001]. To determine whether an airline ticket should be bought now or not, one can compare the predicted score in the situation where one has just bought the ticket at its current price with the predicted score in the situation where the price of the ticket is somewhat higher but the ticket has not yet been bought. It is interesting to note that if one uses the stable-price score predictor for both of these predictions, and the ticket is purchased in the optimal allocation under the current price estimate, then the predicted score for buying the ticket now will always be higher, because increasing the price of the ticket can only reduce the score. However, the unstable-price score predictor can yield an advantage for delaying the purchase. This advantage comes from the fact that buying the ticket may be optimal under some prices but not optimal under others. If the ticket has not yet been bought, then the score will be higher for those sampled prices where the ticket should not be bought. This corresponds to the intuition that in certain cases the purchase should be delayed until more information is available.
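A small worked example makes the contrast between the two predictors concrete. All numbers and names below are hypothetical, and for determinism the sketch enumerates two equally likely hotel prices exactly instead of sampling them. A trip is worth 600 and needs a ticket and a hotel room; the ticket costs 300 now and is assumed to cost 320 later.

```python
TRIP_VALUE = 600.0
TICKET_NOW = 300.0     # current ticket price (hypothetical)
TICKET_LATER = 320.0   # assumed somewhat higher price if the purchase is delayed
HOTEL_SCENARIOS = [(0.5, 50.0), (0.5, 500.0)]  # (probability, hotel price)


def score_if_bought(hotel_price: float) -> float:
    # The ticket is already paid for; the agent still chooses whether to take the room.
    return -TICKET_NOW + max(0.0, TRIP_VALUE - hotel_price)


def score_if_waiting(hotel_price: float) -> float:
    # Nothing is committed yet; the agent buys ticket and room only if worthwhile.
    return max(0.0, TRIP_VALUE - TICKET_LATER - hotel_price)


def stable(score_fn) -> float:
    """Score under the single expected hotel price."""
    expected_price = sum(p * price for p, price in HOTEL_SCENARIOS)
    return score_fn(expected_price)


def unstable(score_fn) -> float:
    """Average of per-scenario scores over the hotel price distribution."""
    return sum(p * score_fn(price) for p, price in HOTEL_SCENARIOS)


stable_buy, stable_wait = stable(score_if_bought), stable(score_if_waiting)
unstable_buy, unstable_wait = unstable(score_if_bought), unstable(score_if_waiting)
```

With these numbers the stable-price predictor gives 25 for buying now against 5 for waiting, so it favors the immediate purchase. The unstable-price predictor gives 25 for buying now against 115 for waiting, because in the expensive-hotel scenario an uncommitted agent avoids the trip entirely.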
Our guiding principle in the design of the agent was, to the greatest extent possible, to have the agent analytically calculate optimal actions. A key component of these calculations is the score predictor, based either on a single estimated assignment of prices or on a model of the probability distribution over assignments of prices. Both score predictors, though clearly imperfect, seem useful. Of these two predictors, only the unstable-price predictor can be used to quantitatively estimate the value of postponing a decision until more information is available. The accuracy of price estimation is clearly of central importance. Future research will undoubtedly focus on ways of improving both price modeling and score prediction based on price modeling.