The action selection method is designed to use memory to select the action most likely to succeed, and to fill memory when no useful memories are available. For example, when the defender is at a given position, the agent begins by retrieving the two corresponding memory values as described in Section 2.3.2. Then, it acts according to the following function:
An action is selected based on the memory values only if those values indicate that one action is likely to succeed and is better than the other. If, on the other hand, neither value indicates a positive likelihood of success, then an action is chosen randomly. The only exception to this last rule is when one of the values is zero, suggesting that there have not yet been any training examples for that action at that memory location. In this case, there is a bias towards exploring the untried action in order to fill out memory.
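The rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the value and action names are assumptions, and the exploration bias is modeled as always choosing the untried action.

```python
import random

def select_action(v1, v2, actions=("action1", "action2")):
    """Hypothetical sketch of the memory-based selection rule.

    v1 and v2 are the two retrieved memory values (names assumed).
    """
    # Exploit: one value indicates likely success and beats the other.
    if v1 > 0 and v1 > v2:
        return actions[0]
    if v2 > 0 and v2 > v1:
        return actions[1]
    # Explore: a value of exactly zero means the action is untried at
    # this memory location, so bias toward it to fill out memory.
    if v1 == 0 and v2 != 0:
        return actions[0]
    if v2 == 0 and v1 != 0:
        return actions[1]
    # Otherwise neither value suggests success: choose randomly.
    return random.choice(actions)
```

Note that the exploitation check runs first, so an untried action is only favored when neither value already signals a likely success.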