Our hypothesis is that the complexification process outlined above allows discovering more sophisticated strategies, i.e. strategies that are more effective, flexible, and general, and that include more components and variations than strategies obtained through search in a fixed space. To demonstrate this effect, we need a domain where it is possible to develop a wide range of increasingly sophisticated strategies, and where sophistication can be readily measured. A coevolution domain is particularly appropriate because a sustained arms race should lead to increasing sophistication.
In choosing the domain, it is difficult to strike a balance between being able to evolve complex strategies and being able to analyze and understand them. Pursuit and evasion tasks have been utilized for this purpose in the past, and can serve as a benchmark domain for complexifying coevolution as well. While past experiments evolved either a predator or a prey, an interesting coevolution task can be established if the agents are instead equal and engaged in a duel. To win, an agent must develop a strategy that outwits that of its opponent, utilizing structure in the environment.
In this paper we introduce such a duel domain, in which two simulated robots try to overpower each other (Figure 4). The two robots begin on opposite sides of a rectangular room facing away from each other. As the robots move, they lose energy in proportion to the amount of force they apply to their wheels. Although the robots never run out of energy (they are given enough to survive the entire competition), the robot with higher energy wins when it collides with its competitor. In addition, each robot has a sensor indicating the difference in energy between itself and the other robot. To keep their energies high, the robots can consume food items, arranged in a symmetrical pattern in the room.
The robot duel task supports a broad range of sophisticated strategies that are easy to observe and interpret. The competitors must become proficient at foraging, capturing prey, and escaping predators. In addition, they must be able to quickly switch from one behavior to another. The task is well-suited to competitive coevolution because naive strategies such as forage-then-attack can be complexified into more sophisticated strategies such as luring the opponent to waste its energy before attacking.
The simulated robots are similar to Kheperas (Mondada et al. 1993; Figure 6). Each has two wheels controlled by separate motors. Five rangefinder sensors can sense food and another five can sense the other robot. Finally, each robot has an energy-difference sensor, and a single wall sensor.
The robots are controlled with neural networks evolved with NEAT. The networks receive all of the robot sensors as inputs, as well as a constant bias that NEAT can use to change the activation thresholds of neurons. They produce three motor outputs: two that encode rotation to the right or left, and a third that indicates forward motion power. These three values are then translated into forces to be applied to the left and right wheels of the robot.
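The exact translation from the three motor outputs to wheel forces is not specified here, but one plausible differential-drive mixing, assuming all outputs lie in [0, 1], can be sketched as follows (the function name and mixing rule are illustrative assumptions, not the paper's implementation):

```python
def motor_outputs_to_wheel_forces(turn_left, turn_right, forward):
    """Hypothetical mapping from the network's three motor outputs
    (left turn, right turn, forward power) to left/right wheel forces.
    This is an illustrative sketch; the exact translation used in the
    experiments may differ."""
    turn = turn_right - turn_left   # net rotation: positive = turn right
    left_force = forward + turn     # stronger left wheel turns the robot right
    right_force = forward - turn
    return left_force, right_force
```

With equal turn outputs the robot drives straight; an excess on one turn output shifts force between the wheels to produce the rotation.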
The state of the world at time t, s(t), is defined by the positions of the robots and food, the energy levels of the robots, and the internal states (i.e. neural activations) of the robots' neural networks, including sensors, outputs, and hidden nodes. The subsequent state s(t + 1) is determined by the outputs of the robots' neural network controllers, computed from the inputs (i.e. sensor values) in s(t) in one step of propagation through the network.
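The single propagation step described above can be sketched as a synchronous update in which every non-input node computes its new activation from the previous step's activations, so recurrent connections naturally carry state from one time step to the next. The sigmoid and the connection representation below are illustrative assumptions, not the exact NEAT implementation:

```python
import math

def sigmoid(x):
    """Standard logistic activation (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def propagate(prev, connections, num_inputs, num_nodes, inputs):
    """One synchronous activation step through a possibly recurrent network.

    prev: list of node activations from the previous time step.
    connections: list of (src, dst, weight) tuples; dst is a non-input node.
    inputs: current sensor values, loaded into the first num_inputs nodes.
    """
    state = list(prev)
    state[:num_inputs] = inputs        # current sensor values
    new = list(state)
    for node in range(num_inputs, num_nodes):
        total = sum(w * state[src]
                    for (src, dst, w) in connections if dst == node)
        new[node] = sigmoid(total)
    return new
```

Because each node reads the previous step's activations, a self-connection or cycle retains information across steps, which is what makes memory possible in this setting.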
The robots change their position in s(t + 1) according to their neural network outputs as follows. The change in direction of motion is proportional to the difference between the left and right motor outputs. The robot drives forward a distance proportional to the forward output on a continuous board of size 600 by 600. The robot first makes half its turn, then moves forward, then completes the second half of its turn, so that the turning and forward motions are effectively combined. If the robot encounters food, it receives an energy boost, and the food disappears from the world. The loss of energy due to movement is computed as the sum of the turn angle and the forward motion, so that even turning in place takes energy. If the robots are within a distance of 20, a collision occurs and the robot with the higher energy wins (see Appendix B for the exact parameter values used).
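The update rules above can be sketched as one simulation step. The constants (TURN_GAIN, SPEED_GAIN, FOOD_ENERGY, EAT_DIST) are placeholders; the exact parameter values are given in Appendix B:

```python
import math

# Placeholder constants; exact values are specified in Appendix B.
TURN_GAIN = 0.2
SPEED_GAIN = 5.0
FOOD_ENERGY = 100.0
EAT_DIST = 20.0
COLLISION_DIST = 20.0   # collision distance stated in the text
BOARD = 600.0           # board size stated in the text

def step_robot(robot, left_out, right_out, forward_out):
    """Advance one robot (dict with x, y, heading, energy) by one step."""
    turn = TURN_GAIN * (left_out - right_out)   # change in direction
    dist = SPEED_GAIN * forward_out             # forward distance

    # Half the turn, then forward motion, then the second half of the
    # turn, so turning and moving are effectively combined.
    robot["heading"] += turn / 2.0
    robot["x"] += dist * math.cos(robot["heading"])
    robot["y"] += dist * math.sin(robot["heading"])
    robot["heading"] += turn / 2.0

    # Keep the robot on the continuous 600-by-600 board.
    robot["x"] = min(max(robot["x"], 0.0), BOARD)
    robot["y"] = min(max(robot["y"], 0.0), BOARD)

    # Movement cost is the sum of the turn angle and forward motion,
    # so even turning in place takes energy.
    robot["energy"] -= abs(turn) + dist

def eat_food(robot, food_items):
    """Consume any food within reach; eaten items disappear."""
    remaining = []
    for (fx, fy) in food_items:
        if math.hypot(robot["x"] - fx, robot["y"] - fy) <= EAT_DIST:
            robot["energy"] += FOOD_ENERGY
        else:
            remaining.append((fx, fy))
    return remaining

def check_collision(a, b):
    """Within COLLISION_DIST, the robot with the higher energy wins."""
    d = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    if d <= COLLISION_DIST:
        return a if a["energy"] > b["energy"] else b
    return None
```

This sketch omits details such as wall collisions and the exact food-pickup radius, which the text does not specify.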
Since the observed state o(t) taken by the sensors does not include the internal state of the opponent robot, the robot duel is a partially observable Markov decision process (POMDP). Because the next observed state depends on the decision of the opponent, the robots must learn to predict what the opponent is likely to do, based on its past behavior and on what is reasonable behavior in general. For example, it is reasonable to assume that if the opponent robot is approaching quickly and has higher energy, it is probably trying to collide. Because an important and complex portion of s(t) is not observable, memory, and hence recurrent connections, are crucial for success.
This complex robot-control domain allows competitive coevolution to evolve increasingly sophisticated and complex strategies, and can be used to understand complexification, as will be described next.