A reinforcement-learning agent's current state plays a central role in its selection of reward-maximizing actions. If we view the agent as a state-free black box, its input is a description of the current state. Depending on the agent architecture, its output is either an action selection or an evaluation of the current state that can be used to select an action. The problem of deciding how the different aspects of an input affect the value of the output is sometimes called the ``structural credit-assignment'' problem. This section examines approaches to generating actions or evaluations as a function of a description of the agent's current state.
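The two output conventions described above can be made concrete with a minimal sketch. The class and variable names below (`PolicyAgent`, `ValueAgent`, the toy states and transition function) are illustrative assumptions, not constructs from the text: one agent maps a state description directly to an action, while the other evaluates states and selects an action by one-step lookahead over the states each action would reach.

```python
from typing import Callable, Dict, List


class PolicyAgent:
    """Architecture 1: output is an action selection."""

    def __init__(self, policy: Dict[str, str]):
        self.policy = policy  # state description -> action

    def act(self, state: str) -> str:
        return self.policy[state]


class ValueAgent:
    """Architecture 2: output is an evaluation of states,
    used indirectly to select an action."""

    def __init__(self, value: Dict[str, float],
                 transition: Callable[[str, str], str]):
        self.value = value            # state description -> estimated value
        self.transition = transition  # (state, action) -> successor state

    def act(self, state: str, actions: List[str]) -> str:
        # Choose the action whose successor state has the highest value.
        return max(actions,
                   key=lambda a: self.value[self.transition(state, a)])


# Tiny deterministic illustration with two successor states.
policy_agent = PolicyAgent({"s0": "right"})
value_agent = ValueAgent(
    value={"s1": 0.0, "s2": 1.0},
    transition=lambda s, a: {"left": "s1", "right": "s2"}[a],
)
print(policy_agent.act("s0"))                    # -> right
print(value_agent.act("s0", ["left", "right"]))  # -> right
```

In both cases the structural credit-assignment problem is the same: deciding which aspects of the input state description should drive the output, whether that output is the action itself or the evaluation behind it.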
The first group of techniques covered here is specialized to the case in which reward is not delayed; the second is more generally applicable.