Abstract: Multi-agent systems are rapidly finding applications
in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multi-agent reinforcement learning (MARL). A central issue in the field is the formal statement of the multi-agent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where MARL techniques have been applied. Finally, an outlook for the field is provided.

Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper we discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral. In this control formalism, the central concept of cost-to-go or value function becomes a free energy, and methods and concepts from statistical physics, such as Monte Carlo sampling or the Laplace approximation, can be readily applied. When applied to a receding horizon problem in a stationary environment, the solution resembles the one obtained by traditional reinforcement learning with discounted reward. It is shown that this solution can be computed more efficiently than in the discounted reward framework. As shown in previous work, the approach is easily generalized to time-dependent tasks and is therefore of great relevance for modeling real-time interactions between agents.
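As a rough illustration of the free-energy idea described above (the dynamics, cost, and constants below are illustrative stand-ins, not the paper's model), the cost-to-go of a simple 1-D diffusion can be estimated by Monte Carlo sampling of uncontrolled trajectories, using V(x) = -λ log E[exp(-S(τ)/λ)]:

```python
import numpy as np

# Illustrative sketch: estimate the value function of a 1-D path-integral
# control problem by Monte Carlo sampling, exploiting the free-energy form
#   V(x0) = -lam * log E[ exp(-S(tau) / lam) ],
# where S(tau) is the cost of an *uncontrolled* noisy trajectory from x0.

rng = np.random.default_rng(0)
lam = 1.0                 # noise level / "temperature" (illustrative)
dt, T = 0.05, 1.0
steps = int(T / dt)

def end_cost(x):
    return x ** 2         # quadratic terminal cost (illustrative)

def value_estimate(x0, n_samples=5000):
    # Roll out uncontrolled Brownian trajectories and average exp(-S/lam).
    x = np.full(n_samples, x0, dtype=float)
    for _ in range(steps):
        x += np.sqrt(dt * lam) * rng.standard_normal(n_samples)
    s = end_cost(x)       # path cost: terminal cost only, for simplicity
    return -lam * np.log(np.mean(np.exp(-s / lam)))

v0 = value_estimate(0.0)
v2 = value_estimate(2.0)
# States far from the origin should have a higher cost-to-go.
print(v0, v2)
```

The same estimator extends to running state costs by accumulating them into S(τ) along each sampled path.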

Abstract: Reinforcement learning (RL) is a widely used
paradigm for learning control. Computing exact RL solutions is
generally only possible when process states and control actions
take values in a small discrete set. In practice, approximate
algorithms are necessary. In this paper, we propose an approximate, model-based Q-iteration algorithm that relies on
a fuzzy partition of the state space, and a discretization of
the action space. Using assumptions on the continuity of the
dynamics and of the reward function, we show that the resulting
algorithm is consistent, i.e., that the optimal solution is obtained
asymptotically as the approximation accuracy increases. An
experimental study indicates that a continuous reward function
is also important for a predictable improvement in performance
as the approximation accuracy increases.

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent
RL algorithms which have been intensively studied. In their original form,
these algorithms require that the environment states and agent actions
take values in a relatively small discrete set. Fuzzy representations for
approximate, model-free RL have been proposed in the literature for the
more difficult case where the state-action space is continuous. In this
work, we propose a fuzzy approximation architecture similar to those
previously used for Q-learning, but we combine it with the model-based
Q-value iteration algorithm. We prove that the resulting algorithm converges. We also give a modified, asynchronous variant of the algorithm
that converges at least as fast as the original version. An illustrative
simulation example is provided.

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.

Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, etc. Learning approaches to multi-agent control, many of them based on reinforcement learning (RL), are investigated in complex domains such as teams of mobile robots. However, the application of decentralized RL to low-level control tasks is not as intensively studied. In this paper, we investigate centralized and decentralized RL, emphasizing the challenges and potential advantages of the latter. These are then illustrated on an example: learning to control a two-link rigid manipulator. Some open issues and future research directions in decentralized RL are outlined.

Abstract: Reinforcement learning (RL) is a learning control paradigm that provides well-understood algorithms with good convergence and consistency properties. Unfortunately, these algorithms require that process states and control actions take only discrete values. Approximate solutions using fuzzy representations have been proposed in the literature for the case when the states and possibly the actions are continuous. However, the link between these mainly heuristic solutions and the larger body of work on approximate RL, including convergence results, has not been made explicit. In this paper, we propose a fuzzy approximation structure for the Q-value iteration algorithm, and show that the resulting algorithm is convergent. The proof is based on an extension of previous results in approximate RL. We then propose a modified, serial version of the algorithm that is guaranteed to converge at least as fast as the original algorithm. An illustrative simulation example is also provided.
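A minimal sketch of model-based fuzzy Q-iteration in the spirit of the abstract above (the 1-D task, membership functions, and constants are illustrative, not taken from the papers): a parameter matrix theta[i, a], indexed by membership function and discrete action, is updated by a contraction mapping until it converges.

```python
import numpy as np

# Fuzzy Q-iteration sketch on a 1-D "move toward the origin" task, using
# triangular membership functions over the state space and discrete actions.
centers = np.linspace(-1.0, 1.0, 11)        # membership function centers
actions = np.array([-0.1, 0.0, 0.1])        # discretized action set
gamma = 0.9

def memberships(x):
    """Normalized triangular memberships centered on `centers`."""
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return mu / mu.sum()

def f(x, u):                                 # known model: dynamics
    return np.clip(x + u, -1.0, 1.0)

def rho(x, u):                               # known model: reward
    return -abs(x)

theta = np.zeros((len(centers), len(actions)))
for _ in range(200):                         # iterate the contraction mapping
    new = np.empty_like(theta)
    for i, xi in enumerate(centers):
        for a, u in enumerate(actions):
            xn = f(xi, u)
            q_next = memberships(xn) @ theta # interpolated Q at next state
            new[i, a] = rho(xi, u) + gamma * q_next.max()
    theta = new

# The greedy policy at the right boundary should push toward the origin.
best_at_right = actions[int(np.argmax(memberships(1.0) @ theta))]
print(best_at_right)
```

Because the mapping is a contraction (for suitably normalized memberships), the synchronous sweep above converges; a serial variant would reuse each updated entry of theta within the same sweep.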

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents.
Because exact RL can only be applied to very simple problems, approximate algorithms are
usually necessary in practice. Many algorithms for approximate RL rely on basis-function
representations of the value function (or of the Q-function). Designing a good set of basis
functions without any prior knowledge of the value function (or of the Q-function) can be a
difficult task. In this paper, we propose instead a technique to optimize the shape of a constant
number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other
approaches to adapt basis functions for RL, our optimization criterion measures the actual
performance of the computed policies in the task, using simulation from a representative set
of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy
membership functions, is given and applied to the car-on-the-hill example.
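The cross-entropy optimization loop mentioned above can be sketched generically (the performance function below is a smooth stand-in for simulated policy returns, and all names and constants are illustrative):

```python
import numpy as np

# Generic cross-entropy (CE) optimization sketch: sample parameter vectors
# from a Gaussian, score them, keep the elite samples, refit the Gaussian.
rng = np.random.default_rng(0)

def score(params):
    # Stand-in for "simulate the policy from a representative set of initial
    # states and return average performance"; here a smooth toy objective
    # whose optimum is at 0.3 in every coordinate.
    return -np.sum((params - 0.3) ** 2)

dim, n_samples, n_elite = 4, 50, 10
mean, std = np.zeros(dim), np.ones(dim)
for _ in range(30):
    samples = mean + std * rng.standard_normal((n_samples, dim))
    scores = np.array([score(s) for s in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]   # best n_elite samples
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6

print(mean)   # should approach the optimum at 0.3 in every coordinate
```

In the paper's setting, the sampled parameters would encode the shapes of the triangular membership functions, and each score evaluation would run fuzzy Q-iteration and simulate the resulting policy.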

Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. Many tasks arising in these domains require that the agents learn behaviors online. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. However, due to different viewpoints on central issues, such as the formal statement of the learning goal, a large number of different methods and approaches have been introduced. In this paper we aim to present an integrated survey of the field. First, the issue of the multi-agent learning goal is discussed, after which a representative selection of algorithms is reviewed. Finally, open issues are identified and future research directions are outlined.

Abstract: Reinforcement learning (RL) comprises an array of techniques that learn a control
policy so as to maximize a reward signal. When applied to the control of elevator systems, RL
has the potential of finding better control policies than classical heuristic, suboptimal policies.
On the other hand, elevator systems offer an interesting benchmark application for the study
of RL. In this paper, RL is applied to a single-elevator system. The mathematical model of
the elevator system is described in detail, making the system easy to re-implement and re-use.
An experimental comparison is made between the performance of the Q-value iteration and
Q-learning RL algorithms, when applied to the elevator system.
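The contrast between the two algorithms compared there can be sketched on a tiny generic MDP (the 3-state chain below is purely illustrative, not the elevator model): Q-value iteration sweeps over all state-action pairs using the known model, while Q-learning estimates the same Q-function from sampled transitions.

```python
import numpy as np

# Tabular sketch: model-based Q-value iteration vs. model-free Q-learning
# on a 3-state chain. Action 1 moves right, action 0 stays; reaching the
# last state pays reward 1.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + a, n_states - 1)
    return s2, 1.0 if (s2 == n_states - 1 and s != n_states - 1) else 0.0

# Model-based Q-value iteration: full sweeps using the model.
q_vi = np.zeros((n_states, n_actions))
for _ in range(100):
    new = np.zeros_like(q_vi)
    for s in range(n_states):
        for a in range(n_actions):
            s2, r = step(s, a)
            new[s, a] = r + gamma * q_vi[s2].max()
    q_vi = new

# Model-free Q-learning: stochastic updates along sampled trajectories.
q_ql = np.zeros((n_states, n_actions))
alpha, s = 0.1, 0
for _ in range(20000):
    a = int(rng.integers(n_actions))       # exploratory random policy
    s2, r = step(s, a)
    q_ql[s, a] += alpha * (r + gamma * q_ql[s2].max() - q_ql[s, a])
    s = s2 if s2 != n_states - 1 else 0    # restart episodes at state 0

print(np.abs(q_vi - q_ql).max())           # the two estimates should agree
```

On such a small deterministic problem both converge to the same Q-function; the experimental question in the paper is how their performance compares on the much larger elevator system.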

Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, etc. Although the individual agents can be programmed in advance, many tasks require that they learn behaviors online. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This paper gives a survey of multi-agent reinforcement learning, starting with a review of the different viewpoints on the learning goal, which is a central issue in the field. Two generic goals are distinguished: stability of the learning dynamics, and adaptation to the other agents' dynamic behavior. The focus on one of these goals, or a combination of both, leads to a categorization of the methods and approaches in the field. The challenges and benefits of multi-agent reinforcement learning are outlined along with open issues and future research directions.

Abstract: This paper proposes a reinforcement learning architecture containing multiple "experts", each of which is a specialist in a different region in the overall state space. The central idea is that the different experts use qualitatively different (but sufficiently Markov) state representations, each of which captures different information regarding the true underlying world state, and which for that reason is suitable for a different part of the state space. The experts themselves learn to switch to another state representation (other expert) by having switching actions. Value functions can be learned using standard reinforcement learning algorithms. A small proof-of-principle experiment as well as a larger, more realistic experiment illustrate the validity of this approach.

Abstract: In this paper we discuss how the design of an Intelligent Companion constitutes a challenge and a test-bed for computer-based technologies aimed at improving the user's cognitive abilities. We conceive an Intelligent Companion to be an autonomous cognitive system (ACS) that should be capable of naturally interacting and communicating in real-world environments. It should do so by embodying (reinforcement) learning of physically grounded conceptualizations of multimodal perception, decision making, planning and actuation, with the aim of supporting human cognition in both an intelligent and intelligible way.

Abstract: This paper describes the optimization of traffic light controllers using a model-based reinforcement learning approach. Traffic lights are optimized using mostly local, low-level information, but some high-level information concerning the general traffic situation at neighboring traffic junctions is taken into account, enhancing situation awareness and improving decision making. We show, using experiments performed with a traffic simulator, that this approach outperforms existing methods.