Abstract: In this article we consider the problem of optimal control in collaborative multi-agent systems with stochastic dynamics. The agents have a joint task in which they have to reach a number of target states. The dynamics of the agents contain additive control and additive noise, and the autonomous part factorizes over the agents. Full observation of the global state is assumed. The goal is to minimize the accumulated joint cost, which consists of integrated instantaneous costs and a joint end cost. The joint end cost expresses the joint task of the agents. The instantaneous costs are quadratic in the control and factorize over the agents. The optimal control is given as a weighted linear combination of single-agent to single-target controls. The single-agent to single-target controls are expressed in terms of diffusion processes. These controls, when not available as closed-form expressions, are formulated in terms of path integrals, which are calculated approximately by Metropolis-Hastings sampling. The weights in the control are interpreted as marginals of a joint distribution over agent-to-target assignments. The structure of the latter is represented by a graphical model, and the marginals are obtained by graphical model inference. Exact inference in the graphical model will break down in large systems, and so approximate inference methods are needed. We use the naive mean field approximation and belief propagation to approximate the optimal control in systems with linear dynamics. We compare the approximate inference methods with the exact solution, and we show that they can accurately compute the optimal control. Finally, we demonstrate the control method in multi-agent systems with nonlinear dynamics consisting of up to 80 agents that have to reach an equal number of target states.
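
The weighted combination described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: agents are one-dimensional and drift-free with noise variance `nu`, targets are enforced at the end time, the single-agent to single-target control takes the closed form `(mu - x) / horizon`, and the joint assignment distribution is a Boltzmann weight on the end cost times a Gaussian factor per agent. The names `assignment_marginals`, `controls`, and `collision_cost` are hypothetical, and the marginals are computed by brute-force enumeration, which is only feasible for very small systems.

```python
import itertools
import math


def assignment_marginals(x, targets, end_cost, nu, lam, horizon):
    """Marginals p(s_a = k) of an assumed joint distribution over assignments.

    x        : current 1-D positions, one per agent
    targets  : 1-D target positions
    end_cost : function mapping an assignment tuple s to a joint end cost
    nu, lam  : noise variance and temperature-like constant (assumed given)
    horizon  : remaining time T - t (must be positive)
    """
    n, m = len(x), len(targets)
    marg = [[0.0] * m for _ in range(n)]
    total = 0.0
    # Enumerate all m**n joint assignments (exponential in the number of agents).
    for s in itertools.product(range(m), repeat=n):
        log_w = -end_cost(s) / lam
        for a, k in enumerate(s):
            # Assumed Gaussian factor: free diffusion from x[a] to targets[k].
            log_w -= (x[a] - targets[k]) ** 2 / (2.0 * nu * horizon)
        w = math.exp(log_w)
        total += w
        for a, k in enumerate(s):
            marg[a][k] += w
    return [[w / total for w in row] for row in marg]


def controls(x, targets, marg, horizon):
    """Marginal-weighted sum of single-agent to single-target controls."""
    return [
        sum(p * (mu - xa) for p, mu in zip(row, targets)) / horizon
        for xa, row in zip(x, marg)
    ]


def collision_cost(s, penalty=10.0):
    """Hypothetical joint end cost: penalize two agents sharing a target."""
    return 0.0 if len(set(s)) == len(s) else penalty
```

Calling `assignment_marginals([-0.9, 0.0], [-1.0, 1.0], collision_cost, 1.0, 1.0, 1.0)` and passing the result to `controls` steers the first agent toward the nearby target and the second toward the remaining one, because the end cost discourages both agents from choosing the same target.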

Abstract: We study optimal control in large stochastic multi-agent systems in continuous space and time. We consider multi-agent systems where agents have independent dynamics with additive noise and control. The goal is to minimize the joint cost, which consists of a state-dependent term and a term quadratic in the control. The system is described by a mathematical model, and an explicit solution is given. We focus on large systems where agents have to distribute themselves over a number of targets with minimal cost. In such a setting the optimal control problem is equivalent to a graphical model inference problem. Exact inference will be intractable, and we use the mean field approximation to compute accurate approximations of the optimal controls. We conclude that near-optimal control in large stochastic multi-agent systems is possible with this approach.
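
To make the mean field step concrete, here is a minimal Python sketch of a naive mean field update for agent-to-target marginals. It assumes a particular pairwise form of the joint cost, namely a fixed `penalty` whenever two agents pick the same target; the abstract does not specify the cost, so this form, the function name `mean_field_marginals`, and its parameters are illustrative assumptions only.

```python
import math


def mean_field_marginals(log_psi, penalty, lam, sweeps=200):
    """Naive mean field for q(s) = prod_a q_a(s_a).

    log_psi : log_psi[a][k], log of the single-agent factor for agent a, target k
    penalty : assumed pairwise cost when two agents share a target
    lam     : temperature-like constant dividing the cost
    sweeps  : number of coordinate-ascent passes over all agents
    """
    n, m = len(log_psi), len(log_psi[0])
    q = [[1.0 / m] * m for _ in range(n)]  # start from uniform marginals
    for _ in range(sweeps):
        for a in range(n):
            logits = []
            for k in range(m):
                # Expected number of other agents on target k under current q.
                crowd = sum(q[b][k] for b in range(n) if b != a)
                logits.append(log_psi[a][k] - penalty * crowd / lam)
            top = max(logits)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logits]
            z = sum(unnorm)
            q[a] = [u / z for u in unnorm]
    return q
```

Each pass costs on the order of n² m operations for n agents and m targets in this naive form, in contrast to the m**n terms of exact enumeration. The result is an approximation whose quality depends on the problem.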

Abstract: Optimal control theory is a mathematical description of how to act optimally
to gain future rewards. In this paper I give an introduction to
deterministic and stochastic control theory; partial observability,
learning and the combined problem of inference and control. Subsequently, I
discuss a new class of non-linear stochastic
control problems for which the Bellman equation becomes linear in the
control and that can be efficiently solved using a path integral.
In this control formalism the central concept of cost-to-go becomes a
free energy and methods and concepts from probabilistic graphical
models and statistical physics can be readily applied. I illustrate the
theory with a number of examples.
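
As a brief worked sketch of why the cost-to-go behaves like a free energy, consider the following setting. The notation is an assumption made for illustration and is not taken from the abstract: state $x$, drift $f$, control $u$, noise covariance $\nu$, control cost matrix $R$, state cost $V$, end cost $\phi$, and a constant $\lambda$ with $\nu = \lambda R^{-1}$. Under these assumptions a log transform of the cost-to-go $J$ cancels the quadratic terms in the Bellman equation, and the solution can be written as an expectation over uncontrolled paths:

```latex
\begin{align}
dx &= \bigl(f(x,t) + u\bigr)\,dt + d\xi,
  \qquad \langle d\xi\, d\xi^{\top} \rangle = \nu\, dt, \\
C &= \Bigl\langle \phi\bigl(x(T)\bigr)
  + \int_t^T \bigl( \tfrac12 u^{\top} R u + V(x,s) \bigr)\, ds \Bigr\rangle, \\
-\partial_t J &= \min_u \Bigl[ \tfrac12 u^{\top} R u + V
  + (f + u)^{\top} \partial_x J
  + \tfrac12 \operatorname{Tr}\bigl( \nu\, \partial_x^2 J \bigr) \Bigr],
  \qquad u^{*} = -R^{-1} \partial_x J, \\
J &= -\lambda \log \psi, \quad \nu = \lambda R^{-1}
  \;\Longrightarrow\;
  -\partial_t \psi = \Bigl( -\tfrac{V}{\lambda} + f^{\top} \partial_x
  + \tfrac12 \operatorname{Tr}\bigl( \nu\, \partial_x^2 \bigr) \Bigr) \psi, \\
\psi(x,t) &= \Bigl\langle \exp\Bigl( -\tfrac{1}{\lambda}
  \Bigl[ \phi\bigl(x(T)\bigr) + \int_t^T V\, ds \Bigr] \Bigr)
  \Bigr\rangle_{u = 0}.
\end{align}
```

In this form $\psi$ plays the role of a partition function and $J = -\lambda \log \psi$ that of a free energy at temperature $\lambda$, which is why sampling and approximate inference methods from statistical physics become applicable.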

Abstract: Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.
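
The dependence on tree-width can be illustrated with the simplest non-trivial case. The Python sketch below assumes, purely for illustration, that the end cost couples only neighbouring agents in a chain (tree-width 1). The combinatorial sum over all assignments then reduces to a forward-backward sum-product recursion whose cost is linear in the number of agents and quadratic in the number of targets. The function name `chain_marginals` and the factor layout are hypothetical, and the graphs used in the paper may have a different structure.

```python
def chain_marginals(unary, pair):
    """Exact assignment marginals for a chain-structured model.

    unary : unary[a][k] >= 0, single-agent factor for agent a and target k
    pair  : pair[k][l] >= 0, factor coupling target k of agent a with
            target l of agent a + 1 (e.g. exp(-end_cost / lam))
    """
    n, m = len(unary), len(unary[0])
    fwd = [[0.0] * m for _ in range(n)]
    bwd = [[1.0] * m for _ in range(n)]
    fwd[0] = list(unary[0])
    # Forward messages: sum over the assignments of all earlier agents.
    for a in range(1, n):
        for l in range(m):
            fwd[a][l] = unary[a][l] * sum(
                fwd[a - 1][k] * pair[k][l] for k in range(m)
            )
    # Backward messages: sum over the assignments of all later agents.
    for a in range(n - 2, -1, -1):
        for k in range(m):
            bwd[a][k] = sum(
                pair[k][l] * unary[a + 1][l] * bwd[a + 1][l] for l in range(m)
            )
    marginals = []
    for a in range(n):
        row = [fwd[a][k] * bwd[a][k] for k in range(m)]
        z = sum(row)
        marginals.append([r / z for r in row])
    return marginals
```

For long chains the messages should be normalized at every step to avoid underflow. That refinement is omitted here to keep the recursion readable.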