Abstract: We consider a stochastic nonlinear dynamical process with annihilation of particles. This process can be viewed as the continuous time version of the extended
Kalman filter/smoother. It also plays an important role in stochastic optimal control theory. We derive a Gaussian approximation for this process. With the use
of the path integral formalism we derive Euler-Lagrange equations for the mode.
Furthermore, we derive a linear noise approximation to estimate the size of the
fluctuations around the mode, and estimates of the partition function, based on the
mode and Gaussian corrections. Numerical experiments confirm the validity of
the approximation method. In addition, they show that the Gaussian correction
provides a significant improvement of the estimate of the partition function.
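As a rough illustration of what "mode plus Gaussian correction" means for a partition function, the following is a minimal one-dimensional sketch, not the path-integral computation of the paper. The toy action `S`, the function names, and the finite-difference Newton search are all assumptions made for this example. It compares the mode-only estimate exp(-S(x*)) and the Laplace estimate exp(-S(x*)) sqrt(2π / S''(x*)) against a numerical integral of exp(-S(x)).

```python
import math

def laplace_partition(action, x_init, h=1e-3, iters=50):
    """Return (mode, mode_only_estimate, laplace_estimate) for Z = int exp(-S(x)) dx."""
    x = x_init
    for _ in range(iters):
        # Newton step on S using central finite differences.
        grad = (action(x + h) - action(x - h)) / (2.0 * h)
        hess = (action(x + h) - 2.0 * action(x) + action(x - h)) / h**2
        x -= grad / hess
    hess = (action(x + h) - 2.0 * action(x) + action(x - h)) / h**2
    z_mode = math.exp(-action(x))
    # Gaussian correction: integrate the quadratic expansion around the mode.
    z_laplace = z_mode * math.sqrt(2.0 * math.pi / hess)
    return x, z_mode, z_laplace

def numeric_partition(action, lo=-5.0, hi=5.0, n=20000):
    """Trapezoidal reference value for Z on [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.5 * (math.exp(-action(lo)) + math.exp(-action(hi)))
    for i in range(1, n):
        total += math.exp(-action(lo + i * dx))
    return total * dx

def toy_action(x, eps=0.1):
    """Hypothetical anharmonic action; eps plays the role of the noise level."""
    return (0.5 * x**2 + 0.25 * x**4) / eps

mode, z_mode, z_laplace = laplace_partition(toy_action, x_init=0.3)
z_numeric = numeric_partition(toy_action)
print(mode, z_mode, z_laplace, z_numeric)
```

In this toy case the Gaussian correction moves the estimate much closer to the numerical value than the mode alone, which is the qualitative effect the abstract reports for the full process.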

Abstract: We address the role of noise and the issue of efficient computation in stochastic optimal control
problems. We consider a class of nonlinear control problems that can be formulated as a path integral and
where the noise plays the role of temperature. The path integral displays symmetry breaking and there
exists a critical noise value that separates regimes where optimal control yields qualitatively different
solutions. The path integral can be computed efficiently by Monte Carlo integration or by a Laplace
approximation, and can therefore be used to solve high dimensional stochastic control problems.
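The Monte Carlo route mentioned above can be sketched for a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: a one-dimensional system dx = u dt + dξ with noise variance ν dt, control cost weight R = 1 (so the temperature is λ = ν), no state cost along the path, and a quadratic end cost φ(x) = α x²/2. Under these assumptions the cost-to-go is J(x) = -λ log E[exp(-φ(x_T)/λ)], with the expectation taken over uncontrolled paths, and a closed form exists for comparison. All function names are hypothetical.

```python
import math
import random

def cost_to_go_mc(x0, horizon, nu, end_cost, n_paths=5000, n_steps=20, seed=0):
    """Monte Carlo estimate of J(x0) = -lam * log E[exp(-end_cost(x_T) / lam)].

    Paths follow the uncontrolled dynamics dx = dxi with variance nu * dt.
    Assumes control cost weight R = 1, so lam = nu.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sigma = math.sqrt(nu * dt)
    lam = nu
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)  # one Euler step of pure diffusion
        total += math.exp(-end_cost(x) / lam)
    psi = total / n_paths
    return -lam * math.log(psi)

def cost_to_go_exact(x0, horizon, nu, alpha):
    """Closed form for end cost alpha * x**2 / 2 under the same assumptions."""
    scale = 1.0 + alpha * horizon
    return 0.5 * nu * math.log(scale) + alpha * x0**2 / (2.0 * scale)

j_mc = cost_to_go_mc(1.0, 1.0, 1.0, lambda x: 0.5 * x * x)
j_exact = cost_to_go_exact(1.0, 1.0, 1.0, 1.0)
print(j_mc, j_exact)
```

The estimator only needs forward simulation of the uncontrolled process, which is why the cost of one sample path does not depend on solving a partial differential equation on a grid.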

Abstract: Control theory is a mathematical description of how to act
optimally to gain future rewards. In this paper we discuss
a class of non-linear stochastic control problems that can be
efficiently solved using a path integral. In this control formalism, the central concept of cost-to-go or value function
becomes a free energy and methods and concepts from statistical physics can be readily applied, such as Monte Carlo
sampling or the Laplace approximation. When applied to a
receding horizon problem in a stationary environment, the
solution resembles the one obtained by traditional reinforcement learning with discounted reward. It is shown that this
solution can be computed more efficiently than in the discounted reward framework. As shown in previous work, the
approach is easily generalized to time-dependent tasks and
is therefore of great relevance for modeling real-time interactions between agents.

Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory and I give an overview of the possible application of control theory to the modeling of animal behavior and learning. I discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral or by Monte Carlo sampling. In this control formalism, the central concept of cost-to-go becomes a free energy and methods and concepts from statistical physics can be readily applied.

Abstract: In this article we consider the issue of optimal control in collaborative multi-agent systems with stochastic dynamics. The agents have a joint task in which they have to reach a number of target states. The dynamics of the agents contains additive control and additive noise, and the autonomous part factorizes over the agents. Full observation of the global state is assumed. The goal is to minimize the accumulated joint cost, which consists of integrated instantaneous costs and a joint end cost. The joint end cost expresses the joint task of the agents. The instantaneous costs are quadratic in the control and factorize over the agents. The optimal control is given as a weighted linear combination of single-agent to single-target controls. The single-agent to single-target controls are expressed in terms of diffusion processes. When these controls are not available as closed-form expressions, they are formulated in terms of path integrals, which are calculated approximately by Metropolis-Hastings sampling. The weights in the control are interpreted as marginals of a joint distribution over agent-to-target assignments. The structure of the latter is represented by a graphical model, and the marginals are obtained by graphical model inference. Exact inference of the graphical model will break down in large systems, and so approximate inference methods are needed. We use naive mean field approximation and belief propagation to approximate the optimal control in systems with linear dynamics. We compare the approximate inference methods with the exact solution, and we show that they can accurately compute the optimal control. Finally, we demonstrate the control method in multi-agent systems with nonlinear dynamics consisting of up to 80 agents that have to reach an equal number of target states.
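The structure "optimal control = marginal-weighted combination of single-agent to single-target controls" can be made concrete for a very small system where the joint distribution over assignments is enumerated exactly. The sketch below rests on assumptions that are not stated in the abstract: one-dimensional agents with driftless dynamics dx = u dt + dξ, noise variance ν dt, control cost weight 1, hard targets reached at the horizon, and a hypothetical end cost that penalizes two agents choosing the same target. Under these assumptions the single-target control is (target - x) / horizon and each agent contributes a Gaussian factor to the assignment weights. Function names are illustrative only.

```python
import itertools
import math

def distinct_targets_cost(assignment, penalty=10.0):
    """Hypothetical joint end cost: penalize assignments that share a target."""
    return 0.0 if len(set(assignment)) == len(assignment) else penalty

def assignment_marginals(xs, targets, end_cost, nu, horizon):
    """Exact marginals p(s_i = k) by enumerating all joint assignments s."""
    weights = {}
    for s in itertools.product(range(len(targets)), repeat=len(xs)):
        log_w = -end_cost(s) / nu
        for x, k in zip(xs, s):
            # Gaussian factor of free diffusion from x to target k.
            log_w -= (x - targets[k]) ** 2 / (2.0 * nu * horizon)
        weights[s] = math.exp(log_w)
    z = sum(weights.values())
    marginals = [[0.0] * len(targets) for _ in xs]
    for s, w in weights.items():
        for i, k in enumerate(s):
            marginals[i][k] += w / z
    return marginals

def controls(xs, targets, end_cost, nu, horizon):
    """Weighted combination of single-agent to single-target controls."""
    marginals = assignment_marginals(xs, targets, end_cost, nu, horizon)
    return [
        sum(p * (targets[k] - x) / horizon for k, p in enumerate(row))
        for x, row in zip(xs, marginals)
    ]

targets = [-1.0, 1.0]
print(assignment_marginals([0.5, -0.5], targets, distinct_targets_cost, 1.0, 1.0))
print(controls([0.5, -0.5], targets, distinct_targets_cost, 1.0, 1.0))
```

The enumeration grows exponentially with the number of agents, which is the reason the abstract turns to mean field approximation and belief propagation for larger systems.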

Abstract: Purpose - In this paper, a novel Ant Colony Optimization (ACO) approach to optimal control is proposed. Standard ACO algorithms have proven to be very powerful optimization metaheuristics for combinatorial optimization problems. They have been demonstrated to work well when applied to various NP-complete problems, such as the traveling salesman problem. In this paper, ACO is reformulated as a model-free learning algorithm and its properties are discussed.
Design/methodology/approach - First, it is described how quantizing the state space of a dynamic system introduces stochasticity in the state transitions and transforms the optimal control problem into a stochastic combinatorial optimization problem, motivating the ACO approach. The algorithm is presented and is applied to the time-optimal swing-up and stabilization of an underactuated pendulum. In particular, the effect of different numbers of ants on the performance of the algorithm is studied.
Findings - The simulations show that the algorithm finds good control policies reasonably fast. An increasing number of ants results in increasingly better policies. The simulations also show that although the policy converges, the ants keep on exploring the state space and are thereby capable of adapting to variations in the system dynamics.
Research limitations/implications - This research introduces a novel ACO approach to optimal control and as such marks the starting point for more research of its properties. In particular, quantization issues must be studied in relation to the performance of the algorithm.
Originality/value - The work presented is original as it presents the first application of ACO to optimal control problems.
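The claim in the Design/methodology/approach paragraph, that quantizing the state space makes transitions stochastic, can be illustrated with a toy deterministic system. This is a minimal sketch with hypothetical names and a made-up one-dimensional map; it is not the pendulum setup of the paper. States inside one bin are indistinguishable after quantization, yet they land in different bins after one step, so the bin-level transition looks random.

```python
import math
from collections import Counter

def bin_index(x, width):
    """Index of the quantization bin that contains the continuous state x."""
    return math.floor(x / width)

def quantized_transitions(step, source_bin, width, n_samples=1000):
    """Distribution over next bins for states spread evenly over one bin.

    `step` is a deterministic map x -> x_next for a fixed control input.
    """
    counts = Counter()
    for i in range(n_samples):
        # Evenly spaced continuous states that all quantize to source_bin.
        x = (source_bin + (i + 0.5) / n_samples) * width
        counts[bin_index(step(x), width)] += 1
    return {b: c / n_samples for b, c in sorted(counts.items())}

# Deterministic dynamics: shift the state by 0.3 per step, bins of width 1.
probs = quantized_transitions(lambda x: x + 0.3, source_bin=0, width=1.0)
print(probs)  # {0: 0.7, 1: 0.3}
```

Seen only through bin indices, the same action from the same bin leads to bin 0 about 70% of the time and to bin 1 about 30% of the time, which is the kind of apparent stochasticity that motivates treating the quantized problem as a stochastic combinatorial optimization problem.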

Abstract: We study optimal control in large stochastic multi-agent systems in continuous space and time. We consider multi-agent systems where agents have independent dynamics with additive noise and control. The goal is to minimize the joint cost, which consists of a state dependent term and a term quadratic in the control. The system is described by a mathematical model, and an explicit solution is given. We focus on large systems where agents have to distribute themselves over a number of targets with minimal cost. In such a setting the optimal control problem is equivalent to a graphical model inference problem. Exact inference will be intractable, and we use the mean field approximation to compute accurate approximations of the optimal controls. We conclude that near-optimal control in large stochastic multi-agent systems is possible with this approach.

Abstract: Optimal control theory is a mathematical description of how to act optimally
to gain future rewards. In this paper I give an introduction to
deterministic and stochastic control theory; partial observability,
learning and the combined problem of inference and control. Subsequently, I
discuss a new class of non-linear stochastic control problems for which the Bellman equation becomes linear in the
control and that can be efficiently solved using a path integral.
In this control formalism the central concept of cost-to-go becomes a
free energy and methods and concepts from probabilistic graphical
models and statistical physics can be readily applied. I illustrate the
theory with a number of examples.

Abstract: We consider multiagent systems with stochastic non-linear dynamics in continuous space-time. We focus on systems of agents that aim to visit a number of given target locations at given points in time at minimal control cost. The online optimization of which agent has to visit which target requires the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a non-linear partial differential equation (PDE). Under some conditions, the log-transform can be applied to turn the HJB equation into a linear PDE. We then show that the optimal solution in the multiagent scheduling problem can be expressed in closed form as a sum of single schedule solutions.
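The log-transform step can be written out for a standard model from the path integral control literature. The one-dimensional dynamics, cost, and symbols below (b, V, φ, R, ν, λ, ψ) are assumptions for illustration; the abstract does not specify them. The sketch shows why a relation between noise and control cost is needed: the quadratic terms cancel only when λ = νR.

```latex
% Assumed model (not given in the abstract):
%   dx = (b(x,t) + u)\,dt + d\xi, \quad \langle d\xi^2 \rangle = \nu\,dt
%   C = \left\langle \phi(x(T)) + \int_t^T \left( \tfrac{R}{2} u^2 + V(x,s) \right) ds \right\rangle
\begin{align*}
-\partial_t J &= \min_u \left( \tfrac{R}{2} u^2 + V + (b + u)\,\partial_x J
                 + \tfrac{\nu}{2}\,\partial_x^2 J \right),
  \qquad u^* = -\tfrac{1}{R}\,\partial_x J, \\
-\partial_t J &= -\tfrac{1}{2R} (\partial_x J)^2 + V + b\,\partial_x J
                 + \tfrac{\nu}{2}\,\partial_x^2 J
  \qquad \text{(non-linear in } J\text{)}. \\
\intertext{Substituting $J = -\lambda \log \psi$ produces the quadratic terms
$\left( -\tfrac{\lambda^2}{2R} + \tfrac{\nu \lambda}{2} \right)
\left( \tfrac{\partial_x \psi}{\psi} \right)^2$, which vanish when $\lambda = \nu R$, leaving}
-\partial_t \psi &= \left( -\tfrac{V}{\lambda} + b\,\partial_x
                 + \tfrac{\nu}{2}\,\partial_x^2 \right) \psi,
  \qquad \psi(x, T) = \exp\!\left( -\phi(x)/\lambda \right).
\end{align*}
```

Because the resulting equation is linear in ψ, solutions for different end conditions can be superposed, which is consistent with the abstract's statement that the multiagent solution is a sum of single schedule solutions.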

Abstract: Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.