Abstract: We consider a stochastic nonlinear dynamical process with annihilation of particles. This process can be viewed as the continuous-time version of the extended Kalman filter/smoother. It also plays an important role in stochastic optimal control theory. We derive a Gaussian approximation for this process. With the use of the path integral formalism we derive Euler-Lagrange equations for the mode. Furthermore, we derive a linear noise approximation to estimate the size of the fluctuations around the mode, and estimates of the partition function, based on the mode and Gaussian corrections. Numerical experiments confirm the validity of the approximation method. In addition, they show that the Gaussian correction provides a significant improvement of the estimate of the partition function.

Abstract: We address the role of noise and the issue of efficient computation in stochastic optimal control problems. We consider a class of nonlinear control problems that can be formulated as a path integral and where the noise plays the role of temperature. The path integral displays symmetry breaking, and there exists a critical noise value that separates regimes where optimal control yields qualitatively different solutions. The path integral can be computed efficiently by Monte Carlo integration or by a Laplace approximation, and can therefore be used to solve high-dimensional stochastic control problems.

Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper we discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral. In this control formalism, the central concept of cost-to-go or value function becomes a free energy, and methods and concepts from statistical physics, such as Monte Carlo sampling or the Laplace approximation, can be readily applied. When applied to a receding horizon problem in a stationary environment, the solution resembles the one obtained by traditional reinforcement learning with discounted reward. It is shown that this solution can be computed more efficiently than in the discounted reward framework. As shown in previous work, the approach is easily generalized to time-dependent tasks and is therefore of great relevance for modeling real-time interactions between agents.

Abstract: This report presents an overview of the state-of-the-art methods and models for planning for teams of embodied agents. Due to the nature of the real world, this means we focus on multi-agent planning in stochastic, partially observable systems. In particular we focus on decentralized partially observable Markov decision processes (Dec-POMDPs), partially observable stochastic games (POSGs) and related models. Regarding such models, we review complexity results and recently proposed methods for finding (approximate) solutions.

Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory and I give an overview of the possible application of control theory to the modeling of animal behavior and learning. I discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral or by MC sampling. In this control formalism the central concept of cost-to-go becomes a free energy and methods and concepts from statistical physics can be readily applied.

Abstract: In this article we consider the issue of optimal control in collaborative multi-agent systems with stochastic dynamics. The agents have a joint task in which they have to reach a number of target states. The dynamics of the agents contains additive control and additive noise, and the autonomous part factorizes over the agents. Full observation of the global state is assumed. The goal is to minimize the accumulated joint cost, which consists of integrated instantaneous costs and a joint end cost. The joint end cost expresses the joint task of the agents. The instantaneous costs are quadratic in the control and factorize over the agents. The optimal control is given as a weighted linear combination of single-agent to single-target controls. The single-agent to single-target controls are expressed in terms of diffusion processes. These controls, when not closed form expressions, are formulated in terms of path integrals, which are calculated approximately by Metropolis-Hastings sampling. The weights in the control are interpreted as marginals of a joint distribution over agent to target assignments. The structure of the latter is represented by a graphical model, and the marginals are obtained by graphical model inference. Exact inference of the graphical model will break down in large systems, and so approximate inference methods are needed. We use naive mean field approximation and belief propagation to approximate the optimal control in systems with linear dynamics. We compare the approximate inference methods with the exact solution, and we show that they can accurately compute the optimal control. Finally, we demonstrate the control method in multi-agent systems with nonlinear dynamics consisting of up to 80 agents that have to reach an equal number of target states.
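The weighted combination described in this abstract can be sketched in formula form; the notation below is illustrative and not taken from the paper:

```latex
% Optimal control of agent a as a weighted sum of single-agent-to-single-target
% controls u_{a \to s}; the weights w_{a s} are marginals of the joint
% distribution p(\sigma) over agent-to-target assignments \sigma
% (symbols illustrative, not the paper's notation).
u_a^*(x, t) \;=\; \sum_{s} w_{a s}\, u_{a \to s}(x, t),
\qquad
w_{a s} \;=\; \sum_{\sigma \,:\, \sigma(a) = s} p(\sigma).
```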

Abstract: In this paper, time-augmented Petri nets are used to model people in the transit hall of an airport. Their behavior is strongly influenced by an event with a clear deadline (their flight), but typically there is so much time left that they linger and can be tempted into other, random behaviors, often induced by the location (encountering a coffee corner or a toilet). All behaviors are stochastic, but the firing rate is made a function of both location and time. This framework makes it possible to show a rich set of behaviors; the diversity of the emergent behaviors is seeded with probabilities from observations in an actual airport transit hall.

Abstract: Purpose - In this paper, a novel Ant Colony Optimization (ACO) approach to optimal control is proposed. The standard ACO algorithms have proven to be a very powerful optimization metaheuristic for combinatorial optimization problems. They have been demonstrated to work well when applied to various NP-complete problems, such as the traveling salesman problem. In this paper, ACO is reformulated as a model-free learning algorithm and its properties are discussed.
Design/methodology/approach - First, it is described how quantizing the state space of a dynamic system introduces stochasticity in the state transitions and transforms the optimal control problem into a stochastic combinatorial optimization problem, motivating the ACO approach. The algorithm is presented and is applied to the time-optimal swing-up and stabilization of an underactuated pendulum. In particular, the effect of different numbers of ants on the performance of the algorithm is studied.
Findings - The simulations show that the algorithm finds good control policies reasonably fast. An increasing number of ants results in increasingly better policies. The simulations also show that although the policy converges, the ants keep on exploring the state space and thereby remain capable of adapting to variations in the system dynamics.
Research limitations/implications - This research introduces a novel ACO approach to optimal control and as such marks the starting point for more research of its properties. In particular, quantization issues must be studied in relation to the performance of the algorithm.
Originality/value - The work presented is original as it presents the first application of ACO to optimal control problems.

Abstract: We study optimal control in large stochastic multi-agent systems in continuous space and time. We consider multi-agent systems where agents have independent dynamics with additive noise and control. The goal is to minimize the joint cost, which consists of a state dependent term and a term quadratic in the control. The system is described by a mathematical model, and an explicit solution is given. We focus on large systems where agents have to distribute themselves over a number of targets with minimal cost. In such a setting the optimal control problem is equivalent to a graphical model inference problem. Exact inference will be intractable, and we use the mean field approximation to compute accurate approximations of the optimal controls. We conclude that near-optimal control in large stochastic multi-agent systems is possible with this approach.

Abstract: Optimal control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory, partial observability, learning, and the combined problem of inference and control. Subsequently, I discuss a new class of non-linear stochastic control problems for which the Bellman equation becomes linear in the control and that can be efficiently solved using a path integral. In this control formalism the central concept of cost-to-go becomes a free energy, and methods and concepts from probabilistic graphical models and statistical physics can be readily applied. I illustrate the theory with a number of examples.
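The free-energy view of the cost-to-go suggests a simple Monte Carlo estimator: sample uncontrolled trajectories and average an exponentiated path cost. Below is a minimal one-dimensional sketch of that idea; the function names, signatures and the noise scaling are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def free_energy_mc(x0, f, V, end_cost, T=1.0, dt=0.01, lam=1.0,
                   n_samples=1000, rng=None):
    """Monte Carlo estimate of the cost-to-go J(x0, 0) = -lam * log E[exp(-S/lam)],
    where the expectation is over uncontrolled diffusions
    dx = f(x) dt + sqrt(lam * dt) * xi, and S is the accumulated path cost.
    Illustrative sketch only, not the author's implementation."""
    rng = rng or np.random.default_rng(0)
    n_steps = int(T / dt)
    x = np.full(n_samples, x0, dtype=float)
    S = np.zeros(n_samples)                 # accumulated path cost per sample
    for _ in range(n_steps):
        S += V(x) * dt                      # integrate state cost along the path
        x += f(x) * dt + np.sqrt(lam * dt) * rng.standard_normal(n_samples)
    S += end_cost(x)                        # add end cost at the horizon
    return -lam * np.log(np.mean(np.exp(-S / lam)))
```

For zero state cost and a constant end cost the estimator is exact, which gives a quick sanity check of the implementation.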

Abstract: We consider multiagent systems with stochastic non-linear dynamics in continuous space-time. We focus on systems of agents that aim to visit a number of given target locations at given points in time at minimal control cost. The online optimization of which agent has to visit which target requires the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a non-linear partial differential equation (PDE). Under some conditions, the log-transform can be applied to turn the HJB equation into a linear PDE. We then show that the optimal solution in the multiagent scheduling problem can be expressed in closed form as a sum of single schedule solutions.

Abstract: This paper considers linear-quadratic control of a non-linear dynamical system subject to arbitrary cost. I show that for this class of stochastic control problems the non-linear Hamilton-Jacobi-Bellman equation can be transformed into a linear equation. The transformation is similar to the transformation used to relate the classical Hamilton-Jacobi equation to the Schrödinger equation. As a result of the linearity, the usual backward computation can be replaced by a forward diffusion process that can be computed by stochastic integration or by the evaluation of a path integral. It is shown how in the deterministic limit the Pontryagin minimum principle formalism is recovered. The significance of the path integral approach is that it forms the basis for a number of efficient computational methods, such as Monte Carlo sampling, the Laplace approximation and the variational approximation. We show the effectiveness of the first two methods in a number of examples. Examples are given that show the qualitative difference between stochastic and deterministic control and the occurrence of symmetry breaking as a function of the noise.
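The linearizing transformation mentioned in this abstract is, in one standard formulation (the notation below is conventional and assumed, not quoted from the paper):

```latex
% Stochastic HJB equation for dynamics dx = (f(x,t) + u)\,dt + d\xi,
% noise covariance \nu, and cost rate \tfrac12 u^\top R u + V(x,t):
-\partial_t J = \min_u \left( \tfrac12 u^\top R u + V + (f+u)^\top \nabla J
    + \tfrac12 \operatorname{Tr}\!\left( \nu \nabla^2 J \right) \right).
% Under the condition \nu = \lambda R^{-1}, the log-transform
% J(x,t) = -\lambda \log \psi(x,t) cancels the quadratic term and yields
% a linear (backward) equation in \psi:
\partial_t \psi = \left( \frac{V}{\lambda} - f^\top \nabla
    - \tfrac12 \operatorname{Tr}\!\left( \nu \nabla^2 \right) \right) \psi.
```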

Abstract: Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.

Abstract: Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluating these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
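The sample-evaluate-refit loop of the CE method can be sketched on a toy combinatorial problem; the code below is a generic illustration over binary vectors, not the Dec-POMDP policy parametrization used in the paper, and all names and parameter values are assumptions:

```python
import numpy as np

def cross_entropy_opt(score, n_bits, n_iter=50, n_samples=200,
                      elite_frac=0.1, rng=None):
    """Generic Cross-Entropy method: sample candidates from a product-Bernoulli
    distribution, keep the top elite_frac by score, refit the Bernoulli
    parameters to the elites, and repeat. Toy sketch of the CE idea only."""
    rng = rng or np.random.default_rng(0)
    p = np.full(n_bits, 0.5)                    # initial sampling distribution
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        samples = (rng.random((n_samples, n_bits)) < p).astype(int)
        scores = np.array([score(s) for s in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # highest-scoring samples
        p = 0.7 * p + 0.3 * elites.mean(axis=0)           # smoothed refit to elites
    return (p > 0.5).astype(int)

# Toy usage: search for a hidden bit-string by maximizing the match score.
target = np.array([1, 0, 1, 1, 0, 1, 0, 0])
best = cross_entropy_opt(lambda s: -np.sum(np.abs(s - target)), n_bits=8)
```

The smoothing in the parameter update (here a fixed 0.7/0.3 mix, an arbitrary choice) is what keeps the sampling distribution from collapsing prematurely onto early elites.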