Abstract: We consider a stochastic nonlinear dynamical process with annihilation of particles. This process can be viewed as the continuous-time version of the extended Kalman filter/smoother. It also plays an important role in stochastic optimal control theory. We derive a Gaussian approximation for this process. Using the path integral formalism, we derive Euler-Lagrange equations for the mode. Furthermore, we derive a linear noise approximation to estimate the size of the fluctuations around the mode, and estimates of the partition function based on the mode and Gaussian corrections. Numerical experiments confirm the validity of the approximation method. In addition, they show that the Gaussian correction provides a significant improvement of the estimate of the partition function.
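Why a Gaussian correction around the mode improves a partition-function estimate can be seen on a one-dimensional integral instead of a path integral. The sketch below is the ordinary Laplace approximation, not the paper's linear noise approximation for the full process, and it assumes a hypothetical action S(x) = x²/2 + G x⁴/4 - H x with illustrative constants G and H. With Z = ∫ exp(-S(x)) dx and free energy F = -log Z, the mode alone gives F ≈ S(x*), and adding Gaussian fluctuations gives F ≈ S(x*) + ½ log(S''(x*)/2π). Both are compared with a numerically integrated reference.

```python
import math

# Hypothetical 1D action (illustration only): S(x) = x^2/2 + G x^4/4 - H x.
# "Partition function" Z = ∫ exp(-S(x)) dx, free energy F = -log Z.
G, H = 0.1, 1.0


def S(x):
    return 0.5 * x**2 + 0.25 * G * x**4 - H * x


def dS(x):
    return x + G * x**3 - H


def d2S(x):
    return 1.0 + 3.0 * G * x**2


def find_mode(x0=0.0, iters=50):
    """Newton iteration on S'(x) = 0; S is strictly convex here, so the mode is unique."""
    x = x0
    for _ in range(iters):
        x -= dS(x) / d2S(x)
    return x


def free_energy_exact(lo=-20.0, hi=20.0, n=200_000):
    """Reference F = -log Z from the trapezoidal rule on a wide grid."""
    h = (hi - lo) / n
    total = 0.5 * (math.exp(-S(lo)) + math.exp(-S(hi)))
    total += sum(math.exp(-S(lo + k * h)) for k in range(1, n))
    return -math.log(total * h)


def free_energy_estimates():
    """Return the mode, the mode-only estimate of F, and the mode + Gaussian estimate."""
    x_star = find_mode()
    f_mode = S(x_star)
    f_gauss = f_mode + 0.5 * math.log(d2S(x_star) / (2.0 * math.pi))
    return x_star, f_mode, f_gauss


if __name__ == "__main__":
    x_star, f_mode, f_gauss = free_energy_estimates()
    f_exact = free_energy_exact()
    print(f"mode x* = {x_star:.4f}, reference F = {f_exact:.4f}")
    print(f"mode only:      F = {f_mode:.4f}, error {abs(f_mode - f_exact):.4f}")
    print(f"mode + Gauss:   F = {f_gauss:.4f}, error {abs(f_gauss - f_exact):.4f}")
```

In this toy case the Gaussian term accounts for almost all of the gap between the mode-only estimate and the reference; the remaining error comes from the non-Gaussian (cubic and quartic) shape of S around the mode.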
Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper we discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral. In this control formalism, the central concept of cost-to-go or value function becomes a free energy, and methods and concepts from statistical physics, such as Monte Carlo sampling or the Laplace approximation, can be readily applied. When applied to a receding horizon problem in a stationary environment, the solution resembles the one obtained by traditional reinforcement learning with discounted reward. It is shown that this solution can be computed more efficiently than in the discounted reward framework. As shown in previous work, the approach is easily generalized to time-dependent tasks and is therefore of great relevance for modeling real-time interactions between agents.
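The claim that the cost-to-go becomes a free energy can be made concrete with a small Monte Carlo estimate. The sketch assumes one-dimensional dynamics dx = u dt + dξ with ⟨dξ²⟩ = ν dt, control cost (R/2)u², an optional state cost V(x), a hypothetical quadratic end cost φ(x) = (a/2)x², and λ = νR. Under these assumptions J(x, t) = -λ log E[exp(-S/λ)], where the expectation runs over uncontrolled (u = 0) paths and S is the accumulated state and end cost of a path. For V = 0 the estimate can be checked against a closed form; the function names and parameters are illustrative, not taken from the paper.

```python
import math
import random


def cost_to_go_mc(x0, tau, nu=1.0, R=1.0, a=1.0, V=lambda x: 0.0,
                  n_paths=20_000, n_steps=20, seed=0):
    """Estimate J(x0) = -lam * log E[exp(-S/lam)] over uncontrolled Euler paths."""
    lam = nu * R                      # lam = nu * R ties noise level to control cost
    dt = tau / n_steps
    rng = random.Random(seed)
    costs = []
    for _ in range(n_paths):
        x, s = x0, 0.0
        for _ in range(n_steps):
            s += V(x) * dt                            # state-dependent path cost
            x += rng.gauss(0.0, math.sqrt(nu * dt))   # uncontrolled dynamics, u = 0
        s += 0.5 * a * x * x                          # hypothetical end cost phi(x_T)
        costs.append(s)
    m = min(costs)                                    # shift for a stable log-sum-exp
    mean_w = sum(math.exp(-(c - m) / lam) for c in costs) / n_paths
    return m - lam * math.log(mean_w)


def cost_to_go_exact(x0, tau, nu=1.0, R=1.0, a=1.0):
    """Closed form for V = 0: x_T ~ N(x0, nu * tau) under the uncontrolled dynamics."""
    lam = nu * R
    k = 1.0 + a * nu * tau / lam
    return 0.5 * lam * math.log(k) + 0.5 * a * x0 * x0 / k


if __name__ == "__main__":
    print("Monte Carlo:", cost_to_go_mc(1.0, 1.0))
    print("closed form:", cost_to_go_exact(1.0, 1.0))
```

Sampling uncontrolled paths is the simplest estimator. When the end cost is sharply peaked, most samples receive negligible weight, which is where importance sampling or the Laplace approximation mentioned above becomes useful.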
Abstract: Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory and an overview of possible applications of control theory to the modeling of animal behavior and learning. I discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral or by Monte Carlo (MC) sampling. In this control formalism, the central concept of cost-to-go becomes a free energy, and methods and concepts from statistical physics can be readily applied.
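MC sampling yields not only the cost-to-go but also the optimal control. The sketch below assumes one-dimensional driftless dynamics dx = u dt + dξ with ⟨dξ²⟩ = ν dt, control cost (R/2)u², no state cost, a hypothetical end cost φ(x) = (a/2)x², horizon τ = T - t and λ = νR. In that special case the optimal control u* = ν ∂ₓ log ψ reduces to the weighted mean displacement of uncontrolled endpoints, u*(x) = E_w[x_T - x]/τ, with weights w ∝ exp(-φ(x_T)/λ). Because the example is linear-quadratic, the exact answer u* = -a x/(R + aτ) provides a check. Names and parameters are illustrative.

```python
import math
import random


def optimal_control_mc(x0, tau, nu=1.0, R=1.0, a=1.0, n_samples=50_000, seed=1):
    """u*(x0) as the exp(-phi/lam)-weighted mean displacement of uncontrolled endpoints."""
    lam = nu * R
    rng = random.Random(seed)
    # With no drift and no state cost, the uncontrolled endpoint is x_T ~ N(x0, nu * tau).
    ys = [rng.gauss(x0, math.sqrt(nu * tau)) for _ in range(n_samples)]
    costs = [0.5 * a * y * y for y in ys]        # hypothetical end cost phi(x_T)
    m = min(costs)                               # shift to avoid underflow in the weights
    ws = [math.exp(-(c - m) / lam) for c in costs]
    z = sum(ws)
    mean_disp = sum(w * (y - x0) for w, y in zip(ws, ys)) / z
    return mean_disp / tau


def optimal_control_lq(x0, tau, R=1.0, a=1.0):
    """Exact linear-quadratic feedback for the same problem."""
    return -a * x0 / (R + a * tau)


if __name__ == "__main__":
    print("MC estimate:", optimal_control_mc(1.0, 1.0))
    print("exact LQ:   ", optimal_control_lq(1.0, 1.0))
```

Note that ν cancels from the exact control: the noise level affects how many samples are needed, not the optimal feedback in this linear-quadratic case.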
Abstract: Optimal control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory, partial observability, learning, and the combined problem of inference and control. Subsequently, I discuss a new class of non-linear stochastic control problems for which the Bellman equation becomes linear in the control and that can be efficiently solved using a path integral. In this control formalism, the central concept of cost-to-go becomes a free energy, and methods and concepts from probabilistic graphical models and statistical physics can be readily applied. I illustrate the theory with a number of examples.
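Where the path integral comes from can be sketched with the logarithmic transformation of the stochastic Bellman (Hamilton-Jacobi-Bellman) equation. The derivation below assumes one-dimensional dynamics dx = (b(x,t) + u) dt + dξ with ⟨dξ²⟩ = ν dt, cost ∫ (R/2 u² + V(x,t)) dt + φ(x_T), and the choice λ = νR; the symbols b, V, φ, ν, R and λ are introduced here for illustration. Minimizing over u gives u* = -∂ₓJ/R, and substituting J = -λ log ψ makes the quadratic terms in ∂ₓJ cancel exactly when λ = νR, leaving an equation that is linear in ψ:

```latex
\begin{aligned}
-\partial_t J &= \min_u\Big[\tfrac{R}{2}u^2 + V + (b+u)\,\partial_x J + \tfrac{\nu}{2}\,\partial_x^2 J\Big],
\qquad u^* = -\tfrac{1}{R}\,\partial_x J,\\
-\partial_t J &= V - \tfrac{1}{2R}\,(\partial_x J)^2 + b\,\partial_x J + \tfrac{\nu}{2}\,\partial_x^2 J,\\
J &= -\lambda\log\psi,\quad \lambda = \nu R
\;\Longrightarrow\;
-\partial_t \psi = \Big(-\tfrac{V}{\lambda} + b\,\partial_x + \tfrac{\nu}{2}\,\partial_x^2\Big)\psi,
\qquad \psi(x,T) = e^{-\phi(x)/\lambda}.
\end{aligned}
```

Because the equation for ψ is linear and backward in time, the Feynman-Kac formula expresses ψ(x,t) as an expectation of exp(-S/λ) over uncontrolled paths starting at x, which is the path integral; the optimal control then follows as u* = ν ∂ₓ log ψ.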
Abstract: Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state-dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42