Abstract: Integration of UAVs with Air Traffic Control (ATC) is a worldwide problem. ATC already suffers from capacity problems due to the sheer volume of air traffic. In the future, when large numbers of Unmanned Aerial Vehicles (UAVs) participate in the same airspace, the system cannot afford UAVs that require special attention. Regulations for UAV flights in civil airspace are still being developed, but authorities are expected to require UAVs to operate 'like manned aircraft'. The implication is that UAVs need to become full participants in a complex socio-technical environment and need to generate 'human-like' decisions and behavior. To deal with this complexity, a novel approach to developing UAV autonomy is needed, aimed at creating an environment that fosters shared situation awareness between the UAVs, pilots and controllers. The underlying principle is to develop an understanding of the work domain that can be shared between people and UAVs. A powerful framework for representing the meaningful structure of the environment is Rasmussen's abstraction hierarchy. This paper proposes that autonomous UAVs can base their reasoning, decisions and actions on the abstraction hierarchy framework and communicate their goals and intentions to human operators. It is hypothesized that the properties of the framework can create 'shared situation awareness' between the artificial and human operators despite the differences in their internal workings.

Abstract: Purpose - In this paper, a novel Ant Colony Optimization (ACO) approach to optimal control is proposed. Standard ACO algorithms have proven to be a very powerful optimization metaheuristic for combinatorial optimization problems. They have been demonstrated to work well when applied to various NP-complete problems, such as the traveling salesman problem. In this paper, ACO is reformulated as a model-free learning algorithm and its properties are discussed.
Design/methodology/approach - First, it is described how quantizing the state space of a dynamic system introduces stochasticity in the state transitions and transforms the optimal control problem into a stochastic combinatorial optimization problem, motivating the ACO approach. The algorithm is presented and applied to the time-optimal swing-up and stabilization of an underactuated pendulum. In particular, the effect of the number of ants on the performance of the algorithm is studied.
Findings - The simulations show that the algorithm finds good control policies reasonably fast. An increasing number of ants results in increasingly better policies. The simulations also show that although the policy converges, the ants keep exploring the state space and thereby remain capable of adapting to variations in the system dynamics.
Research limitations/implications - This research introduces a novel ACO approach to optimal control and as such marks the starting point for further research into its properties. In particular, quantization issues must be studied in relation to the performance of the algorithm.
Originality/value - The work presented is original as it presents the first application of ACO to optimal control problems.
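For readers unfamiliar with the metaheuristic, the standard combinatorial ACO referred to above can be sketched on a toy traveling-salesman instance. This illustrates plain Ant System, not the paper's optimal-control reformulation; the distance matrix and all parameters (alpha, beta, rho, numbers of ants and iterations) are illustrative assumptions.

```python
import random

# Minimal Ant System sketch on a tiny symmetric 4-city TSP.
# Ants build tours edge by edge; edge choice probability is
# proportional to pheromone^alpha * (1/distance)^beta.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
N = len(DIST)

def tour_length(tour):
    return sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))

def run_aco(n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    rng = random.Random(seed)
    tau = [[1.0] * N for _ in range(N)]  # pheromone on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(N)]
            while len(tour) < N:
                i = tour[-1]
                cand = [j for j in range(N) if j not in tour]
                w = [tau[i][j] ** alpha * (1.0 / DIST[i][j]) ** beta for j in cand]
                tour.append(rng.choices(cand, weights=w)[0])
            tours.append(tour)
        # evaporation, then deposit: shorter tours lay more pheromone
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for tour in tours:
            length = tour_length(tour)
            if length < best_len:
                best_tour, best_len = tour, length
            for i in range(N):
                a, b = tour[i], tour[(i + 1) % N]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len

best_tour, best_len = run_aco()
print(best_tour, best_len)
```

The paper's contribution replaces the static distance graph with the stochastic state-transition graph obtained by quantizing a dynamic system, so the same construct-evaporate-deposit loop learns a control policy rather than a tour.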

Abstract: We study optimal control in large stochastic multi-agent systems in continuous space and time. We consider multi-agent systems where agents have independent dynamics with additive noise and control. The goal is to minimize the joint cost, which consists of a state-dependent term and a term quadratic in the control. The system is described by a mathematical model, and an explicit solution is given. We focus on large systems where agents have to distribute themselves over a number of targets with minimal cost. In such a setting the optimal control problem is equivalent to a graphical model inference problem. Exact inference is intractable for large systems, so we use the mean-field approximation to compute accurate approximations of the optimal controls. We conclude that near-optimal control in large stochastic multi-agent systems is possible with this approach.
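The mean-field inference step alluded to above can be illustrated on a toy agents-to-targets assignment problem. This is a generic mean-field sketch, not the paper's model: the per-agent target costs, the penalty gamma for two agents sharing a target, and the temperature lam are all illustrative assumptions.

```python
import itertools
import math

# Toy problem: 3 agents choose among 2 targets; E(s) is the sum of
# per-agent costs plus gamma for each pair of agents sharing a target.
# p(s) is proportional to exp(-E(s)/lam); mean field approximates it
# with independent per-agent marginals q[a].
COST = [[1.0, 2.0], [1.5, 1.0], [0.5, 2.5]]  # COST[agent][target]
GAMMA, LAM = 2.0, 1.0
N_AGENTS, N_TARGETS = 3, 2

def energy(s):
    e = sum(COST[a][s[a]] for a in range(N_AGENTS))
    e += GAMMA * sum(s[a] == s[b]
                     for a in range(N_AGENTS) for b in range(a + 1, N_AGENTS))
    return e

# Exact marginals by brute-force enumeration (feasible only for tiny systems).
joint = {s: math.exp(-energy(s) / LAM)
         for s in itertools.product(range(N_TARGETS), repeat=N_AGENTS)}
Z = sum(joint.values())
exact = [[sum(w for s, w in joint.items() if s[a] == k) / Z
          for k in range(N_TARGETS)] for a in range(N_AGENTS)]

# Mean-field coordinate ascent:
#   q[a](k) proportional to exp(-(COST[a][k] + gamma * sum_b q[b](k)) / lam)
q = [[1.0 / N_TARGETS] * N_TARGETS for _ in range(N_AGENTS)]
for _ in range(100):
    for a in range(N_AGENTS):
        logits = [-(COST[a][k] + GAMMA * sum(q[b][k] for b in range(N_AGENTS) if b != a)) / LAM
                  for k in range(N_TARGETS)]
        m = max(logits)
        w = [math.exp(x - m) for x in logits]
        q[a] = [x / sum(w) for x in w]

print(exact)
print(q)
```

On this instance the mean-field marginals recover the same preferred target per agent as exact enumeration, while requiring only per-agent updates instead of a sum over all joint assignments.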

Abstract: We consider multiagent systems with stochastic non-linear dynamics in continuous space-time. We focus on systems of agents that aim to visit a number of given target locations at given points in time at minimal control cost. The online optimization of which agent has to visit which target requires the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a non-linear partial differential equation (PDE). Under some conditions, the log-transform can be applied to turn the HJB equation into a linear PDE. We then show that the optimal solution in the multiagent scheduling problem can be expressed in closed form as a sum of single schedule solutions.
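The log-transform mentioned above takes, in Kappen's path-integral formulation, roughly the following form; the symbols below (J for the optimal cost-to-go, psi for the transformed function, lambda for the scaling constant) are a reconstruction for illustration, not necessarily the paper's notation:

```latex
% Stochastic HJB equation for dynamics dx = (f(x,t) + u)\,dt + d\xi,
% cost rate V(x,t) + \tfrac{1}{2} u^\top R u, noise covariance \nu:
-\partial_t J = \min_u \Big( V + \tfrac{1}{2} u^\top R u
  + (f + u)^\top \nabla_x J
  + \tfrac{1}{2}\operatorname{Tr}\!\big(\nu\, \nabla_x^2 J\big) \Big)
% The minimizing control is u^* = -R^{-1}\nabla_x J. Under the
% condition \nu = \lambda R^{-1}, substituting J = -\lambda \log \psi
% cancels the quadratic non-linearity and leaves a linear PDE in \psi:
\partial_t \psi = \Big( \frac{V}{\lambda} - f^\top \nabla_x
  - \tfrac{1}{2}\operatorname{Tr}\!\big(\nu\, \nabla_x^2\big) \Big)\,\psi
```

Linearity of this PDE is what allows the multi-agent solution to be written as a superposition of single-schedule solutions.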

Abstract: Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state-dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.
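The combinatorial sum described above can be written compactly; the following is a sketch under the same lambda-scaling condition as the linear (log-transformed) formulation, with hypothetical notation (psi_a^k for the single-agent single-target solution of agent a steered to target k, E_s for the end-cost of the joint agent-target assignment s):

```latex
% Desirability of the joint state is a sum over assignments s = (s_1, \dots, s_n):
\psi(x, t) = \sum_{s} \exp\!\big(-E_s / \lambda\big)
             \prod_{a} \psi_{a}^{\,s_a}(x_a, t)
% Each agent's optimal control is then a mixture of its single-target
% controls, weighted by the posterior of the graphical model that the
% terms \exp(-E_s/\lambda) and \psi_a^{s_a} define:
u_a(x, t) = \nu\, \partial_{x_a} \log \psi(x, t)
```

Evaluating the sum over s is a graphical-model inference problem, which is why the cost scales exponentially in the tree-width of the graph specifying the sum.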