Abstract: Multi-agent systems are rapidly finding applications
in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multi-agent reinforcement learning (MARL). A central issue in the field is the formal statement of the multi-agent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim - either explicitly or implicitly - at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where MARL techniques have been applied. Finally, an outlook for the field is provided.

Abstract: The multidimensional assignment problem (MAP) is a combinatorial optimization problem arising in many applications, for instance in multi-target multi-sensor tracking. The MAP is well known to be NP-hard. Its objective is to match d-tuples of objects in such a way that the solution with the optimal total cost is found. In this paper a new class of approximation algorithms for solving the MAP, named K-SGTS, is presented, and its effectiveness in multi-target multi-sensor tracking situations is shown. Its computational complexity is proven to be polynomial. Experimental results on the accuracy and speed of K-SGTS are provided in the last section of the paper.

Abstract: We describe several principles for designing Actor-Agent Communities (AAC) as collectives of autonomous problem-solving entities (software agents and human experts) that self-organize and collaborate on solving complex problems. One of the main distinctive aspects of AACs is their ability to integrate, in a meaningful way, the expertise and reasoning of humans with the different information-processing algorithms performed by software agents, without requiring a unique and complete description of the problem and solution spaces.

Abstract: In tracking algorithms where measurements from various sensors are combined, the track state representation usually depends on the type of sensor information that is received. When a multi-hypothesis tracking algorithm is used, the probabilities of hypotheses containing tracks in different representations need to be re-evaluated when track state representations are changed. For the particular case of trilateration, a method is presented to adapt the state representation as more information becomes available, together with a method to re-evaluate the probabilities of the hypotheses. This is illustrated by a simple example.

Abstract: Localization is one of the most important basic skills of a mobile robot. Most approaches, however, still rely either on special sensors or require artificial environments. In this article, a novel approach is presented that can provide compass information for localization, purely based on the visual appearance of a room. A robot using such a visual compass can quickly learn a cylindrical map of the environment, consisting of simple statistical features that can be computed very quickly. The visual compass algorithm is efficient, scalable and can therefore be used in real-time on almost any contemporary robotic platform. Extensive experiments on a Sony Aibo robot have validated that the approach works in a vast variety of environments.
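The core idea lends itself to a compact sketch: learn one appearance histogram per horizontal sector of a panoramic image, then estimate the heading as the sector whose stored histogram best matches the current view. The sketch below is a minimal illustration under assumed names and features (per-sector intensity histograms, L1 matching); the paper's actual statistical features and matching scheme differ.

```python
import numpy as np

def learn_compass_map(panorama, n_sectors):
    """Learn a cylindrical appearance map: one intensity histogram per
    horizontal sector of a 360-degree panorama (illustrative sketch)."""
    h, w, _ = panorama.shape
    sector_w = w // n_sectors
    hists = []
    for s in range(n_sectors):
        patch = panorama[:, s * sector_w:(s + 1) * sector_w]
        hist, _ = np.histogram(patch, bins=16, range=(0, 256))
        hists.append(hist / hist.sum())
    return np.array(hists)

def estimate_heading(view, compass_map):
    """Return the sector whose stored histogram best matches the
    current view (smallest L1 distance)."""
    hist, _ = np.histogram(view, bins=16, range=(0, 256))
    hist = hist / hist.sum()
    dists = np.abs(compass_map - hist).sum(axis=1)
    return int(dists.argmin())
```

Because only fixed-size histograms are stored and compared, both learning and lookup are cheap enough for real-time use, which is the property the abstract emphasizes.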

Abstract: Tasking is the process by which resources are allocated to a process or project that needs to be undertaken by one or more resources. In this report, an overview of the candidate approaches to developing tasking algorithms is provided. The algorithms themselves are to be implemented and evaluated in subsequent work packages.

Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006b) allows one to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.
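The starting point of the loop calculus can be stated compactly (this is the standard identity from Chertkov and Chernyak; the symbol for the set of generalized loops is our notation):

```latex
Z \;=\; Z_{\mathrm{BP}} \Bigl( 1 + \sum_{C \in \mathcal{C}} r_C \Bigr),
```

where Z_BP is the partition function evaluated at the belief propagation fixed point, C ranges over the generalized loops of the graph, and each correction term r_C is a product of local quantities computed from the BP beliefs on the vertices and edges of C. Truncating the sum, or resumming subsets of loops as Pfaffians, yields the approximation schemes the abstract describes.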

Abstract: Our software demo package consists of an implementation of an automatic human emotion recognition system. The system is bi-modal and is based on fusing data on facial expressions with emotion extracted from the speech signal. We have integrated the Viola-Jones face detector (OpenCV), the Active Appearance Model AAM (AAM-API) for extracting the face shape and Support Vector Machines (LibSVM) for the classification of emotion patterns. We have used an Optical Flow algorithm for computing the features needed for the classification of facial expressions. Besides the integration of all processing components, the software system accommodates our implementation of the data fusion algorithm. Our C++ implementation has a working frame rate of about 5 fps.

Abstract: Our work addresses the problem of autonomous concept formation from a design point of view, providing an initial answer to the question: What are the design features of an architecture supporting the acquisition of different types of concepts by an autonomous agent?
Autonomous agents, that is, systems capable of interacting independently with their environment in the pursuit of their own goals, will provide the framework in which we study the problem of autonomous concept formation. Humans and most animals may in this sense also be regarded as autonomous agents, but our concern will be with artificial autonomous agents. A detailed survey and discussion of the many issues surrounding the notion of ‘artificial agency’ is beyond the scope of this thesis; a good overview can be found in [Wooldridge and Jennings, 1995]. Instead we will focus on how artificial agents could be endowed with representational and modelling capabilities.
The ability to form concepts is an important and recognised cognitive ability, thought to play an essential role in related abilities such as categorisation, language understanding, object identification and recognition, and reasoning, all of which can be seen as different aspects of intelligence. Concepts and categories are studied within cognitive science, where scientists are concerned with human conceptual abilities and mental representations of categories, but they have also been addressed in the rather different domain of machine learning and classificatory data analysis, where the focus is on the development of algorithms for clustering and induction problems [Mechelen et al., 1993]. The two fields are quite distinct and have only recently started to interact; moreover, even though the importance of concepts has been recognised, the nature of concepts is controversial, in the sense that there is no commonly agreed theory of concepts, and it is still far from obvious which representational means are best suited to capture the many cognitive functions that concepts are involved in.
Among the goals of this thesis is the attempt to bring together different lines of argumentation that have emerged within philosophy, cognitive science and AI, in order to establish a solid foundation for further research into the representation and acquisition of concepts by autonomous agents. Thus, our results and conclusions will often be stated in terms of new insights and ideas, rather than resulting in new algorithms or formal methods.
Our focus will be on affordance concepts — discussed in detail in Chapter 4 — and our main contributions will be:
* An argument showing that concepts should be thought of as belonging to different kinds, where the differences among these kinds are to be captured in terms of architecture features supporting their acquisition.
* A description (and partial implementation) of a minimal architecture (the Innate Adaptive Behaviour architecture – IAB architecture for short) supporting the acquisition of affordance concepts; the IAB architecture is actually a proposal for a sustaining mechanism, in the sense of [Margolis, 1999], for affordances, and makes clear the necessity of a minimal structure for the representation of affordances.
When addressing concept formation in AI, what can be called the ‘system level’ is often overlooked: concepts and categories are rarely studied from the point of view of a system, autonomous and complete, that might need such constructs and can acquire them only by means of interactions with its environment, under the constraints of its cognitive architecture. Within psychology, too, the focus is usually on structural aspects of concepts rather than on developmental issues [Smith and Medin, 1981]. Our approach – an architecture-based approach – is an attempt (i) to show that a system-level perspective on concept formation is indeed possible and worth exploring, and (ii) to provide an initial, perhaps simple, but concrete example of the insights that can be gained from such an approach. Since the methodology that we propose for studying concept formation is a general one, and can be applied also to other types of concepts, we decided to speak broadly of ‘autonomous concept formation’ rather than ‘autonomous affordance-concept formation’ in the title of the thesis.

Abstract: This article investigates the prerequisites for a global exploration strategy in an unknown environment on a virtual disaster site. Assume that a robot equipped with a laser range scanner can build a detailed map of a previously unknown environment. The remaining question is how to use the information in this map for further exploration. A map can contain several interesting locations where the exploration can be continued, referred to as exploration frontiers. Typically, a greedy algorithm is used to decide which frontier to explore next. Such a greedy algorithm considers interesting locations only locally, aiming to reduce the movement costs. More sophisticated algorithms also take into account the information that can be gained along each frontier. This shifts the problem to estimating the amount of unexplored area behind the frontiers on the global map. Our algorithm exploits the long range of current laser scanners. Typically, during the previous exploration a small number of laser rays already passed the frontier, but this number is too low to have a major impact on the generated map. Yet, the few rays through a frontier can be used to estimate the potential information gain from the unexplored area beyond the frontier.
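The idea of estimating information gain from the few rays that already crossed a frontier can be sketched as follows. The function name and the area heuristic (mean ray penetration depth times frontier width) are illustrative assumptions, not the exact estimator used in the article.

```python
import math

def frontier_info_gain(frontier_cells, ray_endpoints, max_range):
    """Estimate unexplored area behind a frontier from the laser rays
    that crossed it. Each frontier cell is paired with the endpoint of
    a ray passing through it (a simplified, illustrative sketch)."""
    # Distance travelled beyond the frontier by each crossing ray
    depths = []
    for (fx, fy), (ex, ey) in zip(frontier_cells, ray_endpoints):
        d = math.hypot(ex - fx, ey - fy)
        if d > 0:
            depths.append(min(d, max_range))
    if not depths:
        return 0.0
    # Approximate the free space behind the frontier as mean
    # penetration depth times frontier width (in cells)
    mean_depth = sum(depths) / len(depths)
    frontier_width = float(len(frontier_cells))
    return mean_depth * frontier_width
```

An exploration planner could then trade this estimated gain against the movement cost to each frontier, instead of greedily minimizing cost alone.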

Abstract: This demo concerns the application of biological organizational models in distributed sensor- and actuator networks. These novel organizational models are applicable in distributed evacuation routing: a fully distributed and adaptive approach to evacuation routing using a network of low-cost, smart devices that control sensors (e.g. smoke sensors) and actuators (e.g. direction signs or sound devices). In case of an incident, e.g. a fire in a building, the algorithm we developed will let exit signs point to the nearest exit, even if sections have become inaccessible due to hazardous conditions. We have developed a bio-inspired algorithm for adaptive evacuation routing, a software simulation to evaluate the algorithm and a hardware prototype of "smart" exit signs.

Abstract: This document describes how the simulation model will be broken down into usable parts for the simulator.
It is an important design document that tries to abstract from a specific implementation, so methods and algorithms are described in pseudocode. Details on the actual implementation are given in a separate chapter.
This document is written as a reference document, so note that properties and methods are added as they are found to be necessary. Not all properties are described directly in their accompanying text, because their necessity only becomes clear later in the document. For clarity, these properties and methods are still shown in the paragraph on the object itself.

Abstract: Reinforcement learning (RL) is a widely used paradigm for learning control. Computing exact RL solutions is generally only possible when process states and control actions take values in a small discrete set. In practice, approximate algorithms are necessary. In this paper, we propose an approximate, model-based Q-iteration algorithm that relies on a fuzzy partition of the state space, and a discretization of the action space. Using assumptions on the continuity of the dynamics and of the reward function, we show that the resulting algorithm is consistent, i.e., that the optimal solution is obtained asymptotically as the approximation accuracy increases. An experimental study indicates that a continuous reward function is also important for a predictable improvement in performance as the approximation accuracy increases.

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent RL algorithms which have been intensively studied. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation architecture similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We prove that the resulting algorithm converges. We also give a modified, asynchronous variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.

Abstract: Ant Colony Optimization (ACO) has proven to be a very powerful optimization heuristic for Combinatorial Optimization Problems. While being very successful for various NP-complete optimization problems, ACO is not trivially applicable to control problems. In this paper a novel ACO algorithm is introduced for the automated design of optimal control policies for continuous-state dynamic systems. The so-called Fuzzy ACO algorithm integrates the multi-agent optimization heuristic of ACO with a fuzzy partitioning of the state space of the system. A simulated control problem is presented to demonstrate the functioning of the proposed algorithm.

Abstract: Reinforcement learning (RL) is a learning control paradigm that provides well-understood algorithms with good convergence and consistency properties. Unfortunately, these algorithms require that process states and control actions take only discrete values. Approximate solutions using fuzzy representations have been proposed in the literature for the case when the states and possibly the actions are continuous. However, the link between these mainly heuristic solutions and the larger body of work on approximate RL, including convergence results, has not been made explicit. In this paper, we propose a fuzzy approximation structure for the Q-value iteration algorithm, and show that the resulting algorithm is convergent. The proof is based on an extension of previous results in approximate RL. We then propose a modified, serial version of the algorithm that is guaranteed to converge at least as fast as the original algorithm. An illustrative simulation example is also provided.
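The algorithmic core shared by these abstracts, fuzzy Q-iteration, can be sketched for a one-dimensional state space with triangular membership functions. This is a minimal illustration of the general scheme under assumed names and a toy setting, not the exact formulation or notation used in the papers.

```python
import numpy as np

def memberships(x, cores):
    """Triangular fuzzy membership degrees of state x over sorted cores."""
    mu = np.zeros(len(cores))
    if x <= cores[0]:
        mu[0] = 1.0
    elif x >= cores[-1]:
        mu[-1] = 1.0
    else:
        k = np.searchsorted(cores, x) - 1
        w = (x - cores[k]) / (cores[k + 1] - cores[k])
        mu[k], mu[k + 1] = 1.0 - w, w
    return mu

def fuzzy_q_iteration(f, rho, cores, actions, gamma=0.9, iters=100):
    """Model-based fuzzy Q-iteration sketch: theta[i, j] approximates
    the Q-value at fuzzy core x_i for discrete action u_j, using the
    known dynamics f(x, u) and reward rho(x, u)."""
    theta = np.zeros((len(cores), len(actions)))
    for _ in range(iters):
        new = np.empty_like(theta)
        for i, x in enumerate(cores):
            for j, u in enumerate(actions):
                x_next = f(x, u)
                # Interpolated Q-hat(x_next, .) over the fuzzy partition
                q_next = memberships(x_next, cores) @ theta
                new[i, j] = rho(x, u) + gamma * q_next.max()
        theta = new
    return theta
```

Because the interpolation weights are nonnegative and sum to one, each update is a contraction, which is the structural property behind the convergence proofs discussed in the abstracts.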

Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Because exact RL can only be applied to very simple problems, approximate algorithms are usually necessary in practice. Many algorithms for approximate RL rely on basis-function representations of the value function (or of the Q-function). Designing a good set of basis functions without any prior knowledge of the value function (or of the Q-function) can be a difficult task. In this paper, we propose instead a technique to optimize the shape of a constant number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other approaches to adapt basis functions for RL, our optimization criterion measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy membership functions, is given and applied to the car-on-the-hill example.

Abstract: Variational Bayesian inference and (collapsed) Gibbs sampling are the two important classes of inference algorithms for Bayesian networks. Both have their advantages and disadvantages: collapsed Gibbs sampling is unbiased but is also inefficient for large count values and requires averaging over many samples to reduce variance. On the other hand, variational Bayesian inference is efficient and accurate for large count values but suffers from bias for small counts. We propose a hybrid algorithm that combines the best of both worlds: it samples very small counts and applies variational updates to large counts. This hybridization is shown to significantly improve test-set perplexity relative to variational inference at no computational cost.

Abstract: This article investigates the effect of incorporating knowledge about the communication possibilities in an exploration algorithm used to map an unknown environment. The mission is to explore a hypothetical disaster site with a small team of robots. The challenge faced by the robot team is to coordinate their actions such that they efficiently explore the environment in their search for victims. The coordination can only be optimal when the robots share the same map. With a limited communication range the map cannot be shared in all circumstances. This article concentrates on the effect of a distributed map, where each robot has knowledge of only a part of the global map and has no guaranteed connection to the other robots or the operator.

Abstract: We propose a method for improving Belief Propagation (BP) that takes into account the influence of loops in the graphical model. The method is a variation on and generalization of the method recently introduced by Montanari and Rizzo [2005]. It consists of two steps: (i) standard BP is used to calculate cavity distributions for each variable (i.e. probability distributions on the Markov blanket of a variable for a modified graphical model, in which the factors involving that variable have been removed); (ii) all cavity distributions are combined by a messagepassing algorithm to obtain consistent single node marginals. The method is exact if the graphical model contains a single loop. The complexity of the method is exponential in the size of the Markov blankets. The results are very accurate in general: the error is often several orders of magnitude smaller than that of standard BP, as illustrated by numerical experiments.

Abstract: We propose a method for improving approximate inference methods that corrects for the influence of loops in the graphical model. The method is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It is an alternative implementation of an idea introduced recently by Montanari and Rizzo (2005). In its simplest form, which amounts to the assumption that no loops are present, the method reduces to the minimal Cluster Variation Method approximation (which uses maximal factors as outer clusters). On the other hand, using estimates of the effect of loops (obtained by some approximate inference algorithm) and applying the Loop Correcting (LC) method usually gives significantly better results than applying the approximate inference algorithm directly without loop corrections. Indeed, we often observe that the loop corrected error is approximately the square of the error of the approximate inference method used to estimate the effect of loops. We compare different variants of the Loop Correcting method with other approximate inference methods on a variety of graphical models, including "real world" networks, and conclude that the LC approach generally obtains the most accurate results.

Abstract: We propose a method to improve approximate inference methods by correcting for the influence of loops in the graphical model. The method is a generalization and alternative implementation of a recent idea from Montanari and Rizzo (2005). It is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It consists of two steps: (i) an approximate inference method, for example, belief propagation, is used to approximate cavity distributions for each variable (i.e., probability distributions on the Markov blanket of a variable for a modified graphical model in which the factors involving that variable have been removed); (ii) all cavity distributions are improved by a message-passing algorithm that cancels out approximation errors by imposing certain consistency constraints. This loop correction (LC) method usually gives significantly better results than the original, uncorrected, approximate inference algorithm that is used to estimate the effect of loops. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method. In this article, we compare different variants of the loop correction method with other approximate inference methods on a variety of graphical models, including “real world” networks, and conclude that the LC method generally obtains the most accurate results.

Abstract: Face-related analysis has represented a milestone in the field of computer vision for many decades, and many methods have been designed and implemented to meet its specific requirements. In the current paper we present three different classification algorithms that we use for the tasks of face detection and facial expression recognition.
One of the methods, Relevance Vector Machines (RVM), is a novel supervised learning technique based on a probabilistic formulation of Support Vector Machines. The mathematical basis of the models is presented. The data for testing were selected from the Cohn-Kanade Facial Expression Database. We report recognition rates for six universal expressions based on a range of experiments. Some discussion of the comparison of different classification methods is included.

Abstract: A novel algorithm to associate and trilaterate detections from multiple distributed radars is presented. The algorithm provides for flexible track state representations. The coordinate system of a track is switched from the measurement coordinates (range-Doppler) to Cartesian coordinates when a detection from another sensor is associated to the track. In the case of multiple targets and false alarms we run into the complication of multiple association possibilities. These can be resolved by using a multi-hypothesis algorithm. In general, correctly formed tracks will have more likely associations; therefore, hypotheses describing these tracks will be favored. Simulations with one or two targets and different false alarm rates show the need to preserve multiple hypotheses of the world state. Tracking performance for various false alarm rates is evaluated.

Abstract: Multi-agent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. Many tasks arising in these domains require that the agents learn behaviors online. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. However, due to different viewpoints on central issues, such as the formal statement of the learning goal, a large number of different methods and approaches have been introduced. In this paper we aim to present an integrated survey of the field. First, the issue of the multi-agent learning goal is discussed, after which a representative selection of algorithms is reviewed. Finally, open issues are identified and future research directions are outlined.

Abstract: Correct traffic light phase sequences are critical to efficient traffic flow through road intersections. If the timing of just one set of lights at a junction is one second from optimal, the result can be major traffic congestion and delays. Currently, traffic light phases are set using a mixture of experience and empirical techniques to achieve an optimal phase sequence and, in the worst cases, this process can take many months. Any technique to speed up the identification of optimal phasings would have clear benefits to government agencies everywhere.
Thales UK’s Research and Technology Group have investigated how Genetic Algorithms can solve “real-world” problems and have obtained some promising results with optimising traffic light phasings. Working with their colleagues at Thales Research and Technology Netherlands, they have developed a demonstration solution based on the concept of “software agent communities” – that is, groups of software programs working together for their greater good. This extends the earlier Genetic Algorithms work by using agent-based technologies to enable the rapid optimisation of larger traffic light networks, potentially covering an entire town.

Abstract: The system described in this paper provides a Web interface for fully automatic audio-video human emotion recognition. The analysis focuses on the set of six basic emotions plus the neutral type. Different classifiers are involved in the process of face detection (AdaBoost), facial expression recognition (SVM and other models) and emotion recognition from speech (GentleBoost). The Active Appearance Model (AAM) is used to get the information related to the shapes of the faces to be analyzed. The facial expression recognition is frame-based and no temporal patterns of emotions are managed. The emotion recognition from movies is done separately on sound and video frames; the algorithm does not handle the dependencies between audio and video during the analysis. The methodologies for data processing are explained and specific performance measures for the emotion recognition are presented.

Abstract: Generation algorithms produce Virtual Neurons (VNs) from a small set of morphological properties. The set describes the morphological properties of real neurons in terms of statistical descriptors such as the number of branches and segment lengths (among others). The majority of reconstruction algorithms use the observed properties to estimate the parameters of a priori fixed probability distributions in order to construct statistical descriptors that fit the observed data well. In this article, we present a non-parametric generation algorithm based on kernel density estimators (KDEs). The new algorithm is called KDE-Neuron and has three advantages over parametric reconstruction algorithms: (1) no a priori specifications about the distributions underlying the real data, (2) peculiarities in the biological data will be reflected in the VNs, and (3) the ability to reconstruct different cell types. We experimentally generated motor neurons and granule cells, and statistically validated the obtained results. Moreover, we assessed the quality of the prototype data set and observed that our generated neurons are as good as the prototype data in terms of the used statistical descriptors. The opportunities and limitations of data-driven algorithmic reconstruction of neurons are discussed.
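Sampling from a kernel density estimate, the non-parametric building block behind KDE-Neuron, can be sketched in a few lines. The function below and the single-property setting are illustrative assumptions; KDE-Neuron itself estimates and samples full sets of morphological descriptors.

```python
import numpy as np

def kde_sample(data, bandwidth, n, rng=None):
    """Draw n samples from a Gaussian kernel density estimate of `data`.
    Sampling from a KDE reduces to: pick a kernel center (a data point)
    at random, then perturb it with kernel noise of scale `bandwidth`."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    centers = rng.choice(data, size=n, replace=True)
    return centers + rng.normal(0.0, bandwidth, size=n)
```

For example, segment lengths measured on real neurons could be fed in as `data`, and the returned samples would reproduce peculiarities of the measured distribution without assuming any parametric form.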

Abstract: Purpose - In this paper, a novel Ant Colony Optimization (ACO) approach to optimal control is proposed. The standard ACO algorithms have proven to be a very powerful optimization metaheuristic for combinatorial optimization problems. They have been demonstrated to work well when applied to various NP-complete problems, such as the traveling salesman problem. In this paper, ACO is reformulated as a model-free learning algorithm and its properties are discussed.
Design/methodology/approach - First, it is described how quantizing the state space of a dynamic system introduces stochasticity in the state transitions and transforms the optimal control problem into a stochastic combinatorial optimization problem, motivating the ACO approach. The algorithm is presented and is applied to the time-optimal swing-up and stabilization of an underactuated pendulum. In particular, the effect of different numbers of ants on the performance of the algorithm is studied.
Findings - The simulations show that the algorithm finds good control policies reasonably fast. An increasing number of ants results in increasingly better policies. The simulations also show that although the policy converges, the ants keep on exploring the state space thereby capable of adapting to variations in the system dynamics.
Research limitations/implications - This research introduces a novel ACO approach to optimal control and as such marks the starting point for more research of its properties. In particular, quantization issues must be studied in relation to the performance of the algorithm.
Originality/value - The work presented is original as it presents the first application of ACO to optimal control problems.

Abstract: We analyse the local stability of the high-temperature fixed point of the loopy belief propagation (LBP) algorithm and how this relates to the properties of the Bethe free energy which LBP tries to minimize. We focus on the case of binary networks with pairwise interactions. In particular, we state sufficient conditions for convergence of LBP to a unique fixed point and show that these are sharp for purely ferromagnetic interactions. In contrast, in the purely antiferromagnetic case, the undamped parallel LBP algorithm is suboptimal in the sense that the stability of the fixed point breaks down much earlier than for damped or sequential LBP; we observe that the onset of instability for the latter algorithms is related to the properties of the Bethe free energy. For spin-glass interactions, damping LBP only helps slightly. We estimate analytically the temperature at which the high-temperature LBP fixed point becomes unstable for random graphs with arbitrary degree distributions and random interactions.

Abstract: We derive sufficient conditions for the uniqueness of loopy belief propagation fixed points. These conditions depend on both the structure of the graph and the strength of the potentials and naturally extend those for convexity of the Bethe free energy. We compare them with (a strengthened version of) conditions derived elsewhere for pairwise potentials. We discuss possible implications for convergent algorithms, as well as for other approximate free energies.

Abstract: Probabilistic graphical models, and in particular Bayesian networks, are nowadays well established as a modeling tool for domains with
uncertainty. A drawback is that large, complex graphical models are intractable for exact computation. Therefore there is a lot of research interest in approximate inference.
The lack of open source "reference" implementations hampers progress in research on approximate inference. Methods differ widely in terms of quality and performance characteristics, which also depend in different ways on various properties of the graphical models. Finding the best approximate inference method for a particular application therefore often requires empirical comparisons. However, implementing and debugging these methods takes a lot of time which could otherwise be spent on research. Therefore we have developed libDAI. libDAI is a free/open source C++ library (licensed under GPL) that provides implementations of various (deterministic) approximate inference methods for discrete graphical models. libDAI supports arbitrary `factor graphs` with discrete variables (this includes discrete Markov Random Fields and Bayesian Networks).
This release is an additional contribution to the libDAI library. This code implements the Z2 algorithm, a particular way of correcting the Belief Propagation (BP) solution, developed in the ICIS project SNN1 (see Gomez (2009), Approximate inference on planar graphs using Loop Calculus and Belief Propagation).

Abstract: Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and then a policy is extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from such value functions. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. Then we analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions.

Abstract: The abilities of mobile robots depend greatly on the performance of basic skills such as vision and localization. Although great progress has been made in the 4-Legged league in the past years, the performance of many of those approaches completely depends on the artificial environment conditions established on a 4-Legged soccer field. In this article, an algorithm is introduced that can provide localization information based on the natural appearance of the surroundings of the field. The algorithm starts by making a scan of the surroundings, turning the head and body of the robot on a certain spot. The robot learns the appearance of the surroundings at that spot by storing color transitions at different angles in a panoramic index. The stored panoramic appearance can be used to determine the rotation (including a confidence value) relative to the learned spot for other points on the field. The applicability of this kind of localization to more natural environments is demonstrated in two environments other than the official 4-Legged league field.
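The rotation-matching step can be sketched minimally. Assume the panoramic index reduces to one feature value per angle bin (the paper stores color transitions; plain floats stand in for them here): the estimated rotation is then the circular shift that best aligns the observed signature with the learned one, and the margin of the best match over the average gives a crude confidence value. Function and variable names are illustrative.

```python
def estimate_rotation(reference, observed):
    """Return (best_shift_in_bins, confidence) by circular signature matching.

    reference, observed: equal-length lists, one feature per angle bin.
    """
    n = len(reference)
    scores = []
    for shift in range(n):
        # sum of squared differences for this candidate rotation
        score = sum((reference[(i + shift) % n] - observed[i]) ** 2
                    for i in range(n))
        scores.append(score)
    best = min(range(n), key=lambda s: scores[s])
    # crude confidence: how much the best match beats the average match
    avg = sum(scores) / n
    confidence = 1.0 - (scores[best] / avg if avg > 0 else 1.0)
    return best, confidence
```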

Abstract: The proliferation of small mobile devices and wireless networks has resulted in an increasing demand to support the applications found in wired environments on mobile devices. In real-time replication systems, such as collaborative systems, this trend raises some new problems to address. Wireless networks are characterized by low bandwidth and high latency, both of which change dynamically over time. The risk of the network getting congested is therefore high, with the result that the user will not receive important information in time. Consequently, there is a need to develop algorithms and methods for adaptive work environments and adaptive data distribution, to minimise the traffic load. An architecture based on multi-agents and mobile agents is proposed as a solution. Personalized behaviour is included in a flexible and extensible system. A prototype of the architecture has been implemented in a crisis environment and was used for an evaluation. It is assumed that each individual in the field is equipped with a PDA that can communicate with other PDAs in the surroundings and with remote servers. Users can report about their environment using a personalized iconic language. Each user is supervised by a personal agent. In case of emergency, users are routed outside a dangerous area using a personalised dynamic routing system called PIRA (Personal Intelligent Routing Assistant). The system and the results of testing are presented in this paper.

Abstract: This paper introduces a novel algorithm for approximate policy search in continuous-state discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.
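The cross-entropy optimization loop underlying this policy search (sample candidate parameters, keep an elite fraction scored by empirical return, refit the sampling distribution, repeat) can be sketched generically. The parameter vector, toy objective, and constants below are illustrative assumptions; the paper optimizes basis-function locations, shapes, and action assignments rather than a bare quadratic.

```python
import random

def cross_entropy_search(score, dim, iterations=50, samples=50, elite=10):
    """Maximize score(vector) by iteratively refitting a Gaussian sampler."""
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(iterations):
        pop = [[random.gauss(mean[d], std[d]) for d in range(dim)]
               for _ in range(samples)]
        pop.sort(key=score, reverse=True)        # best candidates first
        best = pop[:elite]                       # elite samples
        for d in range(dim):                     # refit the sampling dist.
            mean[d] = sum(p[d] for p in best) / elite
            var = sum((p[d] - mean[d]) ** 2 for p in best) / elite
            std[d] = max(var ** 0.5, 1e-2)       # keep some exploration
    return mean

# Toy objective with its maximum at (2, -1).
found = cross_entropy_search(lambda p: -((p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2),
                             dim=2)
```

In the paper's setting, `score` would be the empirical return of the policy encoded by the parameter vector, averaged over a representative set of initial states.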

Abstract: The Dec-POMDP is a model for multi-agent planning under uncertainty that has received increasing attention over recent years. In this work we propose a new heuristic, QBG, that can be used in various algorithms for Dec-POMDPs, and describe its differences from and similarities to QMDP and QPOMDP. An experimental evaluation shows that, at the price of some computation, QBG gives a consistently tighter upper bound to the maximum value obtainable.

Abstract: Reinforcement learning (RL) comprises an array of techniques that learn a control policy so as to maximize a reward signal. When applied to the control of elevator systems, RL has the potential of finding better control policies than classical heuristic, suboptimal policies. On the other hand, elevator systems offer an interesting benchmark application for the study of RL. In this paper, RL is applied to a single-elevator system. The mathematical model of the elevator system is described in detail, making the system easy to re-implement and re-use. An experimental comparison is made between the performance of the Q-value iteration and Q-learning RL algorithms, when applied to the elevator system.

Abstract: This paper proposes a reinforcement learning architecture containing multiple "experts", each of which is a specialist in a different region of the overall state space. The central idea is that the different experts use qualitatively different (but sufficiently Markov) state representations, each of which captures different information regarding the true underlying world state, and which for that reason is suitable for a different part of the state space. The experts themselves learn to switch to another state representation (another expert) by having switching actions. Value functions can be learned using standard reinforcement learning algorithms. A small proof-of-principle experiment as well as a larger, more realistic experiment illustrate the validity of this approach.

Abstract: This paper presents improvements in the robustness and accuracy of the weighted scan matching algorithm by matching against the union of earlier acquired scans. The approach reduces the correspondence error, which is explicitly modeled in the weighted scan matching algorithm, by providing a more complete and denser frame of reference to match new scans against. By making use of an efficient quadtree data structure, earlier acquired scans can be stored with millimeter accuracy for environments with dimensions larger than 100x100 meters, while preserving real-time performance. Our experiments illustrate the significant gains in robustness and accuracy that result from this approach.
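The quadtree idea mentioned above can be sketched generically: each node covers a square region and splits into four children when its point budget is exceeded, so dense scan points are stored compactly and range queries skip empty regions. The class, capacity, and query below are illustrative assumptions, not the paper's data structure.

```python
class QuadTree:
    """Point quadtree over a square region centred at (cx, cy)."""

    def __init__(self, cx, cy, half, capacity=4):
        self.cx, self.cy, self.half = cx, cy, half   # centre and half-size
        self.capacity = capacity
        self.points = []
        self.children = None

    def insert(self, x, y):
        if abs(x - self.cx) > self.half or abs(y - self.cy) > self.half:
            return False                              # outside this region
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()
        return any(c.insert(x, y) for c in self.children)

    def _split(self):
        h = self.half / 2
        self.children = [QuadTree(self.cx + dx, self.cy + dy, h, self.capacity)
                         for dx in (-h, h) for dy in (-h, h)]
        for px, py in self.points:                    # push points down
            any(c.insert(px, py) for c in self.children)
        self.points = []

    def query(self, x, y, r):
        """All stored points within distance r of (x, y)."""
        if abs(x - self.cx) > self.half + r or abs(y - self.cy) > self.half + r:
            return []                                 # region cannot contain hits
        hits = [(px, py) for px, py in self.points
                if (px - x) ** 2 + (py - y) ** 2 <= r * r]
        for c in self.children or []:
            hits.extend(c.query(x, y, r))
        return hits
```

Range queries like `query` are what make matching a new scan against the union of old scans cheap: only points near the new scan are retrieved.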

Abstract: Thanks to advances in both computer science and
engineering, the divide between robotics and multi-agent systems
is shrinking. Robots are capable of performing an ever wider
range of tasks, and there is an increasing need for solutions
to high-level problems such as multi-agent coordination. In this
paper we examine the problem of finding a robust exploration
strategy for a team of mobile robots that takes into account
communication limitations. We propose four performance metrics
to evaluate and compare existing multi-robot exploration algorithms,
and present a role-based approach in which robots either
act as explorers or as relays. The result is a complete exploration
of the environment in which information is efficiently returned
to a central command centre, which is particularly applicable to
the domain of rescue robotics.

Abstract: The paper describes a novel technique for the recognition of emotions from multimodal data. We focus on the recognition of the six prototypic emotions. The results from facial expression recognition and from emotion recognition from speech are combined using a bi-modal semantic data fusion model that determines the most probable emotion of the subject. Two types of models based on geometric face features are used for facial expression recognition, depending on the presence or absence of speech. In our approach we define an algorithm that is robust to the changes of face shape that occur during regular speech. The influence of phoneme generation on the face shape during speech is removed by using features that are related only to the eyes and the eyebrows. The paper includes results from testing the presented models.

Abstract: Analysis and observer design for nonlinear systems have long been investigated, but no generally applicable methods exist as yet. A large class of nonlinear systems can be well approximated by Takagi-Sugeno fuzzy models, for which methods and algorithms have been developed to analyze their stability and to design observers. However, results obtained for Takagi-Sugeno fuzzy models are in general not directly applicable to the original nonlinear system. In this paper, we investigate what conclusions can be drawn and what guarantees can be expected when an observer is designed based on an approximate fuzzy model and applied to the original nonlinear system. It is shown that in general, exponential stability of the estimation error dynamics cannot be obtained. However, the estimation error remains bounded. This bound is computed based on the approximation error and the Lyapunov function used. The results are illustrated using simulation examples.

Abstract: A large class of nonlinear systems can be well approximated by Takagi-Sugeno fuzzy models, for which methods and algorithms have been developed to analyze their stability and to design observers and controllers. However, results obtained for Takagi-Sugeno fuzzy models are in general not directly applicable to the original nonlinear system. In this paper, we investigate what conclusions can be drawn when an observer-based controller is designed for an approximate model and then applied to the original nonlinear system. In particular, we consider the case when the scheduling vector depends on the states that have to be estimated, so that the membership functions of the observer are evaluated using the estimated scheduling vector. The results are illustrated throughout the paper using simulation examples.

Abstract: Gas distribution models can provide comprehensive information about a large
number of gas concentration measurements, highlighting, for example, areas of unusual
gas accumulation. They can also help to locate gas sources and to plan where
future measurements should be carried out. Current physical modeling methods,
however, are computationally expensive and not applicable for real world scenarios
with real-time and high resolution demands. This chapter reviews kernel methods
that statistically model gas distribution. Gas measurements are treated as random
variables, and the gas distribution is predicted at unseen locations either using a
kernel density estimation or a kernel regression approach. The resulting statistical models
do not make strong assumptions about the functional form of the gas distribution,
such as the number or locations of gas sources, for example. The major
focus of this chapter is on two-dimensional models that provide estimates for the
means and predictive variances of the distribution. Furthermore, three extensions
to the presented kernel density estimation algorithm are described, which make it
possible to include wind information, to extend the model to three dimensions, and to reflect
time-dependent changes of the random process that generates the gas distribution
measurements. All methods are discussed based on experimental validation using
real sensor data.
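The kernel regression prediction described above (measurements as random variables, concentration predicted at unseen locations by kernel weighting) can be sketched as a Nadaraya-Watson estimator. The function name, Gaussian kernel, and bandwidth parameter below are illustrative assumptions, simplified from the chapter's full mean-and-variance models.

```python
import math

def predict_concentration(measurements, x, y, sigma=1.0):
    """Kernel-weighted mean concentration at an unseen location (x, y).

    measurements: list of ((mx, my), concentration) pairs;
    sigma: kernel bandwidth controlling how far measurements influence.
    """
    weights, weighted = 0.0, 0.0
    for (mx, my), c in measurements:
        d2 = (mx - x) ** 2 + (my - y) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian kernel weight
        weights += w
        weighted += w * c
    return weighted / weights if weights > 0 else 0.0
```

Note how this makes no assumption about the number or locations of gas sources: the estimate is driven entirely by nearby measurements.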

Abstract: In highly dynamic scheduling, change events occur long before the scheduling horizon is reached. The scheduler is driven by change. The complexity of these problems cannot be described in terms of NP-hardness, as no exact optimal solution exists for a dynamic problem. Nevertheless, search space explosions are common. In addition, search time is very limited, since the environment requires immediate solutions. Each problem solving approach will therefore employ heuristics in some sense. The heuristics may be in the algorithm (e.g., metaheuristics), in narrowing down the search space, or in enhancing the fitness functions (e.g., by preferring robust schedules).
Evolutionary algorithms (EAs) are metaheuristics that are often applied, both in static and dynamic environments. They have innate robust tendencies, being able to adapt to changing environments. However, we argue they are of limited use in highly dynamic problems. As revised schedules preferably stay close to the original, there is no need for broadly sampled neighbourhoods such as those provided by EAs. Furthermore, schedules in dynamic environments have a great internal coherence, which an EA is prone to break. Lastly, in such problems, the time to perform EA computations simply is not there. The optimisation approach needed in such environments often involves load balancing, which can be performed by exchanging tasks between machines or queues. Therefore, more straightforward element-swapping heuristics, such as k-opt search, will prove more valuable.
Elevator dispatching was used as a case study in this research. It is a typical example of a highly dynamic scheduling problem. Experiments using real world passenger data show that EAs are outperformed by k-opt search, even when search time was unlimited.
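The element-swapping idea can be illustrated with a toy local search that moves tasks between queues while the makespan keeps dropping. This is a deliberate simplification: tasks are reduced to bare durations and the cost is the longest queue, whereas the elevator dispatcher in the study optimizes passenger-oriented costs; all names here are illustrative.

```python
def balance(queues):
    """Greedy 1-move local search: shift single tasks between queues while
    the makespan (the longest queue's total duration) keeps decreasing."""
    queues = [list(q) for q in queues]

    def makespan(qs):
        return max(sum(q) for q in qs)

    while True:
        best, move = makespan(queues), None
        for a in range(len(queues)):
            for b in range(len(queues)):
                if a == b:
                    continue
                for i in range(len(queues[a])):
                    # tentatively move task i from queue a to queue b
                    trial = [list(q) for q in queues]
                    trial[b].append(trial[a].pop(i))
                    cost = makespan(trial)
                    if cost < best:
                        best, move = cost, trial
        if move is None:        # no improving move left: local optimum
            return queues
        queues = move
```

Each pass costs only a scan over task-queue pairs, which is what makes such heuristics viable when change events arrive faster than an EA generation can complete.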

Abstract: We derive novel sufficient conditions for convergence of Loopy Belief Propagation (also known as the Sum-Product algorithm) to a unique fixed point. Our results improve upon previously known conditions. For binary variables with (anti-)ferromagnetic interactions, our conditions seem to be sharp.

Abstract: We derive novel conditions that guarantee convergence of the Sum-Product algorithm (also known as Loopy Belief Propagation
or simply Belief Propagation) to a unique fixed point, irrespective of the initial messages. The computational complexity of the
conditions is polynomial in the number of variables. In contrast with previously existing conditions, our results are directly
applicable to arbitrary factor graphs (with discrete variables) and are shown to be valid also in the case of factors containing
zeros, under some additional conditions. We compare our bounds with existing ones, numerically and, if possible, analytically.
For binary variables with pairwise interactions, we derive sufficient conditions that take into account local evidence (i.e. single
variable factors) and the type of pair interactions (attractive or repulsive). It is shown empirically that this bound outperforms
existing bounds.
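A compact sketch of the algorithm these convergence results concern: damped parallel loopy belief propagation on a binary pairwise model, where a damping factor mixes old and new messages (the stabilizing role of damping is discussed in the abstracts above). The potential tables and constants below are illustrative assumptions.

```python
def loopy_bp(unary, pair, iters=100, damping=0.5):
    """Damped parallel sum-product on binary variables with pairwise factors.

    unary: {node: [phi(0), phi(1)]}; pair: {(i, j): 2x2 table}, i < j.
    Returns approximate marginals {node: [b(0), b(1)]}.
    """
    edges = list(pair)
    neighbors = {i: set() for i in unary}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    msgs = {(i, j): [1.0, 1.0] for i, j in edges}
    msgs.update({(j, i): [1.0, 1.0] for i, j in edges})

    def potential(i, j, xi, xj):
        return pair[(i, j)][xi][xj] if (i, j) in pair else pair[(j, i)][xj][xi]

    for _ in range(iters):
        new = {}
        for (i, j), old in msgs.items():          # parallel (synchronous) update
            m = [0.0, 0.0]
            for xj in (0, 1):
                for xi in (0, 1):
                    prod = unary[i][xi] * potential(i, j, xi, xj)
                    for k in neighbors[i] - {j}:
                        prod *= msgs[(k, i)][xi]
                    m[xj] += prod
            z = m[0] + m[1]
            new[(i, j)] = [(1 - damping) * m[x] / z + damping * old[x]
                           for x in (0, 1)]       # damped message
        msgs = new

    beliefs = {}
    for i in unary:
        b = [unary[i][x] for x in (0, 1)]
        for k in neighbors[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        z = b[0] + b[1]
        beliefs[i] = [b[0] / z, b[1] / z]
    return beliefs
```

With `damping=0.0` this is the undamped parallel LBP whose fixed point, per the first abstract, loses stability earliest in the antiferromagnetic case.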

Abstract: This report is the second of four documents, which describe the distributed tasking
architecture that has been developed by Thales Research and Technology (UK) Ltd.
This architecture has been developed to test generic tasking and re-tasking policies as
part of the Tasking and Re-tasking sub-project for the Interactive Collaborative
Information Systems (ICIS) programme. This second document focuses on how objects and interfaces have been implemented
to enable centralised tasking algorithm processes to be utilised in the tasking
architecture. Centralised tasking is a traditional, top down approach to tasking whereby
a centralised actor or agent has access to all information about tasks and resources and
is responsible for making all task allocations.

Abstract: Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling
pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until
convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.

Abstract: This paper introduces the MultiAgent Decision Process software toolbox, an open source C++ library for decision-theoretic planning under uncertainty in multiagent systems. It provides support for several multiagent models, such as POSGs, Dec-POMDPs and MMDPs. The toolbox aims to reduce development time for planning algorithms and to provide a benchmarking platform by providing a number of commonly used problem descriptions. It features a parser for a text-based file format for discrete Dec-POMDPs, shared functionality for planning algorithms, as well as the implementation of several Dec-POMDP planners. We describe the design goals and architecture of the toolbox, and provide an overview of its functionality, illustrated by some usage examples. Finally we report on current and future work.

Abstract: Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function, but may also be unstable and thereby prevent reliable progress. Recent work in coevolution has therefore focused on algorithms that guarantee progress with respect to a given solution concept. The Nash Memory archive guarantees monotonicity with respect to the game-theoretic solution concept of the Nash equilibrium, but is limited to symmetric games. We present an extension of the Nash Memory that guarantees monotonicity for asymmetric games. The Parallel Nash Memory is demonstrated in experiments, and its performance on general sum games is discussed.

Abstract: In the past, a crisis event was reported by local witnesses who made phone calls to the emergency services, describing by speech what they observed at the crisis site. The recent improvements in the area of human-computer interfaces make possible the development of context-aware systems for crisis management that support people in escaping a crisis even before external help is available on site. Apart from collecting the people's reports on the crisis, these systems are assumed to automatically extract useful clues during typical human-computer interaction sessions. The novelty of the current research resides in the attempt to involve computer vision techniques for performing an automatic evaluation of facial expressions during human-computer interaction sessions with a crisis management system. The current paper details an approach for an automatic facial expression recognition module that may be included in crisis-oriented applications. The algorithm uses an Active Appearance Model for facial shape extraction and an SVM classifier for Action Unit detection and facial expression recognition.

Abstract: Urban Search and Rescue is a growing area of robotic research. The RoboCup Federation has recognized this, and has created the new Virtual Robots competition to complement its existing physical robot and agent competitions. In order to successfully compete in this competition, teams need to field multi-robot solutions that cooperatively explore and map an environment while searching for victims. This paper presents the results of the first annual RoboCup Rescue Virtual competition. It provides details on the metrics used to judge the contestants as well as summaries of the algorithms used by the top four teams. This allows readers to compare and contrast these effective approaches. Furthermore, the simulation engine itself is examined and real-world validation results on the engine and algorithms are offered.

Abstract: At TU Delft there is a project aiming at the realization of a fully automatic emotion recognition system on the basis of facial analysis. The exploited approach splits the system into four components: face detection, facial characteristic point extraction, tracking, and classification. The focus in this paper is only on the first two components. Face detection is performed by boosting simple rectangular Haar-like features that give a decent representation of the face. These features also allow the differentiation between a face and a non-face. The boosting algorithm is combined with an evolutionary search to speed up the overall search time. Facial characteristic points (FCPs) are extracted from the detected faces using the same technique applied to face detection. Additionally, FCP extraction using corner detection methods and brightness distribution has also been considered. Finally, after retrieving the required FCPs, the emotion of the facial expression can be determined. The classification of the Haar-like features is done by a Relevance Vector Machine (RVM).

Abstract: In this paper we present WILLEM, a system for dynamic evacuation routing in buildings, using a wireless sensor network. Dynamic evacuation routing is the process of dynamically determining the fastest routes to the exits. The routes may be changed in case a fire occurs somewhere. We also present an algorithm for detecting congestions in corridors during evacuation, and a means of providing the people in those congestions an alternative route towards the exit. Each phase of the method is described extensively: the deployment of the wireless sensor network, the automatic topology learning of the network and the actual evacuation routing methods. We have built a simulation framework in which all types of evacuation routing can be simulated. The results of our experiments were surprising in the sense that dynamic evacuation routing turned out not to be faster than static evacuation routing in every setup; however, we did find out why this is the case. We also performed some experiments on a real wireless sensor network, in order to find out if our automatic configuration method could work in real life. The results are promising. We also present an algorithm for mapping the learned topology of the wireless sensor network upon a virtual map. This way, the network topology can be visualised, which is an important feature for emergency services.
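The abstract does not specify WILLEM's routing computation, but one standard way to realize dynamic evacuation routing on a sensor-network graph is a multi-source shortest-path search from all exits, rerun whenever a sensor raises a corridor's cost (fire or congestion). The sketch below uses Dijkstra's algorithm; the graph encoding, node names, and congestion model are illustrative assumptions, not the paper's method.

```python
import heapq

def route_to_exits(graph, exits):
    """Dijkstra from all exits at once.

    graph: {node: {neighbor: travel_time}}; exits: set of exit nodes.
    Returns (dist, next_hop): per node, the time to the nearest exit and
    the neighbor to walk towards.
    """
    dist = {node: float('inf') for node in graph}
    next_hop = {node: None for node in graph}
    heap = [(0.0, e, e) for e in exits]
    heapq.heapify(heap)
    for e in exits:
        dist[e] = 0.0
    while heap:
        d, node, came_from = heapq.heappop(heap)
        if d > dist[node]:
            continue                       # stale queue entry
        next_hop[node] = came_from         # step towards the nearest exit
        for nb, w in graph[node].items():
            if d + w < dist[nb]:
                dist[nb] = d + w
                heapq.heappush(heap, (d + w, nb, node))
    return dist, next_hop
```

Raising the weight of a corridor's edges when congestion or fire is detected and recomputing the routes is then enough to divert people to an alternative exit path.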