Abstract: In this work we consider the problem of multiagentplanningunder sensing and acting uncertainty with a one time-step delay in communication. We adopt decentralized partially observable Markov processes (Dec-POMDPs) as our planning framework. When instantaneous and noise-free communication is available, agents can instantly share local observations. This eﬀectively reduces the decentralized planning problem to a centralized one, with a signiﬁcant decrease in planning complexity. However, instantaneous communication is a strong assumption, as it requires the agents to synchronize at every time step. Therefore, we explore planning in Dec-POMDP settings in which communication is delayed by one time step. We show that such situations can be modeled by Bayesian games in which the types of the agents are deﬁned by their last private observation. We will apply Bayesian games to deﬁne a value function QBG on the joint belief space, and we will show that it is the optimal payoﬀ function for our Dec-POMDP setting with one time-step delayed communication. The QBG -value function is piecewise linear and convex over the joint belief space, which we will use to deﬁne QBG -value iteration. Finally, we will adapt Perseus, an approximate POMDP solver, to compute QBG -value functions, and we will use it to perform some proof-of-concept experiments.
Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute a generic and expressive framework for multiagentplanningunderuncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories grows exponentially in the planning horizon. In this work, we identify a criterion that allows for lossless clustering of observation histories: i.e., we prove that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one. We show how this result can be exploited in optimal policy search and demonstrate empirically that it can provide a speed-up of multiple orders of magnitude, allowing the optimal solution of significantly larger problems. We also perform an empirical analysis of the generality of our clustering method, which suggests that it may also be useful in other (approximate) Dec-POMDP solution methods.
Abstract: We consider the problem of cooperative multiagentplanningunderuncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. As such, relying on communication significantly reduces the complexity of planning. In the real world however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that: (1) allows that communication might be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efﬁciently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.
Abstract: Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagentplanningunderuncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling
pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until
convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
Abstract: This paper introduces the MultiAgent Decision Process software toolbox, an open source C++ library for decision-theoretic planningunderuncertainty in multiagent systems. It provides support for several multiagent models, such as POSGs, Dec-POMDPs and MMDPs. The toolbox aims to reduce development time for planning algorithms and to provide a benchmarking platform by providing a number of commonly used problem descriptions. It features a parser for a text-based ﬁle format for discrete Dec-POMDPs, shared functionality for planning algorithms, as well as the implementation of several Dec-POMDP planners. We describe design goals and architecture of the toolbox, and provide an overview of its functionality, illustrated by some usage examples. Finally we report on current and future work.