Abstract: In this work we consider the problem of multiagent planning under sensing and acting uncertainty with a one-time-step delay in communication. We adopt decentralized partially observable Markov decision processes (Dec-POMDPs) as our planning framework. When instantaneous and noise-free communication is available, agents can instantly share local observations. This effectively reduces the decentralized planning problem to a centralized one, with a significant decrease in planning complexity. However, instantaneous communication is a strong assumption, as it requires the agents to synchronize at every time step. Therefore, we explore planning in Dec-POMDP settings in which communication is delayed by one time step. We show that such situations can be modeled by Bayesian games in which the types of the agents are defined by their last private observation. We apply Bayesian games to define a value function QBG on the joint belief space, and we show that it is the optimal payoff function for our Dec-POMDP setting with one-time-step delayed communication. The QBG-value function is piecewise linear and convex over the joint belief space, a property we use to define QBG-value iteration. Finally, we adapt Perseus, an approximate POMDP solver, to compute QBG-value functions, and we use it to perform proof-of-concept experiments.
Abstract: We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, optimal planning in these models is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP; relying on communication therefore significantly reduces the complexity of planning. In the real world, however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that (1) allows communication to be delayed by one or more time steps, and (2) explicitly considers the future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.