Abstract: A large class of nonlinear systems can be well approximated by Takagi-Sugeno (TS) fuzzy models, with linear or affine consequents. However, in practical applications, the process under consideration may be affected by unknown inputs, such as disturbances, faults or unmodeled dynamics. In this paper, we consider the problem of simultaneously estimating the state and unknown inputs in TS systems. The inputs considered in this paper are 1) polynomials in time (such as a bias in the model or an unknown ramp input acting on the model) and 2) unmodeled dynamics. The proposed observer is designed based on the known part of the fuzzy model. Conditions on the asymptotic convergence of the observer are presented and the design guarantees an ultimate bound on the error signal. The results are illustrated on a simulation example.
Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.
Abstract: Reinforcement learning (RL) is a learning control paradigm that provides well-understood algorithms with good convergence and consistency properties. Unfortunately, these algorithms require that process states and control actions take only discrete values. Approximate solutions using fuzzy representations have been proposed in the literature for the case when the states and possibly the actions are continuous. However, the link between these mainly heuristic solutions and the larger body of work on approximate RL, including convergence results, has not been made explicit. In this paper, we propose a fuzzy approximation structure for the Q-value iteration algorithm, and show that the resulting algorithm is convergent. The proof is based on an extension of previous results in approximate RL. We then propose a modified, serial version of the algorithm that is guaranteed to converge at least as fast as the original algorithm. An illustrative simulation example is also provided.
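The idea of combining a fuzzy approximator with Q-value iteration, in a parallel and a serial variant, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional state, triangular membership functions, a small discrete action set, and known deterministic dynamics `f` and reward `rho`; all function and parameter names are hypothetical.

```python
def triangular_memberships(x, centers):
    """Membership degrees of x in triangular fuzzy sets with sorted `centers`.

    At most two adjacent sets are active and the degrees sum to 1.
    """
    n = len(centers)
    phi = [0.0] * n
    if x <= centers[0]:
        phi[0] = 1.0
        return phi
    if x >= centers[-1]:
        phi[-1] = 1.0
        return phi
    for i in range(n - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            phi[i] = 1.0 - w
            phi[i + 1] = w
            return phi
    return phi


def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.9, tol=1e-8,
                      serial=False, max_sweeps=10000):
    """Sketch of Q-value iteration on parameters theta[i][j] (assumed setup).

    The approximate Q-function is Q(x, u_j) = sum_i phi_i(x) * theta[i][j].
    serial=False: every update in a sweep reads the previous sweep's parameters.
    serial=True:  updates are done in place, so later updates reuse fresh values.
    """
    n, m = len(centers), len(actions)
    theta = [[0.0] * m for _ in range(n)]
    for sweep in range(1, max_sweeps + 1):
        source = theta if serial else [row[:] for row in theta]
        delta = 0.0
        for i, x in enumerate(centers):
            for j, u in enumerate(actions):
                phi = triangular_memberships(f(x, u), centers)
                best = max(sum(phi[k] * source[k][jj] for k in range(n))
                           for jj in range(m))
                new = rho(x, u) + gamma * best
                delta = max(delta, abs(new - theta[i][j]))
                theta[i][j] = new
        if delta < tol:
            return theta, sweep
    return theta, max_sweeps


# Toy problem (hypothetical): drive a scalar state in [-1, 1] towards 0.
def f(x, u):
    return max(-1.0, min(1.0, x + u))

def rho(x, u):
    return -x * x

centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
actions = [-0.3, 0.0, 0.3]
theta_parallel, sweeps_parallel = fuzzy_q_iteration(f, rho, centers, actions)
theta_serial, sweeps_serial = fuzzy_q_iteration(f, rho, centers, actions, serial=True)
```

In this toy setting both variants approach the same parameter values; comparing `sweeps_parallel` and `sweeps_serial` gives a rough feel for the speed difference between the two update orders.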
Abstract: In realistic multiagent systems, learning on the basis of complete state information is not feasible. We introduce adaptive state focus Q-learning, a class of methods derived from Q-learning that start learning with only the state information that is strictly necessary for a single agent to perform the task, and that monitor the convergence of learning. If lack of convergence is detected, the learner dynamically expands its state space to incorporate more state information (e.g., states of other agents). Learning is faster and takes fewer resources than if the complete state were considered from the start, while being able to handle situations where agents interfere in pursuing their goals. We illustrate our approach by instantiating a simple version of such a method, and by showing that it outperforms learning with full state information without being hindered by the deficiencies of learning on the basis of a single agent's state.
Abstract: We analyse the local stability of the high-temperature fixed point of the loopy belief propagation (LBP) algorithm and how this relates to the properties of the Bethe free energy which LBP tries to minimize. We focus on the case of binary networks with pairwise interactions. In particular, we state sufficient conditions for convergence of LBP to a unique fixed point and show that these are sharp for purely ferromagnetic interactions. In contrast, in the purely antiferromagnetic case, the undamped parallel LBP algorithm is suboptimal in the sense that the stability of the fixed point breaks down much earlier than for damped or sequential LBP; we observe that the onset of instability for the latter algorithms is related to the properties of the Bethe free energy. For spin-glass interactions, damping LBP only helps slightly. We estimate analytically the temperature at which the high-temperature LBP fixed point becomes unstable for random graphs with arbitrary degree distributions and random interactions.
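For readers unfamiliar with the setting, the following is a minimal sketch of parallel LBP with optional damping on a binary (±1) network with pairwise interactions. It assumes the standard parametrization p(x) ∝ exp(Σ J_ij x_i x_j + Σ θ_i x_i) with log-ratio messages ν; the function names, the `damping` and `init` parameters, and the stopping rule are illustrative choices, not taken from the paper.

```python
import math
from itertools import product


def loopy_bp(couplings, fields, damping=0.0, init=0.0, max_iters=1000, tol=1e-12):
    """Parallel (optionally damped) LBP for p(x) ~ exp(sum J_ij x_i x_j + sum th_i x_i).

    couplings: dict {(i, j): J_ij}, one entry per undirected edge (at least one).
    fields:    list of local fields th_i, one per variable.
    Messages are stored as nu[(i, j)], the log-ratio field sent from i to j.
    Returns the approximate magnetizations and a convergence flag.
    """
    n = len(fields)
    neighbors = {i: [] for i in range(n)}
    J = {}
    for (i, j), w in couplings.items():
        neighbors[i].append(j)
        neighbors[j].append(i)
        J[(i, j)] = J[(j, i)] = w
    nu = {edge: init for edge in J}
    converged = False
    for _ in range(max_iters):
        updated = {}
        for (i, j) in nu:
            # Cavity field at i, excluding the message coming from j.
            h = fields[i] + sum(nu[(t, i)] for t in neighbors[i] if t != j)
            target = math.atanh(math.tanh(J[(i, j)]) * math.tanh(h))
            # damping = 0 gives the undamped parallel update.
            updated[(i, j)] = damping * nu[(i, j)] + (1.0 - damping) * target
        change = max(abs(updated[e] - nu[e]) for e in nu)
        nu = updated
        if change < tol:
            converged = True
            break
    mags = [math.tanh(fields[i] + sum(nu[(t, i)] for t in neighbors[i]))
            for i in range(n)]
    return mags, converged


def exact_magnetizations(couplings, fields):
    """Brute-force magnetizations by enumeration (only feasible for tiny networks)."""
    n = len(fields)
    Z, m = 0.0, [0.0] * n
    for x in product((-1, 1), repeat=n):
        energy = sum(w * x[i] * x[j] for (i, j), w in couplings.items())
        energy += sum(th * xi for th, xi in zip(fields, x))
        weight = math.exp(energy)
        Z += weight
        for i in range(n):
            m[i] += weight * x[i]
    return [v / Z for v in m]


# Weakly coupled triangle without local fields: nu = 0 is a fixed point,
# which corresponds to the high-temperature (paramagnetic) solution.
triangle = {(0, 1): 0.1, (1, 2): 0.1, (0, 2): 0.1}
mags, ok = loopy_bp(triangle, [0.0, 0.0, 0.0], init=0.5)
```

On a tree-structured network the result can be checked against `exact_magnetizations`, since belief propagation is exact there; on loopy networks the two generally differ.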
Abstract: We derive novel sufficient conditions for convergence of Loopy Belief Propagation (also known as the Sum-Product algorithm) to a unique fixed point. Our results improve upon previously known conditions. For binary variables with (anti-)ferromagnetic interactions, our conditions seem to be sharp.
Abstract: We derive novel conditions that guarantee convergence of the Sum-Product algorithm (also known as Loopy Belief Propagation or simply Belief Propagation) to a unique fixed point, irrespective of the initial messages. The computational complexity of the conditions is polynomial in the number of variables. In contrast with previously existing conditions, our results are directly applicable to arbitrary factor graphs (with discrete variables) and are shown to be valid also in the case of factors containing zeros, under some additional conditions. We compare our bounds with existing ones, numerically and, if possible, analytically. For binary variables with pairwise interactions, we derive sufficient conditions that take into account local evidence (i.e. single variable factors) and the type of pair interactions (attractive or repulsive). It is shown empirically that this bound outperforms
Abstract: Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluating these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
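The sample, evaluate, and refit loop described above can be sketched generically as follows. This is an illustration of the Cross-Entropy method on a toy combinatorial problem, not the Dec-POMDP algorithm from the paper: the "policy" is simply a tuple of discrete choices, `evaluate` is a hypothetical stand-in for exact or approximate policy evaluation, and the elite size and smoothing factor `alpha` are assumed settings.

```python
import random


def cross_entropy_search(evaluate, n_decisions, n_choices, n_samples=100,
                         n_elite=10, alpha=0.7, n_iters=20, seed=0):
    """Cross-Entropy search over tuples of discrete choices (illustrative sketch).

    probs[d][c] is the probability of picking choice c at decision point d;
    together the rows play the role of the parametrized stochastic policy.
    """
    rng = random.Random(seed)
    probs = [[1.0 / n_choices] * n_choices for _ in range(n_decisions)]
    best_policy, best_value = None, float("-inf")
    for _ in range(n_iters):
        # 1) Sample pure policies from the current stochastic policy.
        samples = []
        for _ in range(n_samples):
            policy = tuple(rng.choices(range(n_choices), weights=row)[0]
                           for row in probs)
            samples.append((evaluate(policy), policy))
        # 2) Rank the samples by value and remember the best one seen so far.
        samples.sort(key=lambda s: s[0], reverse=True)
        if samples[0][0] > best_value:
            best_value, best_policy = samples[0]
        # 3) Refit the distribution to the elite samples, with smoothing.
        elite = [p for _, p in samples[:n_elite]]
        for d in range(n_decisions):
            for c in range(n_choices):
                freq = sum(1 for p in elite if p[d] == c) / n_elite
                probs[d][c] = alpha * freq + (1.0 - alpha) * probs[d][c]
    return best_policy, best_value, probs


# Toy objective (hypothetical): count how many decisions match a hidden target.
target = (2, 0, 1, 2)

def evaluate(policy):
    return sum(1 for a, b in zip(policy, target) if a == b)

best_policy, best_value, probs = cross_entropy_search(evaluate, n_decisions=4, n_choices=3)
```

The smoothing factor keeps part of the old distribution at each step, which is a common way to reduce the risk of the sampler collapsing too early onto a suboptimal choice.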