Abstract: Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents.
Because exact RL can only be applied to very simple problems, approximate algorithms are
usually necessary in practice. Many algorithms for approximate RL rely on basis-function
representations of the value function (or of the Q-function). Designing a good set of basis
functions without any prior knowledge of the value function (or of the Q-function) can be a
difficult task. In this paper, we propose instead a technique to optimize the shape of a constant
number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other
approaches to adapt basis functions for RL, our optimization criterion measures the actual
performance of the computed policies in the task, using simulation from a representative set
of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy
membership functions, is given and applied to the car-on-the-hill example.
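As a rough illustration of the representation being optimized, a triangular fuzzy membership function can be sketched as follows. This is a generic sketch, not the paper's exact parameterization; the parameter names `left`, `center`, and `right` are our own, and any normalization across the fuzzy partition is omitted.

```python
def triangular_mf(x, left, center, right):
    """Triangular membership function: 0 outside [left, right],
    rising linearly to 1 at `center`, then falling back to 0."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Example: a triangle peaking at 0.5 on the interval [0, 1].
mu = triangular_mf(0.25, 0.0, 0.5, 1.0)
```

Optimizing the "shape" of such basis functions then amounts to searching over the `left`, `center`, and `right` parameters of each membership function.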

Abstract: This paper introduces a novel algorithm for approximate policy search in continuous-state discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.
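The cross-entropy optimization loop underlying both abstracts can be sketched generically. This is a minimal illustration under our own assumptions, not the papers' algorithm: here the score function is a toy objective, whereas in the papers each sample encodes basis-function locations, shapes, and action assignments, and is scored by the empirical return of the resulting policy from a representative set of initial states.

```python
import numpy as np

def cross_entropy_maximize(score, dim, iters=50, pop=100, elite_frac=0.1, seed=0):
    """Generic cross-entropy method: sample candidates from a Gaussian,
    keep the top elite fraction, and refit the Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]       # best-scoring samples
        mean = elite.mean(axis=0)                             # refit sampling distribution
        std = elite.std(axis=0) + 1e-6                        # small floor avoids collapse
    return mean

# Toy stand-in for "empirical return": negative squared distance to a target.
target = np.array([1.0, -2.0])
best = cross_entropy_maximize(lambda p: -np.sum((p - target) ** 2), dim=2)
```

In the policy-search setting, evaluating `score` is the expensive step, since each call requires simulating the candidate policy from every representative initial state; this is the source of the sizable computational cost noted above.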