Abstract: In recent years, we have developed a framework of human-computer interaction that offers recognition of various communication modalities including speech, lip movement, facial expression, handwriting and drawing, body gesture, text and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports on the multimodal information presentation module, which combines language, speech, visual language and graphics, and can be used in isolation as well as part of the framework. It provides a communication channel between the system and users with different communication devices. The module is able to specify and produce context-sensitive and user-tailored output. Through the use of an ontology, it receives the system's view of the world and dialogue actions from a dialogue manager and generates appropriate multimodal responses.
Abstract: Our work addresses the problem of autonomous concept formation from a design point of view, providing an initial answer to the question: What are the design features of an architecture supporting the acquisition of different types of concepts by an autonomous agent?
Autonomous agents, that is, systems capable of interacting independently with their environment in the pursuit of their own goals, will provide the framework in which we study the problem of autonomous concept formation. Humans and most animals may in this sense also be regarded as autonomous agents, but our concern will be with artificial autonomous agents. A detailed survey and discussion of the many issues surrounding the notion of ‘artificial agency’ is beyond the scope of this thesis; a good overview can be found in [Wooldridge and Jennings, 1995]. Instead we will focus on how artificial agents could be endowed with representational and modelling capabilities.
The ability to form concepts is an important and recognised cognitive ability, thought to play an essential role in related abilities such as categorisation, language understanding, object identification and recognition, and reasoning, all of which can be seen as different aspects of intelligence. Concepts and categories are studied within cognitive science, where scientists are concerned with human conceptual abilities and mental representations of categories, but they have also been addressed in the rather different domain of machine learning and classificatory data analysis, where the focus is on the development of algorithms for clustering and induction problems [Mechelen et al., 1993]. The two fields are quite distinct and have only recently started to interact; moreover, even though the importance of concepts has been recognised, their nature remains controversial, in the sense that there is no commonly agreed theory of concepts, and it is still far from obvious which representational means are best suited to capture the many cognitive functions that concepts are involved in.
One of the goals of this thesis is to bring together different lines of argumentation that have emerged within philosophy, cognitive science and AI, in order to establish a solid foundation for further research into the representation and acquisition of concepts by autonomous agents. Thus, our results and conclusions will often be stated in terms of new insights and ideas, rather than new algorithms or formal methods.
Our focus will be on affordance concepts — discussed in detail in Chapter 4 — and our main contributions will be:
* An argument showing that concepts should be thought of as belonging to different kinds, where the differences among these kinds are to be captured in terms of the architectural features supporting their acquisition.
* A description (and partial implementation) of a minimal architecture (the Innate Adaptive Behaviour architecture – IAB architecture for short) supporting the acquisition of affordance concepts; the IAB architecture is actually a proposal for a sustaining mechanism, in the sense of [Margolis, 1999], for affordances, and makes clear the necessity of a minimal structure for the representation of affordances.
When concept formation is addressed in AI, what can be called the ‘system level’ is often overlooked: concepts and categories are rarely studied from the point of view of a complete, autonomous system that might need such constructs and can acquire them only through interactions with its environment, under the constraints of its cognitive architecture. Within psychology, too, the focus is usually on structural aspects of concepts rather than on developmental issues [Smith and Medin, 1981]. Our approach – an architecture-based approach – is an attempt (i) to show that a system-level perspective on concept formation is indeed possible and worth exploring, and (ii) to provide an initial, perhaps simple, but concrete example of the insights that can be gained from such an approach. Since the methodology that we propose for studying concept formation is a general one, and can be applied to other types of concepts as well, we chose the broader phrase ‘autonomous concept formation’, rather than ‘autonomous affordance-concept formation’, for the title of the thesis.
Abstract: We present a discriminative approach to human action recognition. At the heart of our approach is the use of common spatial patterns (CSP), a spatial filter technique that transforms temporal feature data by exploiting differences in variance between two classes. Such a transformation focuses on differences between classes, rather than on modeling each class individually. As a result, to distinguish between two classes, we can use simple distance metrics in the low-dimensional transformed space. The most likely class is found by pairwise evaluation of all discriminant functions, which can be done in real-time. Our image representations are silhouette boundary gradients, spatially binned into cells. We achieve scores of approximately 96% on the Weizmann human action dataset, and show that reasonable results can be obtained when training on only a single subject. We further compare our results with a recent exemplar-based approach. Future work is aimed at combining our approach with automatic human detection.
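The CSP transform mentioned in the abstract can be illustrated with the standard whitening-plus-diagonalisation construction. The sketch below is not the paper's implementation: the function names (`csp_filters`, `csp_features`), the per-trial normalisation, and the log-variance features are our own assumptions, following common CSP practice.

```python
import numpy as np

def csp_filters(X1, X2):
    """Compute CSP filters from two classes of multichannel time
    series. X1, X2: arrays of shape (n_trials, n_channels, n_samples).
    Hypothetical helper, for illustration only."""
    def avg_cov(X):
        covs = []
        for trial in X:
            C = trial @ trial.T
            covs.append(C / np.trace(C))  # normalise each trial's covariance
        return np.mean(covs, axis=0)

    C1, C2 = avg_cov(X1), avg_cov(X2)
    # Whitening transform from the composite covariance C1 + C2.
    evals, evecs = np.linalg.eigh(C1 + C2)
    P = np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    # Diagonalise the whitened class-1 covariance; class 2 is then
    # simultaneously diagonalised with complementary eigenvalues.
    _, B = np.linalg.eigh(P @ C1 @ P.T)
    return B.T @ P  # filter rows sorted by ascending class-1 variance

def csp_features(W, trial, k=1):
    """Log-variance features from the k most discriminative filters
    at each end of the spectrum."""
    var = np.var(W @ trial, axis=1)
    sel = np.r_[var[:k], var[-k:]]
    return np.log(sel / sel.sum())
```

In such a setup, a simple distance metric (e.g. nearest class mean) on the `csp_features` vectors suffices to separate two classes whose variance profiles differ, which is the property the abstract exploits.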
Abstract: In recent years, we have developed a framework of human-computer interaction that offers recognition of various communication modalities including speech, lip movement, facial expression, handwriting and drawing, body gesture, text and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports the approaches used in the multi-user information integration and multimodal presentation modules, which can be used in isolation, but also as part of the framework. The latter is able to specify and produce context-sensitive and user-tailored output combining language, speech, visual language and graphics. These modules provide a communication channel between the system and users with different communication devices. Through the use of an ontology, the system's view of the world is constructed from multi-user observations and appropriate multimodal responses are generated.
Abstract: We present a discriminative approach to human action recognition. At the heart of our approach is the use of common spatial patterns (CSP), a spatial filter technique that transforms temporal feature data by exploiting differences in variance between two classes. Such a transformation focuses on differences between classes, rather than on modelling each class individually. As a result, to distinguish between two classes, we can use simple distance metrics in the low-dimensional transformed space. The most likely class is found by pairwise evaluation of all discriminant functions. Our image representations are silhouette boundary gradients, spatially binned into cells. We achieve scores of approximately 96% on a standard action dataset, and show that reasonable results can be obtained when training on only a single subject. Future work is aimed at combining our approach with automatic human detection.
Abstract: The study of human facial expressions is one of the most challenging domains in the pattern recognition community. Each facial expression is generated by non-rigid object deformations, and these deformations are person-dependent. Automatic recognition of facial expressions is a process primarily based on the analysis of permanent and transient features of the face, which can only be assessed with some degree of error. The expression recognition model is based on the specification of the Facial Action Coding System (FACS) of Ekman and Friesen [Ekman, Friesen 1978]. Hard constraints on scene processing and recording conditions limit the robustness of the analysis. In order to manage the uncertainties and the lack of information, we set up a probabilistic framework. The goal of the project was to design and implement a system for automatic recognition of human facial expressions in video streams. The results of the project are of great importance for a broad area of applications relating to both research and applied topics.
Abstract: The recognition of a person's internal emotional state plays an important role in several human-related fields. Among them, human-computer interaction has recently received special attention. The current research analyses segmentation methods and the performance of the GentleBoost classifier on emotion recognition from speech. The data set used for emotion analysis is Berlin, a database of German emotional speech. A second data set, DES (Danish Emotional Speech), is used for comparison purposes. Our contribution to the research community consists of a novel, extensive study on the efficiency of using distinct numbers of frames per speech utterance for emotion recognition. Finally, a set of GentleBoost 'committees' with optimal classification rates is determined, based on an exhaustive study of the generated classifiers and of the different types of segmentation.
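For readers unfamiliar with GentleBoost, a minimal sketch of the algorithm follows: each round fits a regression stump to the labels by weighted least squares, adds it to the ensemble, and reweights the samples multiplicatively. This is an illustrative toy implementation, not the classifiers used in the study; the function names and the toy data are our own assumptions.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: predicts `left`
    below a threshold on one feature and `right` above it."""
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, ys, ws = X[order, j], y[order], w[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue  # no split between identical values
            thr = 0.5 * (xs[i] + xs[i - 1])
            left = np.sum(ws[:i] * ys[:i]) / np.sum(ws[:i])
            right = np.sum(ws[i:] * ys[i:]) / np.sum(ws[i:])
            err = (np.sum(ws[:i] * (ys[:i] - left) ** 2)
                   + np.sum(ws[i:] * (ys[i:] - right) ** 2))
            if best is None or err < best[0]:
                best = (err, j, thr, left, right)
    return best[1:]

def gentleboost(X, y, n_rounds=20):
    """GentleBoost with stumps; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        j, thr, left, right = fit_stump(X, y, w)
        fx = np.where(X[:, j] < thr, left, right)
        ensemble.append((j, thr, left, right))
        w *= np.exp(-y * fx)  # gentle multiplicative reweighting
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    F = np.zeros(len(X))
    for j, thr, left, right in ensemble:
        F += np.where(X[:, j] < thr, left, right)
    return np.sign(F)
```

A 'committee' in the sense of the abstract would correspond to one such ensemble trained on features extracted from a particular segmentation of the speech utterances.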
Abstract: In the past, a crisis event was reported by local witnesses who telephoned the emergency services and described by speech what they had observed at the crisis site. Recent improvements in the area of human-computer interfaces make it possible to develop context-aware systems for crisis management that support people in escaping a crisis even before external help is available on site. Apart from collecting people's reports on the crisis, these systems are expected to automatically extract useful clues during typical human-computer interaction sessions. The novelty of the current research resides in the attempt to use computer vision techniques for the automatic evaluation of facial expressions during human-computer interaction sessions with a crisis management system. The current paper details an approach to an automatic facial expression recognition module that may be included in crisis-oriented applications. The algorithm uses an Active Appearance Model for facial shape extraction and an SVM classifier for Action Unit detection and facial expression recognition.