Abstract: In recent years, we have developed a framework for human-computer interaction that offers recognition of various communication modalities, including speech, lip movement, facial expression, handwriting and drawing, body gesture, text and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports on the multimodal information presentation module, which combines language, speech, visual language and graphics, and which can be used in isolation but also as part of the framework. It provides a communication channel between the system and users with different communication devices. The module is able to specify and produce context-sensitive and user-tailored output. By employing an ontology, it receives the system's view of the world and dialogue actions from a dialogue manager and generates appropriate multimodal responses.
Abstract: Our software demo package consists of an implementation of an automatic human emotion recognition system. The system is bi-modal and is based on fusing data on facial expressions with emotional cues extracted from the speech signal. We have integrated the Viola & Jones face detector (OpenCV), an Active Appearance Model (AAM, via AAM-API) for extracting the face shape, and Support Vector Machines (LibSVM) for the classification of emotion patterns. We use an optical flow algorithm to compute the features needed for the classification of facial expressions. Besides the integration of all processing components, the software system accommodates our implementation of the data fusion algorithm. Our C++ implementation runs at a frame rate of about 5 fps.
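The abstract gives no implementation details, so the following is only a minimal sketch of how such a face-detection plus optical-flow feature pipeline might look; the Farneback flow variant, the cascade file, the 8x8 flow downsampling and the SVM parameters are all illustrative assumptions, not the authors' actual code.

```python
# Hypothetical sketch of a face-detection + optical-flow feature pipeline
# (Farneback flow and all parameters are assumptions, not the paper's code).
import cv2
import numpy as np
from sklearn.svm import SVC

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def flow_features(prev_frame, next_frame):
    """Detect the face in the first frame and summarize the dense
    optical flow inside the face region as a fixed-length vector."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(prev_gray, 1.3, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[y:y+h, x:x+w], next_gray[y:y+h, x:x+w],
        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # Downsample the flow field to an 8x8 grid of mean motion vectors.
    grid = cv2.resize(flow, (8, 8), interpolation=cv2.INTER_AREA)
    return grid.flatten()

# Feature vectors from labelled training clips would then feed an SVM:
# clf = SVC(kernel="rbf", C=10.0).fit(X_train, y_train)
```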
Abstract: Our work addresses the problem of autonomous concept formation from a design point of view, providing an initial answer to the question: What are the design features of an architecture supporting the acquisition of different types of concepts by an autonomous agent?
Autonomous agents, that is, systems capable of interacting independently with their environment in pursuit of their own goals, provide the framework in which we study the problem of autonomous concept formation. Humans and most animals may in this sense also be regarded as autonomous agents, but our concern is with artificial autonomous agents. A detailed survey and discussion of the many issues surrounding the notion of ‘artificial agency’ is beyond the scope of this thesis; a good overview can be found in [Wooldridge and Jennings, 1995]. Instead, we will focus on how artificial agents could be endowed with representational and modelling capabilities.
The ability to form concepts is an important and recognised cognitive ability, thought to play an essential role in related abilities such as categorisation, language understanding, object identification and recognition, and reasoning, all of which can be seen as different aspects of intelligence. Concepts and categories are studied within cognitive science, where scientists are concerned with human conceptual abilities and mental representations of categories, but they have also been addressed in the rather different domain of machine learning and classificatory data analysis, where the focus is on the development of algorithms for clustering and induction problems [Mechelen et al., 1993]. The two fields are quite distinct and have only recently started to interact; moreover, even though the importance of concepts has been recognised, their nature remains controversial, in the sense that there is no commonly agreed theory of concepts, and it is still far from obvious which representational means are most suited to capture the many cognitive functions that concepts are involved in.
Among the goals of this thesis is the attempt to bring together different lines of argumentation that have emerged within philosophy, cognitive science and AI, in order to establish a solid foundation for further research into the representation and acquisition of concepts by autonomous agents. Thus, our results and conclusions will often be stated in terms of new insights and ideas, rather than in terms of new algorithms or formal methods.
Our focus will be on affordance concepts — discussed in detail in Chapter 4 — and our main contributions will be:
* An argument showing that concepts should be thought of as belonging to different kinds, where the differences among these kinds are to be captured in terms of the architectural features supporting their acquisition.
* A description (and partial implementation) of a minimal architecture (the Innate Adaptive Behaviour architecture – IAB architecture for short) supporting the acquisition of affordance concepts; the IAB architecture is actually a proposal for a sustaining mechanism, in the sense of [Margolis, 1999], for affordances, and makes clear the necessity of a minimal structure for the representation of affordances.
When addressing concept formation in AI, what can be called the ‘system level’ is often overlooked: concepts and categories are rarely studied from the point of view of a complete, autonomous system that might need such constructs and can acquire them only through interactions with its environment, under the constraints of its cognitive architecture. Within psychology, too, the focus is usually on structural aspects of concepts rather than on developmental issues [Smith and Medin, 1981]. Our approach – an architecture-based approach – is an attempt (i) to show that a system-level perspective on concept formation is indeed possible and worth exploring, and (ii) to provide an initial, perhaps simple, but concrete example of the insights that can be gained from such an approach. Since the methodology we propose for studying concept formation is a general one, and can also be applied to other types of concepts, we decided to refer broadly to ‘autonomous concept formation’ rather than ‘autonomous affordance-concept formation’ in the title of the thesis.
Abstract: We present a discriminative approach to human action recognition. At the heart of our approach is the use of common spatial patterns (CSP), a spatial filtering technique that transforms temporal feature data by exploiting differences in variance between two classes. Such a transformation focuses on differences between classes, rather than on modeling each class individually. As a result, to distinguish between two classes, we can use simple distance metrics in the low-dimensional transformed space. The most likely class is found by pairwise evaluation of all discriminant functions, which can be done in real time. Our image representations are silhouette boundary gradients, spatially binned into cells. We achieve scores of approximately 96% on the Weizmann human action dataset, and show that reasonable results can be obtained when training on only a single subject. We further compare our results with a recent exemplar-based approach. Future work is aimed at combining our approach with automatic human detection.
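For readers unfamiliar with CSP, the sketch below shows the standard two-class computation (a generalized eigendecomposition of the class covariance matrices, followed by log-variance features of the filtered signals); the array shapes and the choice of scipy's generalized eigensolver are illustrative assumptions, not the paper's implementation.

```python
# Minimal two-class CSP sketch (an assumption-laden illustration,
# not the paper's implementation).
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    """trials_*: lists of (channels x time) arrays, one per class.
    Returns a (n_filters x channels) projection matrix whose rows
    maximize variance for one class while minimizing it for the other."""
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)
    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem: ca @ w = lam * (ca + cb) @ w.
    eigvals, eigvecs = eigh(ca, ca + cb)
    # Keep filters from both ends of the spectrum (most discriminative).
    order = np.argsort(eigvals)
    pick = np.r_[order[:n_filters // 2], order[-n_filters // 2:]]
    return eigvecs[:, pick].T

def csp_features(trial, W):
    """Log-variance of the spatially filtered trial: a compact,
    class-discriminative feature vector for simple distance metrics."""
    z = W @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())
```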
Abstract: In recent years, we have developed a framework for human-computer interaction that offers recognition of various communication modalities, including speech, lip movement, facial expression, handwriting and drawing, body gesture, text and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports on the approaches used in the multi-user information integration and multimodal presentation modules, which can be used in isolation but also as part of the framework. The latter is able to specify and produce context-sensitive and user-tailored output combining language, speech, visual language and graphics. These modules provide a communication channel between the system and users with different communication devices. By employing an ontology, the system's view of the world is constructed from multi-user observations and appropriate multimodal responses are generated.
Abstract: We present a discriminative approach to human action recognition. At the heart of our approach is the use of common spatial patterns (CSP), a spatial filtering technique that transforms temporal feature data by exploiting differences in variance between two classes. Such a transformation focusses on differences between classes, rather than on modelling each class individually. As a result, to distinguish between two classes, we can use simple distance metrics in the low-dimensional transformed space. The most likely class is found by pairwise evaluation of all discriminant functions. Our image representations are silhouette boundary gradients, spatially binned into cells. We achieve scores of approximately 96% on a standard action dataset, and show that reasonable results can be obtained when training on only a single subject. Future work is aimed at combining our approach with automatic human detection.
Abstract: This report describes strengths and limitations of human cognition in relation to computers and automation. In this sense, the report may be of interest in any context in which humans work together with (intelligent) machines. A theoretical standpoint is presented that allows a useful interpretation of experimental results on human cognition. A method is presented by which human cognitive strength can be measured.
Abstract: In actor-agent teams, human and artificial entities interact and cooperate in order to enhance and augment their individual and joint cognitive, ergonomic and problem-solving capabilities. Actor-agent communities can also benefit from ‘ambient cognition’, a novel concept that reaches further than ambient intelligence, which hardly takes into account the resource limitations, and the capabilities changing over time, of both humans and agents in collaborative settings. The Dutch Companion project aims at the realization of an agent that takes advantage of ambient cognition concerning actor-agent system dynamics, such that natural emotion-sensitive interaction with an actor can be sustained over a longer period of time. We elaborate on our vision of pursuing ambient cognition within actor-agent systems and present the plans and expected results of the Dutch Companion project.
Abstract: In actor-agent teams, human and artificial entities interact and cooperate in order to enhance and augment their individual and joint cognitive, ergonomic and problem-solving capabilities. Actor-agent communities can also benefit from ‘ambient cognition’, a novel concept that reaches further than ambient intelligence, which hardly takes into account the resource limitations, and the capabilities changing over time, of both humans and agents in collaborative settings. The Dutch Companion project aims at the realization of an agent that takes advantage of ambient cognition concerning actor-agent system dynamics, such that natural, social and emotion-sensitive interaction with an actor can be sustained over a longer period of time. We elaborate on our vision of pursuing ambient cognition within actor-agent systems and briefly describe the goals of the Dutch Companion project.
Abstract: Previous research has addressed change blindness and the role of human cognition, but not yet the influence of mood on change blindness. The levels-of-focus hypothesis and attentional flexibility research support the hypothesis that a positive mood increases performance in detecting peripheral changes in a change blindness task. Two studies revealed increased performance in a positive mood; the first showed better detection of central changes, whereas the second found this for peripheral changes. Both studies revealed evidence of visual sensing. Although the hypothesis that a positive mood leads to a broader visual attention focus or higher visual attentional flexibility was not supported, the results suggest that people in a positive mood rely more on the process of visual sensing.
Abstract: Previous research has addressed the role of human cognition in change blindness, but not yet the influence of mood on change blindness. The levels-of-focus hypothesis and attentional flexibility research support the hypothesis that a positive mood enhances detection of peripheral changes in a change blindness task. The study conducted revealed increased performance on peripheral changes for participants in a positive mood. Although the hypothesis that a positive mood leads to a broader visual attention focus or higher visual attentional flexibility was not supported, the results suggest that people in a positive mood rely more on the process of visual sensing.
Abstract: The increasing complexity of our world demands new perspectives on the role of technology in human decision making. We need new technology to cope with the increasingly complex and information-rich nature of our modern society. This is particularly true for critical environments such as crisis management and traffic management, where humans need to engage in close collaboration with artificial systems to observe and understand the situation and respond in a sensible way. The book Interactive Collaborative Information Systems addresses techniques that support humans in situations in which complex information handling is required and that facilitate distributed decision making. The theme integrates research from information technology, artificial intelligence and human sciences to obtain a multidisciplinary foundation from which innovative actor-agent systems for critical environments can emerge. It emphasizes the importance of building actor-agent communities: close collaborations between human and artificial actors that highlight their complementary capabilities in situations where task distribution is flexible and adaptive. This book focuses on the employment of innovative agent technology, advanced machine learning techniques, and cognition-based interface technology for use in collaborative decision-support systems.
Abstract: Vigilance concerns the basic human capacity for information processing and is therefore essential to any form of human cognition. Both physical and mental effort are thought to affect vigilance. Mental effort is known for its vigilance-declining effects, but the effects of physical effort are less clear. This study investigated whether these two forms of effort affect the EEG (electroencephalogram, a measure of brain activity) and subjective alertness differently. Participants performed a physical task and were subsequently presented with a mental task, or vice versa. Mental effort decreased subjective alertness and increased theta power (i.e., low-frequency waves) in the EEG. Both results suggest a vigilance decline. Physical effort, however, increased subjective alertness and alpha and beta1 power in the EEG. These findings point towards an increase in vigilance. Beta2 power was reduced after physical effort, which may reflect a decrease in active cognitive processing. No transfer effects were found between the effort conditions, suggesting that the effects of mental and physical effort are distinct. It is concluded that mental effort decreases vigilance, whereas physical effort increases vigilance without improving subsequent task performance.
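The abstract reports its effects in terms of EEG band power (theta, alpha, beta1, beta2). As background, the sketch below shows one common way to compute such band powers from a single EEG channel via a Welch power spectral density; the band boundaries and Welch settings are conventional choices assumed here, not the study's actual analysis pipeline.

```python
# Illustrative band-power computation for one EEG channel
# (band limits and Welch settings are conventional assumptions,
# not the study's analysis pipeline).
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 12),
         "beta1": (12, 20), "beta2": (20, 30)}  # Hz

def band_powers(signal, fs):
    """Return absolute power per band by integrating the Welch PSD."""
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])
    return powers

# Example: 60 s of synthetic data sampled at 256 Hz.
fs = 256
eeg = np.random.randn(60 * fs)
print(band_powers(eeg, fs))
```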
Abstract: The system described in this paper provides a Web interface for fully automatic audio-visual human emotion recognition. The analysis focuses on the set of six basic emotions plus the neutral state. Different classifiers are involved in the processes of face detection (AdaBoost), facial expression recognition (SVM and other models) and emotion recognition from speech (GentleBoost). An Active Appearance Model (AAM) is used to obtain the information related to the shapes of the faces to be analyzed. The facial expression recognition is frame-based and no temporal patterns of emotions are modelled. Emotion recognition from movies is performed separately on sound and video frames; the algorithm does not handle dependencies between audio and video during the analysis. The methodologies for data processing are explained and specific performance measures for the emotion recognition are presented.
Abstract: The study of human facial expressions is one of the most challenging domains in the pattern recognition community. Each facial expression is generated by non-rigid object deformations, and these deformations are person-dependent. Automatic recognition of facial expressions is a process primarily based on the analysis of permanent and transient features of the face, which can only be assessed with errors of some degree. The expression recognition model is oriented towards the specification of the Facial Action Coding System (FACS) of Ekman and Friesen [Ekman, Friesen 1978]. The hard constraints on scene processing and recording conditions limit the robustness of the analysis. In order to manage the uncertainties and lack of information, we set up a probabilistically oriented framework. The goal of the project was to design and implement a system for automatic recognition of human facial expressions in video streams. The results of the project are of great importance for a broad area of applications relating to both research and applied topics.
Abstract: In this paper we discuss how the design of an Intelligent Companion constitutes a challenge and a test-bed for computer-based technologies aimed at improving the user's cognitive abilities. We conceive of an Intelligent Companion as an autonomous cognitive system (ACS) that should be capable of naturally interacting and communicating in real-world environments. It should do so by embodying (reinforcement) learning of physically grounded conceptualizations of multimodal perception, decision making, planning and actuation, with the aim of supporting human cognition in both an intelligent and intelligible way.
Abstract: Recognition of a person's internal emotional state plays an important role in several human-related fields; among them, human-computer interaction has recently received special attention. The current research analyses segmentation methods and the performance of the GentleBoost classifier on emotion recognition from speech. The data set used for the emotion analysis is Berlin, a database of German emotional speech; a second data set, DES (Danish Emotional Speech), is used for comparison purposes. Our contribution to the research community consists of a novel, extensive study on the efficiency of using distinct numbers of frames per speech utterance for emotion recognition. Finally, a set of GentleBoost 'committees' with optimal classification rates is determined, based on an exhaustive study of the generated classifiers and of the different types of segmentation.
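The abstract does not spell out the classifier, so as background the following sketches the standard GentleBoost update of Friedman, Hastie and Tibshirani: each round fits a weighted least-squares regressor, adds it to the ensemble, and reweights the samples. The use of depth-1 regression trees and all parameters are assumptions for illustration, not the paper's implementation.

```python
# Minimal GentleBoost sketch with regression stumps
# (an illustration of the standard algorithm, not the paper's code).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gentleboost_fit(X, y, n_rounds=100):
    """X: (n_samples, n_features); y: labels in {-1, +1}.
    Each round fits a weighted least-squares stump to y and
    reweights samples by exp(-y * f_m(x))."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeRegressor(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        fm = stump.predict(X)
        w *= np.exp(-y * fm)
        w /= w.sum()
        ensemble.append(stump)
    return ensemble

def gentleboost_predict(ensemble, X):
    """Sign of the summed stage outputs."""
    F = sum(stump.predict(X) for stump in ensemble)
    return np.sign(F)
```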
Abstract: In the past, a crisis event was reported by local witnesses who would make phone calls to the emergency services, describing by speech what they had observed at the crisis site. Recent improvements in the area of human-computer interfaces make it possible to develop context-aware systems for crisis management that support people in escaping a crisis even before external help is available on site. Apart from collecting people's reports on the crisis, these systems are assumed to automatically extract useful clues during typical human-computer interaction sessions. The novelty of the current research resides in the attempt to apply computer vision techniques to the automatic evaluation of facial expressions during human-computer interaction sessions with a crisis management system. The current paper details an approach for an automatic facial expression recognition module that may be included in crisis-oriented applications. The algorithm uses an Active Appearance Model for facial shape extraction and an SVM classifier for Action Unit detection and facial expression recognition.
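As a rough illustration of the final step, the sketch below assumes facial landmarks have already been extracted by the AAM stage and trains one binary SVM per Action Unit on normalized shape features; the feature normalization, the example AU names and the SVM settings are assumptions, not the paper's implementation.

```python
# Hypothetical last stage: per-Action-Unit SVMs on AAM shape landmarks
# (feature choice and parameters are assumptions, not the paper's code).
import numpy as np
from sklearn.svm import SVC

def shape_features(landmarks):
    """landmarks: (n_points, 2) array from the AAM fit.
    Normalize for translation and scale so the SVM sees
    only the facial configuration."""
    centered = landmarks - landmarks.mean(axis=0)
    scale = np.linalg.norm(centered)
    return (centered / scale).flatten()

def train_au_detectors(shapes, au_labels):
    """shapes: list of landmark arrays; au_labels: dict mapping an
    AU name, e.g. 'AU12' (lip corner puller), to 0/1 labels per sample."""
    X = np.array([shape_features(s) for s in shapes])
    return {au: SVC(kernel="rbf", probability=True).fit(X, y)
            for au, y in au_labels.items()}

# A detected AU pattern such as {AU6, AU12} could then be mapped to an
# expression label (here, happiness) by a rule- or classifier-based step.
```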