Abstract: In recent years, we have developed a framework for human-computer interaction that offers recognition of various communication modalities, including speech, lip movement, facial expression, handwriting and drawing, body gesture, text, and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports on the multimodal information presentation module, which combines language, speech, visual language, and graphics and can be used in isolation or as part of the framework. It provides a communication channel between the system and users with different communication devices. The module is able to specify and produce context-sensitive and user-tailored output. By employing an ontology, it receives the system's view of the world and dialogue actions from a dialogue manager and generates appropriate multimodal responses.
Abstract: Our software demo package consists of an implementation of an automatic human emotion recognition system. The system is bimodal and is based on fusing data on facial expressions with emotion extracted from the speech signal. We have integrated the Viola-Jones face detector (OpenCV), an Active Appearance Model, AAM (AAM-API), for extracting the face shape, and Support Vector Machines (LibSVM) for the classification of emotion patterns. We have used an optical flow algorithm to compute the features needed for the classification of facial expressions. Besides the integration of all processing components, the software system accommodates our implementation of the data fusion algorithm. Our C++ implementation runs at a frame rate of about 5 fps.
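As an illustration of the processing chain described above, the following is a minimal Python sketch using OpenCV and scikit-learn as stand-ins for the OpenCV/LibSVM components; the AAM shape-extraction step (AAM-API) is omitted, and the 8x8 flow grid is an arbitrary illustrative choice, not the system's actual feature set.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

# Viola-Jones face detector shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def flow_features(prev_gray, gray, face):
    """Aggregate Farneback optical flow inside the face box into an
    8x8 grid of mean motion vectors (an illustrative 128-dim descriptor)."""
    x, y, w, h = face
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[y:y+h, x:x+w], gray[y:y+h, x:x+w],
        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    cells = []
    for rows in np.array_split(flow, 8, axis=0):
        for cell in np.array_split(rows, 8, axis=1):
            cells.append(cell.reshape(-1, 2).mean(axis=0))
    return np.concatenate(cells)

# Training (offline): clf = SVC(kernel="rbf", probability=True).fit(X, y)
# Runtime: faces = face_cascade.detectMultiScale(gray); then classify
# flow_features(prev_gray, gray, faces[0]) with clf.predict_proba(...).
```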
Abstract: The current paper addresses aspects related to the development of an automatic probabilistic recognition system for facial expressions in video streams. The face analysis component integrates an eye tracking mechanism based on a Kalman filter. Visual feature detection includes PCA-based recognition for ranking the activity in certain facial areas. Facial expressions are described in terms of sets of atomic Action Units (AUs) from the Facial Action Coding System (FACS). The expression recognition engine is built on a Bayesian Belief Network (BBN) model that also handles the temporal behavior of the visual features.
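The eye tracking idea can be sketched with OpenCV's built-in Kalman filter; the constant-velocity state model below is a common textbook choice and is not necessarily the one used in the paper.

```python
import cv2
import numpy as np

# Constant-velocity model: state (x, y, vx, vy), measurement (x, y).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3     # tuning guesses
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def track_eye(measured_xy):
    """Predict, then correct with the detector's eye position when one is
    available; returns the smoothed (x, y) estimate."""
    estimate = kf.predict()
    if measured_xy is not None:
        estimate = kf.correct(np.array(measured_xy, np.float32).reshape(2, 1))
    return float(estimate[0, 0]), float(estimate[1, 0])
```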
Abstract: Multimodal emotion recognition is receiving increasing attention from the scientific community. Fusing information arriving on different channels of communication, while taking the context into account, is a natural approach. During social interaction, the affective load of the interlocutors plays a major role. In the current paper we present a detailed analysis of the process of building an advanced multimodal data corpus for affective state recognition and related domains. The corpus contains synchronized dual views acquired with a high-speed camera and high-quality audio devices. We paid careful attention to the emotional content of the corpus in all aspects, such as language content and facial expressions. For the recordings we implemented TV-prompter-like software that controlled the recording devices and instructed the actors, ensuring the uniformity of the recordings. In this way we obtained a high-quality, controlled emotional data corpus.
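The prompter concept might be sketched as follows; this is a hypothetical, minimal stand-in (device control and camera synchronization are omitted), showing only the idea of displaying scripted emotional utterances and logging timestamps for later alignment.

```python
import time

# (emotion, utterance) pairs shown to the actor; content is illustrative.
script = [("happiness", "That is wonderful news!"),
          ("anger", "I told you never to do that again.")]

def run_prompter(lines, seconds_per_line=5.0, log_path="session_log.tsv"):
    """Display each scripted line for a fixed duration and log the wall-clock
    start time, so the separate A/V recordings can be aligned afterwards."""
    with open(log_path, "w") as log:
        for emotion, text in lines:
            start = time.time()
            print(f"[{emotion.upper()}]  {text}")
            log.write(f"{start:.3f}\t{emotion}\t{text}\n")
            time.sleep(seconds_per_line)

run_prompter(script)
```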
Abstract: In recent years, we have developed a framework for human-computer interaction that offers recognition of various communication modalities, including speech, lip movement, facial expression, handwriting and drawing, body gesture, text, and visual symbols. The framework allows the rapid construction of a multimodal, multi-device, and multi-user communication system for crisis management. This paper reports on the approaches used in the multi-user information integration and multimodal presentation modules, which can be used in isolation or as part of the framework. The latter is able to specify and produce context-sensitive and user-tailored output combining language, speech, visual language, and graphics. These modules provide a communication channel between the system and users with different communication devices. By employing an ontology, the system's view of the world is constructed from multi-user observations, and appropriate multimodal responses are generated.
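A toy sketch of device-tailored output selection is given below; the device profiles, dialogue-act fields, and function names are invented for illustration and do not reflect the framework's actual ontology.

```python
# Hypothetical device profiles: which output modalities each device supports.
DEVICE_PROFILES = {
    "headset":     {"speech"},
    "smartphone":  {"speech", "text", "graphics"},
    "workstation": {"speech", "text", "graphics", "visual_language"},
}

def render(dialogue_act, device):
    """Render the same dialogue act with the modality mix the device allows."""
    available = DEVICE_PROFILES[device]
    out = {}
    if "text" in available:
        out["text"] = dialogue_act["message"]
    if "speech" in available:
        out["speech"] = dialogue_act["message"]       # handed to a TTS engine
    if "graphics" in available and "map_region" in dialogue_act:
        out["graphics"] = dialogue_act["map_region"]  # e.g. highlight on a map
    return out

act = {"type": "warn", "message": "Evacuate sector B now.", "map_region": "B"}
print(render(act, "smartphone"))
```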
Abstract: For many decades, automatic facial expression recognition has been considered a challenging problem in the fields of pattern recognition and robotic vision. The current research proposes Relevance Vector Machines (RVM) as a novel classification technique for the recognition of facial expressions in static images. Aspects related to the use of Support Vector Machines are also presented. The test data were selected from the Cohn-Kanade Facial Expression Database. We report a 90.84% recognition rate for RVM on the six universal expressions, based on a range of experiments. A discussion comparing the different classification methods is included.
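For reference, the standard RVM formulation (after Tipping) can be summarized as follows; the notation is the conventional one rather than that of the paper itself:

```latex
% Sparse Bayesian (RVM) model: kernel expansion with an individual
% Gaussian prior precision \alpha_n on each weight (Tipping, 2001).
y(\mathbf{x}; \mathbf{w}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + w_0,
\qquad
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{n=0}^{N}
    \mathcal{N}\!\bigl(w_n \mid 0,\, \alpha_n^{-1}\bigr),
\qquad
p(t = 1 \mid \mathbf{x}) = \sigma\bigl(y(\mathbf{x}; \mathbf{w})\bigr).
```

Maximizing the marginal likelihood over the hyperparameters alpha drives most of them toward infinity, which prunes the corresponding basis functions; the few remaining "relevance vectors" give RVM the sparsity that distinguishes it from SVM.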
Abstract: Face-related analysis has been a milestone in computer vision for many decades, and many methods have been designed and implemented to meet its specific requirements. In the current paper we present three different classification algorithms that we use for the tasks of face detection and facial expression recognition. One of the methods, Relevance Vector Machines (RVM), is a novel supervised learning technique based on a probabilistic formulation of Support Vector Machines. The mathematical basis of the models is presented. The test data were selected from the Cohn-Kanade Facial Expression Database. We report recognition rates for the six universal expressions based on a range of experiments. A discussion comparing the different classification methods is included.
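A generic evaluation protocol for such comparisons might look like the sketch below; scikit-learn's SVC stands in for the classifiers discussed (no RVM implementation ships with scikit-learn itself), and the feature matrix is assumed to be precomputed from the database images.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def recognition_rate(X, y, folds=10):
    """Mean cross-validated accuracy over the six-expression labels;
    swap the SVC for any other classifier under comparison."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=folds).mean()

# X: (n_samples, n_features) precomputed face features; y: expression labels.
# print(f"recognition rate: {100 * recognition_rate(X, y):.2f}%")
```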
Abstract: The system described in this paper provides a Web interface for fully automatic audio-video human emotion recognition. The analysis focuses on the set of six basic emotions plus the neutral state. Different classifiers are involved in the processes of face detection (AdaBoost), facial expression recognition (SVM and other models), and emotion recognition from speech (GentleBoost). An Active Appearance Model (AAM) is used to obtain information on the shapes of the faces to be analyzed. The facial expression recognition is frame-based, and no temporal patterns of emotions are managed. Emotion recognition from video clips is performed separately on the sound and on the video frames; the algorithm does not handle dependencies between audio and video during the analysis. The methodologies for data processing are explained, and specific performance measures for the emotion recognition are presented.
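The frame-based, uncoupled analysis described above can be illustrated in a few lines; the majority vote is one simple per-frame readout, and the model names in the comments are placeholders.

```python
from collections import Counter

def video_emotion(frame_labels):
    """Majority vote over per-frame expression labels: a deliberately
    memoryless readout, since no temporal patterns are modelled."""
    return Counter(frame_labels).most_common(1)[0][0]

frames = ["neutral", "happiness", "happiness", "happiness", "surprise"]
print(video_emotion(frames))                     # -> "happiness"
# The speech channel would be scored separately, e.g.:
# audio_emotion = gentleboost_model.predict(audio_features)   # placeholder
```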
Abstract: The study of human facial expressions is one of the most challenging domains in the pattern recognition community. Each facial expression is generated by non-rigid object deformations, and these deformations are person-dependent. Automatic recognition of facial expressions is a process based primarily on the analysis of permanent and transient features of the face, which can only be assessed with some degree of error. The expression recognition model follows the specification of the Facial Action Coding System (FACS) of Ekman and Friesen [Ekman, Friesen 1978]. Hard constraints on the scene processing and recording conditions limit the robustness of the analysis. In order to manage the uncertainties and the lack of information, we set up a probabilistic framework. The goal of the project was to design and implement a system for automatic recognition of human facial expressions in video streams. The results of the project are of great importance for a broad range of applications in both research and applied topics.
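As a toy illustration of the probabilistic handling of uncertainty, consider a discrete Bayesian update of expression beliefs from one unreliable AU observation; the numbers in the likelihood table are invented for the example and are not the paper's model.

```python
# Invented example values: P(AU12 observed | expression).
PRIOR = {"happiness": 1 / 3, "sadness": 1 / 3, "neutral": 1 / 3}
P_AU12 = {"happiness": 0.9, "sadness": 0.05, "neutral": 0.2}

def posterior(prior, likelihood):
    """One step of Bayes' rule over the discrete expression hypotheses."""
    unnormalized = {e: prior[e] * likelihood[e] for e in prior}
    z = sum(unnormalized.values())
    return {e: p / z for e, p in unnormalized.items()}

print(posterior(PRIOR, P_AU12))   # belief mass shifts toward "happiness"
```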
Abstract: The paper describes a novel technique for the recognition of emotions from multimodal data. We focus on the recognition of the six prototypic emotions. The results of facial expression recognition and of emotion recognition from speech are combined using a bimodal semantic data fusion model that determines the most probable emotion of the subject. Two types of models based on geometric face features are used for facial expression recognition, depending on the presence or absence of speech. In our approach we define an algorithm that is robust to the changes of face shape that occur during regular speech. The influence of phoneme generation on the face shape during speech is removed by using features related only to the eyes and the eyebrows. The paper includes results from testing the presented models.
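A minimal sketch of the decision-level side of such fusion is shown below, assuming each modality yields a probability distribution over the six emotions; the weighted product rule here is one common choice and is simpler than the semantic fusion model the paper describes.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse(p_face, p_speech, w_face=0.6):
    """Weighted product-rule fusion of two per-emotion distributions;
    w_face is an illustrative modality weight, not a tuned value."""
    p = np.asarray(p_face) ** w_face * np.asarray(p_speech) ** (1 - w_face)
    p /= p.sum()
    return EMOTIONS[int(np.argmax(p))], p

face_scores   = [0.05, 0.05, 0.10, 0.60, 0.10, 0.10]
speech_scores = [0.10, 0.05, 0.05, 0.50, 0.20, 0.10]
print(fuse(face_scores, speech_scores))   # -> ("happiness", fused scores)
```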
Abstract: In the past, a crisis event was reported by local witnesses who made phone calls to the emergency services, describing by speech what they observed at the crisis site. Recent improvements in human-computer interfaces make possible the development of context-aware systems for crisis management that support people in escaping a crisis even before external help is available on site. Apart from collecting people's reports on the crisis, these systems are assumed to automatically extract useful clues during typical human-computer interaction sessions. The novelty of the current research resides in the attempt to use computer vision techniques to perform an automatic evaluation of facial expressions during human-computer interaction sessions with a crisis management system. The current paper details an approach for an automatic facial expression recognition module that may be included in crisis-oriented applications. The algorithm uses an Active Appearance Model for facial shape extraction and an SVM classifier for Action Unit detection and facial expression recognition.
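As a hypothetical post-processing step, detected AUs can be mapped to prototypic expressions through commonly cited FACS combinations (e.g., AU6 + AU12 for happiness); note that the actual module learns this mapping with an SVM rather than using fixed rules.

```python
# Commonly cited FACS AU combinations for four prototypic expressions.
PROTOTYPES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "anger":     {4, 5, 7, 23},
}

def match_expression(detected_aus):
    """Return the expression whose full AU prototype is present, preferring
    the largest matching prototype; fall back to neutral."""
    detected = set(detected_aus)
    best, best_size = "neutral", 0
    for expression, aus in PROTOTYPES.items():
        if aus <= detected and len(aus) > best_size:
            best, best_size = expression, len(aus)
    return best

print(match_expression([6, 12, 25]))   # -> "happiness"
```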
Abstract: At TU Delft there is a project aiming at the realization of a fully automatic emotion recognition system on the basis of facial analysis. The approach splits the system into four components: face detection, facial characteristic point extraction, tracking, and classification. The focus in this paper is only on the first two components. Face detection is performed by boosting simple rectangular Haar-like features that give a decent representation of the face and allow the differentiation between a face and a non-face. The boosting algorithm is combined with an evolutionary search to speed up the overall search time. Facial characteristic points (FCPs) are extracted from the detected faces using the same technique that is applied to faces. Additionally, FCP extraction using corner detection methods and brightness distribution has also been considered. Finally, after retrieving the required FCPs, the emotion of the facial expression can be determined. The classification of the Haar-like features is done by a Relevance Vector Machine (RVM).
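To make the rectangle-feature idea concrete, the sketch below computes a two-rectangle Haar-like feature in O(1) per evaluation via an integral image; the boosting and evolutionary search that select among such features are omitted.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r+1, :c+1]; any box sum then costs O(1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] (exclusive upper bounds) from the integral image."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Left-minus-right two-rectangle Haar-like feature of size h x 2w."""
    left = box_sum(ii, r, c, r + h, c + w)
    right = box_sum(ii, r, c + w, r + h, c + 2 * w)
    return left - right

window = np.random.rand(24, 24)         # a typical detector window size
print(two_rect_feature(integral_image(window), 4, 4, 8, 6))
```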