Keywords: lipreading, audio-visual speech recognition, audio-visual data corpus, multimodal data corpus, high speed camera, dual view recording, frontal view, side view, active appearance models, optical flow and lipreading, emotional speech.
Abstract: Our software demo package consists of an implementation of an automatic human emotion recognition system. The system is bi-modal and is based on fusing data on facial expressions with emotion extracted from the speech signal. We have integrated the Viola & Jones face detector (OpenCV), the Active Appearance Model, AAM (AAM-API), for extracting the face shape, and Support Vector Machines (LibSVM) for the classification of emotion patterns. We have used an optical flow algorithm to compute the features needed for the classification of facial expressions. Besides the integration of all processing components, the software system accommodates our implementation of the data fusion algorithm. Our C++ implementation runs at a frame rate of about 5 fps.
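Since the abstract names the concrete components, the per-frame pipeline can be sketched. This is an illustrative sketch only, assuming OpenCV's C++ API for the Viola & Jones detector and using Farneback dense optical flow as a stand-in for the flow features; fitAAM and classifyEmotion are hypothetical wrappers for the AAM-API and LibSVM steps, whose real interfaces are not given here.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <opencv2/video.hpp>
#include <opencv2/videoio.hpp>
#include <vector>

int main() {
    // 1. Viola & Jones detector, loaded from OpenCV's stock cascade file.
    cv::CascadeClassifier face;
    face.load("haarcascade_frontalface_default.xml");

    cv::VideoCapture cap(0);
    cv::Mat frame, gray, prevGray, flow;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        std::vector<cv::Rect> faces;
        face.detectMultiScale(gray, faces, 1.1, 3);

        // 2. Dense optical flow between consecutive frames (feature source).
        if (!prevGray.empty() && !faces.empty())
            cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                         0.5, 3, 15, 3, 5, 1.2, 0);
        prevGray = gray.clone();

        // 3. Shape extraction and classification would follow here:
        // Shape s   = fitAAM(gray, faces[0]);     // hypothetical AAM-API wrapper
        // int label = classifyEmotion(s, flow);   // hypothetical LibSVM wrapper
    }
    return 0;
}
```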
Abstract: Multimodal emotion recognition is receiving increasing attention from the scientific community. Fusing information that arrives over different channels of communication, while taking the context into account, seems the right approach. During social interaction the affective load of the interlocutors plays a major role. In the current paper we present a detailed analysis of the process of building an advanced multimodal data corpus for affective state recognition and related domains. The corpus contains synchronized dual-view recordings acquired with a high-speed camera and high-quality audio devices. We paid careful attention to the emotional content of the corpus in all aspects, such as language content and facial expressions. For the recordings we implemented TV-prompter-like software that controlled the recording devices and instructed the actors, so as to ensure the uniformity of the recordings. In this way we obtained a high-quality, controlled emotional data corpus.
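As an illustration of the recording protocol, a controller of this kind reduces to a simple loop that shows each scripted prompt, pauses, and drives the devices for a fixed take; the sketch below assumes hypothetical startDevices/stopDevices hooks and arbitrary timings, since the original tool is not specified in detail.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

struct Prompt { std::string sentence; std::string emotion; };

void recordSession(const std::vector<Prompt>& script) {
    for (const Prompt& p : script) {
        // Instruct the actor before each take so the recordings stay uniform.
        std::cout << "Emotion: " << p.emotion << "\nSay: " << p.sentence << "\n";
        std::this_thread::sleep_for(std::chrono::seconds(2));  // preparation pause

        // startDevices();  // hypothetical: start camera and audio in sync
        std::this_thread::sleep_for(std::chrono::seconds(5));  // fixed take length
        // stopDevices();   // hypothetical: stop both devices together
    }
}

int main() {
    recordSession({{"The lamp is on the desk.", "happy"},
                   {"The lamp is on the desk.", "angry"}});
    return 0;
}
```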
Abstract: The system described in this paper provides a Web interface for fully automatic audio-video human emotion recognition. The analysis focuses on the set of six basic emotions plus the neutral state. Different classifiers are involved in face detection (AdaBoost), facial expression recognition (SVM and other models), and emotion recognition from speech (GentleBoost). The Active Appearance Model (AAM) is used to extract the shape information of the faces to be analyzed. The facial expression recognition is frame based, and no temporal patterns of emotions are modeled. Emotion recognition from movies is performed separately on the sound track and the video frames; the algorithm does not handle dependencies between audio and video during the analysis. The methodologies for data processing are explained, and specific performance measures for emotion recognition are presented.
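The independence of the two analysis streams can be made explicit in a short sketch: each video frame and each speech segment is scored on its own, with no temporal or cross-modal coupling. The function names and stub bodies below are illustrative stand-ins for the SVM and GentleBoost classifiers named above.

```cpp
#include <array>
#include <vector>

constexpr int kNumClasses = 7;  // six basic emotions plus neutral
using Scores = std::array<double, kNumClasses>;

// Placeholder stubs standing in for the SVM (video) and GentleBoost (audio)
// classifiers; the real feature extraction is omitted.
Scores classifyFrame(const std::vector<double>& /*aamShape*/)       { return Scores{}; }
Scores classifySpeech(const std::vector<double>& /*audioFeatures*/) { return Scores{}; }

struct Result {
    std::vector<Scores> videoPerFrame;    // one label distribution per frame
    std::vector<Scores> audioPerSegment;  // one label distribution per segment
};

// Each stream is processed separately: nothing links a frame's result to its
// neighbours (no temporal patterns) or to the co-occurring audio.
Result analyzeClip(const std::vector<std::vector<double>>& frames,
                   const std::vector<std::vector<double>>& segments) {
    Result r;
    for (const auto& f : frames)   r.videoPerFrame.push_back(classifyFrame(f));
    for (const auto& s : segments) r.audioPerSegment.push_back(classifySpeech(s));
    return r;
}
```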
Abstract: The paper describes a novel technique for the recognition of emotions from multimodal data. We focus on the recognition of the six prototypic emotions. The results from facial expression recognition and from emotion recognition from speech are combined using a bi-modal semantic data fusion model that determines the most probable emotion of the subject. Two types of models based on geometric face features are used for facial expression recognition, depending on the presence or absence of speech. In our approach we define an algorithm that is robust to the changes of face shape that occur during regular speech. The influence of phoneme generation on the face shape during speech is removed by using features that relate only to the eyes and the eyebrows. The paper includes results from testing the presented models.
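The speech-robust feature idea can be illustrated directly: only eye and eyebrow landmarks feed the facial model, so mouth deformation caused by phoneme articulation cannot perturb the features. The landmark indexing and the product-rule late fusion below are illustrative assumptions, not the paper's exact scheme.

```cpp
#include <array>
#include <vector>

struct Point { double x, y; };
constexpr int kNumEmotions = 6;  // the six prototypic emotions

// Keep only upper-face landmarks (eyes, eyebrows); mouth and jaw points,
// which move during regular speech, are simply never selected.
std::vector<double> upperFaceFeatures(const std::vector<Point>& landmarks,
                                      const std::vector<int>& eyeBrowIdx) {
    std::vector<double> f;
    for (int i : eyeBrowIdx) {
        f.push_back(landmarks[i].x);
        f.push_back(landmarks[i].y);
    }
    return f;
}

// Illustrative bi-modal late fusion: combine per-class probabilities from the
// face and speech classifiers, renormalize, and take the argmax elsewhere.
std::array<double, kNumEmotions> fuse(const std::array<double, kNumEmotions>& faceP,
                                      const std::array<double, kNumEmotions>& speechP) {
    std::array<double, kNumEmotions> out{};
    double z = 0.0;
    for (int c = 0; c < kNumEmotions; ++c) {
        out[c] = faceP[c] * speechP[c];
        z += out[c];
    }
    if (z > 0.0)
        for (double& v : out) v /= z;
    return out;
}
```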
Abstract: The recognition of a person's internal emotional state plays an important role in several human-related fields; among them, human-computer interaction has recently received special attention. The current research analyzes segmentation methods and the performance of the GentleBoost classifier for emotion recognition from speech. The data set used for the emotion analysis is Berlin, a database of German emotional speech; a second data set, DES (Danish Emotional Speech), is used for comparison purposes. Our contribution to the research community consists of a novel, extensive study of the efficiency of using distinct numbers of frames per speech utterance for emotion recognition. Eventually, a set of GentleBoost 'committees' with optimal classification rates is determined, based on an exhaustive study of the generated classifiers and of the different types of segmentation.
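The segmentation variable studied here can be sketched as splitting each utterance into a chosen number of equal-length frames, one feature vector per frame; the loop over candidate frame counts mirrors the exhaustive study, and evaluateGentleBoost is a hypothetical stand-in for the actual training and evaluation code.

```cpp
#include <vector>

// Split an utterance into numFrames equal-length frames; any samples left
// over after integer division are dropped at the tail.
std::vector<std::vector<double>> segment(const std::vector<double>& samples,
                                         int numFrames) {
    std::vector<std::vector<double>> frames;
    const std::size_t len = samples.size() / numFrames;
    for (int i = 0; i < numFrames; ++i)
        frames.emplace_back(samples.begin() + i * len,
                            samples.begin() + (i + 1) * len);
    return frames;
}

int main() {
    std::vector<double> utterance(16000, 0.0);  // placeholder: 1 s at 16 kHz
    for (int n = 2; n <= 20; ++n) {             // candidate frames-per-utterance
        auto frames = segment(utterance, n);
        // evaluateGentleBoost(frames);         // hypothetical trainer/evaluator
        (void)frames;
    }
    return 0;
}
```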
Abstract: At TUDelft there is a project aiming at the realization of a fully automatic emotion recognition system based on facial analysis. The approach splits the system into four components: face detection, facial characteristic point extraction, tracking, and classification. The focus of this paper is only on the first two components. Face detection is performed by boosting simple rectangular Haar-like features that give a decent representation of the face; these features also allow the differentiation between a face and a non-face. The boosting algorithm is combined with an evolutionary search to speed up the overall search time. Facial characteristic points (FCPs) are extracted from the detected faces, using the same technique that was applied to the faces themselves. Additionally, FCP extraction using corner detection methods and brightness distribution has also been considered. Finally, after retrieving the required FCPs, the emotion of the facial expression can be determined. The classification of the Haar-like features is done by the Relevance Vector Machine (RVM).
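The boosted primitive itself is easy to show: a rectangular Haar-like feature is the difference of pixel sums over adjacent rectangles, each computable in four lookups on an integral image. The two-rectangle layout below is illustrative; the evolutionary search mentioned above would mutate the rectangle coordinates rather than scan them exhaustively.

```cpp
#include <vector>

struct Integral {
    int w, h;
    std::vector<long long> s;  // (w+1) x (h+1) summed-area table
    long long at(int x, int y) const { return s[y * (w + 1) + x]; }
    // Sum of pixels in [x, x+rw) x [y, y+rh) via four table lookups.
    long long rect(int x, int y, int rw, int rh) const {
        return at(x + rw, y + rh) - at(x, y + rh) - at(x + rw, y) + at(x, y);
    }
};

// Build the summed-area table from an 8-bit grayscale image.
Integral buildIntegral(const std::vector<unsigned char>& img, int w, int h) {
    Integral ii{w, h, std::vector<long long>((w + 1) * (h + 1), 0)};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            ii.s[(y + 1) * (w + 1) + (x + 1)] =
                img[y * w + x] + ii.at(x, y + 1) + ii.at(x + 1, y) - ii.at(x, y);
    return ii;
}

// Two-rectangle Haar feature: contrast between adjacent left/right regions.
long long haarTwoRect(const Integral& ii, int x, int y, int rw, int rh) {
    return ii.rect(x, y, rw, rh) - ii.rect(x + rw, y, rw, rh);
}
```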