Abstract: Current audio-only speech recognition still lacks the expected robustness when the Signal-to-Noise Ratio (SNR) decreases. Video information is not affected by acoustic noise, which makes it an ideal candidate for data fusion to the benefit of speech recognition. It has been shown that most techniques used for the extraction of static visual features result in equivalent features, or at least that the most informative features exhibit this property. We argue that one of the main problems of existing methods is that the resulting features contain no information about the motion of the speaker's lips. Therefore, in this paper we analyze the importance of motion detection for speech recognition. To this end, we first present the Lip Geometry Estimation (LGE) method for static feature extraction, which combines an appearance-based approach with a statistics-based approach for extracting the shape of the mouth; the method was introduced and explored in detail in earlier work. Furthermore, we introduce a second method, based on a novel approach, that captures the motion information relevant to speech recognition by performing optical flow analysis on the contour of the speaker's mouth. For completeness, an intermediate approach is also analyzed: this third method recovers the motion information by computing the first derivatives of the static visual features. All methods were tested and compared on a continuous speech recognizer for Dutch and evaluated under different noise conditions. We show that audio-video recognition based on true motion features, namely those obtained by optical flow analysis, outperforms the other settings in low SNR conditions.
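To make the third, derivative-based approach concrete, the sketch below shows one common way to approximate motion information from a sequence of static visual features by taking first-order (delta) differences over time. This is only a minimal illustration under assumed conventions; the function name, feature dimensionality, and the central-difference step are hypothetical and not taken from the paper.

```python
import numpy as np

def delta_features(static_feats, step=1):
    """Approximate motion information as first-order differences of a
    sequence of static visual feature vectors.

    static_feats : array of shape (num_frames, feat_dim)
    step         : frame distance used for the finite difference
    """
    # Pad at both ends so the output has one delta vector per frame.
    padded = np.pad(static_feats, ((step, step), (0, 0)), mode="edge")
    # Central difference: (x[t+step] - x[t-step]) / (2 * step)
    return (padded[2 * step:] - padded[:-2 * step]) / (2.0 * step)

# Hypothetical example: 100 frames of 12-dimensional lip-geometry features.
feats = np.random.randn(100, 12)
motion = delta_features(feats)
av_features = np.hstack([feats, motion])  # static + dynamic feature stream
```

A fusion-based recognizer could then concatenate the static and delta streams, as in the last line, before combining them with the acoustic features.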
Abstract: The investigations presented in this thesis are part of the 'Integrated Collaborative Information Systems' (ICIS) project, focusing on 'Enhanced Situation Awareness' (ESA). As a partner in this project, we investigated the feasibility of using morphologically elaborate model neurons to enhance robustness and adaptivity in robotic systems.
Abstract: The study of human facial expressions is one of the most challenging domains in the pattern recognition community. Each facial expression is generated by non-rigid object deformations, and these deformations are person-dependent. Automatic recognition of facial expressions is a process based primarily on the analysis of permanent and transient features of the face, which can only be assessed with some degree of error. The expression recognition model is oriented towards the specification of the Facial Action Coding System (FACS) of Ekman and Friesen [Ekman, Friesen 1978]. Hard constraints on scene processing and recording conditions limit the robustness of the analysis. In order to manage the uncertainties and the lack of information, we set up a probabilistic framework. The goal of the project was to design and implement a system for automatic recognition of human facial expressions in video streams. The results of the project are of great importance for a broad range of applications in both research and applied domains.
Abstract: This paper presents an improvement of the robustness and accuracy of the weighted scan matching algorithm by matching against the union of earlier acquired scans. The approach reduces the correspondence error, which is explicitly modeled in the weighted scan matching algorithm, by providing a more complete and denser frame of reference against which new scans are matched. By making use of an efficient quadtree data structure, earlier acquired scans can be stored with millimeter accuracy for environments with dimensions larger than 100x100 meters, while preserving real-time performance. In our experiments we illustrate the significant gains in robustness and accuracy that can be achieved with this approach.
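To illustrate the kind of spatial indexing the abstract refers to, the following is a minimal point-quadtree sketch: a square region is recursively subdivided until leaf cells hold only a few points, so that a large workspace can be covered while individual points keep their full metric precision. The class name, the cell capacity, and the 128 m workspace are assumptions for illustration only, not the paper's actual implementation.

```python
class QuadTree:
    """Minimal point quadtree: stores 2-D scan points, subdividing a square
    cell into four children once it holds more than `capacity` points."""

    def __init__(self, cx, cy, half, capacity=4):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre and half-size (metres)
        self.capacity = capacity
        self.points = []
        self.children = None  # four sub-cells once subdivided

    def insert(self, x, y):
        if abs(x - self.cx) > self.half or abs(y - self.cy) > self.half:
            return False  # point lies outside this cell
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._subdivide()
        return any(child.insert(x, y) for child in self.children)

    def _subdivide(self):
        h = self.half / 2.0
        self.children = [QuadTree(self.cx + dx, self.cy + dy, h, self.capacity)
                         for dx in (-h, h) for dy in (-h, h)]
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

# Hypothetical 128 m x 128 m workspace; repeated halving of the cell size
# quickly reaches millimetre-scale leaves only where scan points actually lie.
tree = QuadTree(0.0, 0.0, 64.0)
tree.insert(12.345, -3.210)
```

Because cells are only subdivided where points are inserted, memory grows with the occupied part of the environment rather than with its full 100x100 m extent, which is what makes storing the union of earlier scans at high accuracy feasible in real time.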