Estimation of Face Orientation to support Multimodal Interaction in Smart Environments

Author: Marco Kunze

This thesis presents a system that estimates the face orientation and position of one or more persons observed by multiple cameras. From these estimates, the field of view or focus of attention of each observed person can be derived. The system consists of three modules. A face extraction module extracts faces from the images delivered by the cameras; it is based on a face detection algorithm combined with an object tracking algorithm. A localization module determines the positions of the persons by triangulation. If several persons are present in the scene, the detected faces are grouped according to the person they belong to by minimizing the triangulation error. Face orientation is estimated without using the information about the users' positions: the pose is estimated separately for each face extracted from the camera images. This is done with support vector regression, which can estimate face orientation from low-resolution face images. It is therefore possible to estimate the face orientation of persons at large distances from the cameras, as they occur in houses or apartments. The individual estimates are then merged, using the previously obtained localization information, to determine each user's face orientation relative to the world coordinate system. Together with the users' positions, these orientations form the output of the system. The developed system will be integrated into a multimodal human-computer interface within a smart environment in order to determine the user's object of interest from the user's face orientation.
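The localization and grouping step can be illustrated with a minimal sketch. It is not the implementation from the thesis: it assumes only two calibrated cameras, represents each detected face as a viewing ray (camera centre plus direction) in world coordinates, uses the midpoint of the shortest segment between two rays as the triangulated position, and takes the length of that segment as the triangulation error. The function names `triangulate` and `group_faces` are hypothetical, and the exhaustive search over pairings is only practical for a handful of persons.

```python
from itertools import permutations
from math import sqrt


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _point_on_ray(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate(o1, d1, o2, d2):
    """Midpoint triangulation of two viewing rays o + t * d.

    Returns (position, error), where error is the length of the shortest
    segment between the two rays. The rays are treated as infinite lines,
    so t may be negative (an assumption of this sketch).
    """
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = _point_on_ray(o1, d1, t1)
    p2 = _point_on_ray(o2, d2, t2)
    error = sqrt(sum((x - y) ** 2 for x, y in zip(p1, p2)))
    position = tuple((x + y) / 2 for x, y in zip(p1, p2))
    return position, error


def group_faces(rays_cam1, rays_cam2):
    """Pair each face ray of camera 1 with a distinct face ray of camera 2.

    Tries every pairing and keeps the one with the smallest summed
    triangulation error. Assumes len(rays_cam1) <= len(rays_cam2).
    Returns (pairing, positions); pairing[i] is the index in rays_cam2
    matched to rays_cam1[i].
    """
    best_total, best_pairing, best_positions = None, None, None
    for perm in permutations(range(len(rays_cam2)), len(rays_cam1)):
        total, positions = 0.0, []
        for i, j in enumerate(perm):
            position, error = triangulate(*rays_cam1[i], *rays_cam2[j])
            total += error
            positions.append(position)
        if best_total is None or total < best_total:
            best_total, best_pairing, best_positions = total, list(perm), positions
    return best_pairing, best_positions


# Two persons seen by two cameras; camera 2 reports its faces in the
# opposite order, so the grouping has to recover the correspondence.
cam1 = [((0, 0, 0), (1, 2, 0)), ((0, 0, 0), (3, 3, 1))]
cam2 = [((4, 0, 0), (-1, 3, 1)), ((4, 0, 0), (-3, 2, 0))]
pairing, positions = group_faces(cam1, cam2)
```

With more than two cameras, the same idea extends to choosing one ray per camera for each person and minimizing a multi-view error, which this sketch does not attempt.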