Scientific Publications

FaceXHuBERT (ICMI 2023)

Title: FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning

Authors: Kazi Injamamul Haque and Zerrin Yumak

Abstract: This paper presents FaceXHuBERT, a text-less speech-driven 3D facial animation generation method that generates facial cues driven by an emotional expressiveness condition. In addition, it can handle audio recorded in a variety of situations (e.g. background noise, multiple people speaking). Recent approaches employ end-to-end deep learning that takes both audio and text as input to generate 3D facial animation. However, the scarcity of publicly available expressive audio-3D facial animation datasets poses a major bottleneck. The resulting animations still have issues regarding accurate lip-syncing, emotional expressivity, person-specific facial cues and generalizability. In this work, we first achieve better results than the state-of-the-art on the speech-driven 3D facial animation generation task by effectively employing the self-supervised pretrained HuBERT speech model, which allows us to incorporate both lexical and non-lexical information in the audio without using a large lexicon. Second, we incorporate an emotional expressiveness modality by guiding the network with a binary emotion condition. We carried out extensive objective and subjective evaluations in comparison to ground truth and the state-of-the-art. A perceptual user study demonstrates that facial animations generated expressively with our approach are perceived as more realistic and are preferred over non-expressive ones. In addition, we show that a strong audio encoder alone eliminates the need for a complex decoder in the network architecture, reducing network complexity and training time significantly. We provide the code publicly and recommend watching the video.
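As a rough illustration of the audio-encoder idea described above, the sketch below encodes raw speech with a pretrained self-supervised HuBERT model (via the Hugging Face transformers library) and appends a binary emotion condition per audio frame. This is a hedged sketch, not the released FaceXHuBERT code: the checkpoint name, tensor shapes and the way the condition is concatenated are illustrative assumptions.

```python
# Hedged sketch (not the authors' released code): encode raw speech with a
# self-supervised pretrained HuBERT model and append a binary emotion
# condition per audio frame. Checkpoint name and shapes are illustrative.
import torch
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
hubert.eval()

# Placeholder waveform: in practice, load 16 kHz mono audio and normalize it
# with the matching feature extractor before passing it to the model.
waveform = torch.randn(1, 16000 * 2)                     # (batch, samples), ~2 s

with torch.no_grad():
    audio_feats = hubert(waveform).last_hidden_state     # (1, T, 768)

# Binary emotion condition (e.g. 0 = neutral, 1 = expressive), broadcast to
# every frame and concatenated to the audio features that drive the decoder.
emotion = torch.ones(1, audio_feats.shape[1], 1)
conditioned = torch.cat([audio_feats, emotion], dim=-1)  # (1, T, 769)
print(conditioned.shape)
```

In the paper, a conditioned audio representation of this kind drives the animation decoder; here the final print simply confirms the shape of the concatenated features.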

FaceDiffuser (MIG 2023)

Title: FaceDiffuser: Speech-Driven 3D Facial Animation Using Diffusion

Authors: Stefan Stan, Kazi Injamamul Haque and Zerrin Yumak

Abstract: Speech-driven 3D facial animation synthesis has been a challenging task both in industry and research. Recent methods mostly focus on deterministic deep learning, meaning that given a speech input, the output is always the same. However, in reality, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, the majority of approaches focus on 3D vertex-based datasets, and methods compatible with existing facial animation pipelines using rigged characters are scarce. To eliminate these issues, we present FaceDiffuser, a non-deterministic deep learning model for generating speech-driven facial animations that is trained with both 3D vertex-based and blendshape-based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results compared to state-of-the-art methods. We also introduce a new in-house dataset based on a blendshape-based rigged character. We recommend watching the accompanying supplementary video.
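To make the diffusion idea concrete, here is a minimal, hypothetical sketch of one denoising training step conditioned on frame-aligned HuBERT audio features. The DenoisingNet module, the linear beta schedule and all tensor dimensions are assumptions for illustration only and do not reproduce the actual FaceDiffuser architecture.

```python
# Hedged sketch (assumed, not the released FaceDiffuser code) of one diffusion
# training step: noise the ground-truth facial animation, then ask a denoising
# network conditioned on HuBERT audio features and the timestep to recover it.
import torch
import torch.nn as nn

T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)          # illustrative linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class DenoisingNet(nn.Module):
    """Toy denoiser: maps noisy animation + audio features + timestep to noise."""
    def __init__(self, anim_dim=15069, audio_dim=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(anim_dim + audio_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, anim_dim),
        )
    def forward(self, noisy_anim, audio_feats, t):
        t_emb = t.float().view(-1, 1, 1).expand(-1, noisy_anim.shape[1], 1) / T_STEPS
        return self.net(torch.cat([noisy_anim, audio_feats, t_emb], dim=-1))

model = DenoisingNet()
anim = torch.randn(1, 50, 15069)   # placeholder: flattened per-frame vertex offsets
audio = torch.randn(1, 50, 768)    # placeholder: frame-aligned HuBERT features

t = torch.randint(0, T_STEPS, (1,))
noise = torch.randn_like(anim)
a_bar = alphas_cumprod[t].view(-1, 1, 1)
noisy = a_bar.sqrt() * anim + (1 - a_bar).sqrt() * noise   # forward diffusion

loss = nn.functional.mse_loss(model(noisy, audio, t), noise)
loss.backward()
print(loss.item())
```

At inference time, such a network would be applied iteratively starting from pure noise, which is what makes the generated animation non-deterministic for a fixed speech input.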

Doctoral Consortium (SIGGRAPH Asia 2023)

Title: Data-Driven Expressive 3D Facial Animation Synthesis for Digital Humans

Authors: Kazi Injamamul Haque

Abstract: This doctoral research focuses on generating expressive 3D facial animation for digital humans by studying and employing data-driven techniques. Faces are the first point of interest during human interaction, and it is no different when interacting with digital humans. Even minor inconsistencies in facial animation can disrupt user immersion. Traditional animation workflows produce realistic results but are so time-consuming and labor-intensive that they cannot meet the ever-increasing demand for 3D content in recent years. Moreover, recent data-driven approaches focus on speech-driven lip synchrony, leaving out the facial expressiveness that resides throughout the face. To address the emerging demand and reduce production effort, we explore data-driven deep learning techniques for generating controllable, emotionally expressive facial animation. We evaluate the proposed models against state-of-the-art methods and ground truth, quantitatively, qualitatively, and perceptually. We also emphasize the need for non-deterministic approaches alongside deterministic methods in order to ensure natural randomness in the non-verbal cues of facial animation.

Appearance and Animation Realism Study (IVA 2023)

Title: Effect of Appearance and Animation Realism on the Perception of Emotionally Expressive Virtual Humans

Authors: Nabila Amadou, Kazi Injamamul Haque and Zerrin Yumak

Abstract: 3D virtual human technology is growing, with several potential applications in health, education, business and telecommunications. Investigating the perception of these virtual humans can help guide the development of better and more effective applications. Recent developments show that the appearance of virtual humans has reached a very realistic level. However, there is not yet an adequate analysis of the perception of appearance and animation realism for emotionally expressive virtual humans. In this paper, we designed a user experiment and analyzed the effect of a realistic virtual human's appearance realism and animation realism under varying emotion conditions. We found that higher appearance realism and higher animation realism lead to higher social presence and higher attractiveness ratings. We also found significant effects of animation realism on perceived realism and emotion intensity levels. Our study sheds light on how appearance and animation realism affect the perception of highly realistic virtual humans in emotionally expressive scenarios and points to future directions.

Musicality (GALA 2019)

Title: Musicality: A Game to Improve Musical Perception

Authors: Nouri Khalass, Georgia Zarnomitrou, Kazi Injamamul Haque, Salim Salmi, Simon Maulini, Tanja Linkermann, Nestor Z. Salamon, J. Timothy Balint and Rafael Bidarra

Abstract: Musicality refers to a person’s ability to perceive and reproduce music. Due to its complexity, it is best characterized through different aspects of music such as pitch, harmony, etc. Scientists believe that musicality is not an inherent trait possessed only by musicians but something anyone can nurture and train in themselves. In this paper we present a new game, named Musicality, that aims at measuring and improving the musicality of any person with some interest in music. Our application offers users a fun, quick, interactive way to accomplish this goal at their own pace. Specifically, our game focuses on three of the most basic aspects of musicality: instrument recognition, tempo and tone. For each aspect we created different mini-games in order to make training a varied and attractive activity.