Facial expressions convey a large share of the information exchanged between people and can effectively express their intentions and emotions. Facial expression recognition underpins fields such as human-computer interaction, facial emotion prediction, and artificial intelligence, and has become an important research topic in computer vision. This article proposes a facial expression recognition method based on the MobileNetV3 network for face images captured from different angles. The method uses depth-wise separable convolutions, introduces an attention mechanism and a new activation function to update the network blocks, and redesigns the time-consuming layer structure at the end of the network. The dataset used is KDEF, which contains 4,900 color images of 562×762 pixels. Extensive experiments show that the proposed structure improves the accuracy of facial expression recognition from different angles compared with other network structures, reaching 94.7%, while requiring fewer parameters, which benefits further research on facial expressions.
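The abstract credits much of the efficiency to depth-wise separable convolutions and MobileNetV3's hard-swish activation. As a minimal illustrative sketch (not the authors' implementation; the channel sizes below are arbitrary examples), the parameter savings and the activation can be written as:

```python
import numpy as np

def conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

def h_swish(x):
    """Hard-swish activation used in MobileNetV3: x * ReLU6(x + 3) / 6."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

# Example: a 3 x 3 layer with 112 input and 112 output channels.
standard = conv_params(3, 112, 112)                   # 112,896 parameters
separable = depthwise_separable_params(3, 112, 112)   # 13,552 parameters
print(standard, separable, round(standard / separable, 2))
```

For a 3×3 kernel the separable factorization cuts the parameter count by roughly a factor of eight, which is why such blocks dominate lightweight architectures like MobileNetV3.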
Lipreading aims to decode speech content from a moving mouth. It is a very challenging task because lip appearance variations and speech content are coupled together in the subtle movements of the lip region. Especially in the speaker-independent recognition scenario, training and testing data differ entirely in distribution due to the diverse speaker identities, so the learned model generalizes poorly at test time. We propose a Siamese decoupling lipreading network (SDLipNet) to address this problem. Specifically, we exploit an encoder–decoder framework to establish a collaborative representation of speaker identities and speech contents, and utilize the identity-specific information to regularize the content feature space. The identity features are derived from a Siamese identity encoder trained with paired visual speech data from different speakers. In addition, we align the content representation with a prior Gaussian distribution by imposing a Kullback–Leibler divergence constraint between the two outputs of the Siamese content encoder. In this way, the learned content feature space is expected to generalize to the target speaker domain. Extensive experiments on two lipreading benchmarks demonstrate that our proposed SDLipNet achieves better performance on the speaker-independent recognition task than state-of-the-art lipreading methods.
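The abstract does not give SDLipNet's exact loss, but aligning a content representation with a Gaussian prior via a KL constraint typically uses the closed-form KL divergence between a diagonal Gaussian and the standard normal. A hedged sketch of that standard term (the function name and batch layout are assumptions for illustration):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)),
    summed over feature dimensions and averaged over the batch.

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    """
    kl_per_dim = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return kl_per_dim.sum(axis=-1).mean()

# A content encoding already matched to the prior incurs zero penalty:
print(kl_to_standard_normal(np.zeros((8, 16)), np.zeros((8, 16))))  # 0.0
```

Applying such a penalty to the content encoder's outputs pulls the learned content space toward a speaker-agnostic prior, which is consistent with the abstract's goal of making the space universal to unseen speakers.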